## General

Overview pages, getting started guides, and general documentation.

--- title: Alerts and Notifications source: https://docs.snowflake.com/en/guides-overview-alerts.md section: General ---

# Alerts and Notifications

You can use Snowflake alerts to send notifications and perform actions automatically. In SQL, you can send a notification to an email address or queue by calling a built-in stored procedure.
[Snowflake Alerts](/user-guide/alerts)
If you need to send a notification or perform an action when data in Snowflake meets certain conditions, you can set up a Snowflake Alert. You can also [pass configuration values](#label-alerts-config) to an alert to parameterize its condition and action logic. Learn how to create, configure, and maintain Snowflake alerts.
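The condition-and-action structure of an alert can be sketched by assembling a `CREATE ALERT` statement as text. This is a minimal sketch, not a definitive implementation: `build_create_alert` is a hypothetical helper (not part of any Snowflake library), and the alert, warehouse, and table names are placeholders chosen for illustration.

```python
def build_create_alert(name: str, warehouse: str, schedule: str,
                       condition_sql: str, action_sql: str) -> str:
    """Return CREATE ALERT SQL text (hypothetical helper, formats text only)."""
    return (
        f"CREATE OR REPLACE ALERT {name}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  SCHEDULE = '{schedule}'\n"
        f"  IF (EXISTS ({condition_sql}))\n"
        f"  THEN {action_sql};"
    )


# All object names below are assumptions made up for this example.
statement = build_create_alert(
    name="my_alert",
    warehouse="my_warehouse",
    schedule="1 MINUTE",
    condition_sql="SELECT gauge_value FROM gauge WHERE gauge_value > 200",
    action_sql="INSERT INTO gauge_value_exceeded_history VALUES (CURRENT_TIMESTAMP())",
)
print(statement)
```

The resulting text can be run through whichever client you already use. A newly created alert typically starts out suspended, so it would likely need an `ALTER ALERT my_alert RESUME` before it begins evaluating its condition.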
[](/user-guide/notifications/about-notifications)
You can configure Snowflake to send notifications about [Snowpipe](/user-guide/data-load-snowpipe-intro) and [task](/user-guide/tasks-intro) errors to a cloud provider queue (Amazon SNS, Microsoft Azure Event Grid, or Google Cloud Pub/Sub). You can also use a SQL statement to send a notification to an email address, a cloud provider queue, or a webhook. Learn how to configure Snowflake to send notifications.
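As a rough illustration of sending a notification from SQL, the sketch below builds a call to the `SYSTEM$SEND_EMAIL` stored procedure. The helper functions are hypothetical, and the integration name and address are placeholders; the sketch assumes an email notification integration already exists and that the recipient is allowed to receive mail from it.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal.

    Backslashes are escaped first, then single quotes are doubled.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def build_send_email_call(integration: str, recipients: list[str],
                          subject: str, body: str) -> str:
    """Return a CALL SYSTEM$SEND_EMAIL statement as text (hypothetical helper)."""
    args = [
        sql_literal(integration),
        sql_literal(", ".join(recipients)),  # comma-separated recipient list
        sql_literal(subject),
        sql_literal(body),
    ]
    return f"CALL SYSTEM$SEND_EMAIL({', '.join(args)});"


# 'my_email_int' and the address are assumptions for this example.
call = build_send_email_call(
    "my_email_int",
    ["first.last@example.com"],
    "Email alert",
    "Task A's run finished.",
)
print(call)
```

Quoting the arguments in one place keeps an apostrophe in the subject or body from breaking the statement.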
--- title: API Reference source: https://docs.snowflake.com/en/api-reference.md section: General ---

# API Reference

These topics provide reference information for the APIs available in Snowflake.

**APIs for connecting to Snowflake**
| Connector / Driver / Client API | Resources |
| --- | --- |
| Go Driver | [Developer Guide](/developer-guide/golang/go-driver)<br>[API Reference](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#pkg-index) |
| JDBC Driver | [Developer Guide](/developer-guide/jdbc/jdbc)<br>[JDBC API Support Reference](/developer-guide/jdbc/jdbc-api) |
| .NET Driver | [Developer Guide](/developer-guide/dotnet/dotnet-driver)<br>[Source code in GitHub](https://github.com/snowflakedb/snowflake-connector-net/) |
| Node.js Driver | [Developer Guide](/developer-guide/node-js/nodejs-driver)<br>[Source code in GitHub](https://github.com/snowflakedb/snowflake-connector-nodejs/) |
| ODBC Driver | [Developer Guide](/developer-guide/odbc/odbc)<br>[ODBC Driver API Support Reference](/developer-guide/odbc/odbc-api) |
| PHP PDO Driver | [Developer Guide](/developer-guide/php-pdo/php-pdo-driver)<br>[Source code in GitHub](https://github.com/snowflakedb/pdo_snowflake/) |
| Snowflake Connector for Kafka | [Developer Guide](/user-guide/kafka-connector/index)<br>[Source code in GitHub](https://github.com/snowflakedb/snowflake-kafka-connector) |
| Snowflake Connector for Python | [Developer Guide](/developer-guide/python-connector/python-connector)<br>[API Reference](/developer-guide/python-connector/python-connector-api)<br>[Getting Started With Python](https://quickstarts.snowflake.com/guide/getting_started_with_python/index.html?index=..%2F..index) |
| Snowflake Connector for Spark | [Developer Guide](/user-guide/spark-connector)<br>[Source code in GitHub](https://github.com/snowflakedb/spark-snowflake) |
| %sf-python% | [Developer Guide](/developer-guide/snowflake-python-api/snowflake-python-overview)<br>[API Reference](developer-guide/snowflake-python-api/reference/latest/index) |
| %sf-rest% | [Developer Guide](/developer-guide/snowflake-rest-api/snowflake-rest-api)<br>[API Reference](developer-guide/snowflake-rest-api/reference) |
| Snowflake SQL API | [Developer Guide](/developer-guide/sql-api/index)<br>[API Reference](/developer-guide/sql-api/reference)<br>[SQL API Playground](https://api.developers.snowflake.com/) |
**APIs for extending Snowflake:**
| Extensibility Feature | Resources |
| --- | --- |
| User-Defined Functions (UDFs) | [Developer Guide](/developer-guide/udf/udf-overview)<br>[Getting Started With User-Defined Functions](https://quickstarts.snowflake.com/guide/getting_started_with_user_defined_functions/index.html?index=..%2F..index) |
| Snowpark for Scala | [Developer Guide](/developer-guide/snowpark/scala/index)<br>[API Reference](developer-guide/snowpark/reference/scala/com/snowflake/snowpark/index.html)<br>[Getting Started With Snowpark in Scala](https://quickstarts.snowflake.com/guide/getting_started_with_snowpark_scala/index.html) |
| Snowpark for Java | [Developer Guide](/developer-guide/snowpark/java/index)<br>[API Reference](developer-guide/snowpark/reference/java/index.html) |
| Snowpark for Python | [Developer Guide](/developer-guide/snowpark/python/index)<br>[API Reference](/developer-guide/snowpark/reference/python/latest/index.html) |
| Snowflake ML for Python | [Developer Guide](/developer-guide/snowflake-ml/overview)<br>[API Reference](/developer-guide/snowpark-ml/reference/latest/index.html) |
| External Functions | [Developer Guide](/sql-reference/external-functions) |
| Stored Procedures | [Developer Guide](/developer-guide/stored-procedure/stored-procedures-overview)<br>[API Reference](/developer-guide/stored-procedure/stored-procedures-api) |
--- title: Appendices source: https://docs.snowflake.com/en/appendices.md section: General ---

# Appendices

- [](/sql-reference/conventions) Notational conventions used in the Snowflake documentation.
- [](/sql-reference/reserved-keywords) List of words reserved for Snowflake SQL.

--- title: Applications and tools for connecting to Snowflake source: https://docs.snowflake.com/en/guides-overview-connecting.md section: General ---

# Applications and tools for connecting to Snowflake

Snowflake provides several different applications and tools that you can use to access databases in Snowflake. For information about configuring clients, drivers, libraries, and third-party applications to connect to Snowflake, see [](/user-guide/gen-conn-config).

## User interface
[](/user-guide/ui-snowsight)
%sf-web-interface% distills Snowflake's powerful SQL support into a unified, easy-to-use experience. Use %sf-web-interface% to perform your critical Snowflake operations.
## Command-line clients
[](/developer-guide/snowflake-cli/index)
Use the command line to create, manage, update, and view apps running on Snowflake across workloads.
[](/user-guide/snowsql)
Detailed instructions for installing, configuring, and using the Snowflake command-line client.
## Extensions for code editors
[](/user-guide/vscode-ext)
Use the %sf-vscext% to connect to Snowflake within Visual Studio Code and perform SQL operations.
## Infrastructure as code

The following content is not supported by Snowflake. All code is provided "AS IS" and without warranty.
[](/user-guide/terraform)
Documentation and resources for the Snowflake Terraform provider.
## Drivers and libraries
[](/api-reference)
Lists the drivers and APIs provided by Snowflake for writing applications that connect to Snowflake.
## Integrating with third-party systems
[Snowflake Connectors](https://other-docs.snowflake.com/connectors.html)
Snowflake Connectors allow you to integrate third-party applications and database systems with Snowflake.
## Third-party software
[](/user-guide/ecosystem)
Overview of the third-party tools and technologies, as well as the Snowflake-provided clients, in the Snowflake ecosystem.
--- title: Cost & billing source: https://docs.snowflake.com/en/guides-overview-cost.md section: General ---

# Cost & billing

Snowflake provides a robust framework to manage costs. You can also obtain monthly usage statements and reconcile those statements with usage data in views.

## Cost management
[](/user-guide/cost-management-overview)
Snowflake's cost management framework helps you manage costs across the dimensions of visibility, control, and optimization. Learn about the framework and the features that support each part of it.
[](/user-guide/cost-understanding-overall)
The total cost of using Snowflake is the aggregate of the cost of using data transfer, storage, and compute resources. Learn about how overall cost is calculated.
[](/user-guide/cost-exploring-overall)
Snowsight allows you to quickly and easily obtain information about cost from a visual dashboard. Queries against the usage views allow you to drill down into cost data and can help generate custom reports and dashboards. Learn about exploring your spend using various queries to return cost information.
[](/user-guide/cost-anomalies)
A cost anomaly occurs when daily consumption falls outside the expected range. Snowflake automatically detects these anomalies and provides tools to help you investigate the root cause. Learn how to identify and investigate cost anomalies at the account and organization level.
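The idea of consumption "falling outside the expected range" can be illustrated with a generic statistical check. The sketch below is an assumption-laden illustration only: it is not Snowflake's detection method, which is not described here, and the mean-plus-or-minus-k-standard-deviations rule, the function names, and the sample figures are all invented for the example.

```python
from statistics import mean, stdev


def expected_range(history: list[float], k: float = 3.0) -> tuple[float, float]:
    """Return (low, high) bounds as mean +/- k sample standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def is_anomaly(history: list[float], today: float, k: float = 3.0) -> bool:
    """Flag today's value if it falls outside the range implied by history."""
    low, high = expected_range(history, k)
    return today < low or today > high


# Made-up daily credit consumption for the previous seven days.
daily_credits = [100.0, 104.0, 98.0, 101.0, 97.0, 103.0, 99.0]

print(is_anomaly(daily_credits, 150.0))  # far above recent days
print(is_anomaly(daily_credits, 102.0))  # within the usual spread
```

A real detector would also need to account for trends and weekly seasonality, which this simple rule ignores.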
[](/user-guide/cost-optimize)
Learn how to optimize Snowflake to reduce costs and get the most value from your spend.
[](/user-guide/cost-attributing)
Gain insight into Snowflake cost by attributing those costs to logical units within the organization such as departments, environments or other entities. Learn how to attribute cost to differing entities within your organization.
[](/user-guide/cost-controlling)
Cost controls allow you to limit how much is spent on various services such as virtual warehouses. [Budgets](/user-guide/budgets) allow you to monitor the credit usage of supported objects and serverless features in your account. [Resource monitors](/user-guide/resource-monitors) allow you to monitor credit usage by user-managed virtual warehouses and the cloud services layer of the Snowflake architecture.
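To make the resource monitor idea concrete, the sketch below formats a `CREATE RESOURCE MONITOR` statement with percentage-based triggers. It is a minimal sketch under stated assumptions: `build_resource_monitor` is a hypothetical helper, and the monitor name, quota, and thresholds are placeholders rather than recommended values.

```python
VALID_ACTIONS = {"NOTIFY", "SUSPEND", "SUSPEND_IMMEDIATE"}


def build_resource_monitor(name: str, credit_quota: int,
                           triggers: list[tuple[int, str]]) -> str:
    """Return CREATE RESOURCE MONITOR SQL text (hypothetical helper)."""
    clauses = []
    for percent, action in triggers:
        if action not in VALID_ACTIONS:
            raise ValueError(f"unsupported trigger action: {action}")
        clauses.append(f"ON {percent} PERCENT DO {action}")
    return (
        f"CREATE OR REPLACE RESOURCE MONITOR {name} WITH "
        f"CREDIT_QUOTA = {credit_quota} "
        f"TRIGGERS {' '.join(clauses)};"
    )


# Placeholder values: notify at 75% of the quota, suspend at 100%.
monitor_sql = build_resource_monitor(
    "monthly_limit", 1000, [(75, "NOTIFY"), (100, "SUSPEND")]
)
print(monitor_sql)
```

A monitor only takes effect once it is assigned, for example with `ALTER WAREHOUSE my_warehouse SET RESOURCE_MONITOR = monthly_limit` (warehouse name assumed).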
[](/user-guide/cost-access-control)
Snowflake provides system-defined roles that grant access to cost management features. Learn about the roles that control access to cost management features.
## Billing
[](/user-guide/billing-invoices)
Learn how to use %sf-web-interface% to view and download billing invoices.
[](/user-guide/billing-payment-history)
Learn how to use %sf-web-interface% to view payment activity, including charges, refunds, and authorization holds.
[](/user-guide/billing-usage-statement)
Learn how to use %sf-web-interface% to view and download monthly usage statements.
[](/user-guide/billing-reconcile)
Learn how to execute queries to reconcile usage data shown on a usage statement with data in the billing views of the Organization Usage schema.
[](/user-guide/billing-contacts)
Learn how to use %sf-web-interface% to update billing contact information.
--- title: Data Governance in Snowflake source: https://docs.snowflake.com/en/guides-overview-govern.md section: General ---

# Data Governance in Snowflake

Snowflake provides industry-leading features that ensure the highest levels of governance for your account and users, as well as all the data you store and access in Snowflake.
[Data Quality Monitoring and data metric functions](/user-guide/data-quality-intro)
Allows the monitoring of the state and integrity of your data using system data metric functions and user-defined data metric functions.
[Column-level Security](/user-guide/security-column-intro)
Allows the application of a masking policy to a column within a table or view.
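A masking policy and its assignment to a column might look like the SQL text below, held in Python strings. This is a sketch under assumptions: the policy, table, column, and role names are hypothetical, and `preview_masked_value` is only a plain-Python mirror of the policy's `CASE` expression, included to show the intended behavior without a Snowflake connection.

```python
# SQL text only; every object and role name here is a made-up placeholder.
CREATE_POLICY_SQL = """\
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST') THEN val
    ELSE '*********'
  END;"""

APPLY_POLICY_SQL = (
    "ALTER TABLE customers MODIFY COLUMN email "
    "SET MASKING POLICY email_mask;"
)


def preview_masked_value(value: str, current_role: str) -> str:
    """Mirror the CASE expression above in plain Python, for illustration."""
    return value if current_role in {"ANALYST"} else "*********"


print(preview_masked_value("jane@example.com", "ANALYST"))
print(preview_masked_value("jane@example.com", "PUBLIC"))
```

The policy is defined once and then attached to each column it should protect, so the masking rule lives in one place rather than in every query.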
[Row-level Security](/user-guide/security-row-intro)
Allows the application of a row access policy to a table or view to determine which rows are visible in the query result.
[](/user-guide/object-tagging/introduction)
Allows the tracking of sensitive data for compliance, discovery, protection, and resource usage.
[](/user-guide/tag-based-masking-policies)
Allows protecting column data by assigning a masking policy to a tag and then setting the tag on a database object or the Snowflake account.
[Sensitive data classification](/user-guide/classify-intro)
Allows categorizing potentially personal and/or sensitive data to support compliance and privacy regulations.
[](/user-guide/access-history)
Allows the auditing of the user access history through the Account Usage [ACCESS_HISTORY](/sql-reference/account-usage/access_history) view.
[](/user-guide/object-dependencies)
Allows the auditing of how one object references another object by its metadata (e.g. creating a view depends on a table name and column names) through the Account Usage [OBJECT_DEPENDENCIES](/sql-reference/account-usage/object_dependencies) view.
Data Governance area in %sf-web-interface%
Allows you to use **Governance & security** on %sf-web-interface% to access governance features. For details, see:

- [](#label-data-protection-policies-get-started)
- [](#label-object-tagging-create-tag-snowsight)
- [](#label-object-tagging-snowsight)
- [](#label-object-tagging-assign-ui)
- [](#label-security-column-intro-snowsight)
- [](#label-security-row-intro-snowsight)
--- title: Data sharing and collaboration in Snowflake source: https://docs.snowflake.com/en/guides-overview-sharing.md section: General --- # Data sharing and collaboration in Snowflake - [](/user-guide/data-marketplace) - [](/user-guide/data-sharing-intro) - [](/user-guide/data-sharing-views) - [](/user-guide/data-exchange) - [Snowflake Data Clean Rooms](https://other-docs.snowflake.com/en/cleanrooms/introduction) There are many ways to share data from your Snowflake account with users in other Snowflake accounts, including collaborating with other parties in a secure environment. ## Why share data with Snowflake When you use Snowflake to share data as a provider, you can manage who has access to your data, and avoid challenges keeping your data synchronized across different people and groups. As a data consumer, you can reduce the data transformations you need to perform because the data stays in Snowflake, making it easy to join datasets shared with you with your own data. If you share your data using listings, you can include metadata with your data share, such as a title and description, and usage examples to help consumers use the data quickly. In addition to the benefits for consumers, as a provider you get access to usage data, automatically replicate your data to other regions, and can even decide to charge for access to your data or offer some datasets publicly on the %sf-marketplace%. ## Options for sharing Listings let you share data with people in any Snowflake region, across clouds, without performing manual replication tasks. If you use listings, you can provide additional metadata for the data that you share, view customer data usage, and for listings offered publicly on the %sf-marketplace%, gauge consumer interest in your listings. If you don't want to share data using a listing, you can use a direct share instead, see [Secure data sharing](/user-guide/data-sharing-intro) and [Non-secure data sharing](/user-guide/data-sharing-views). 
No matter which option you choose, you can share with people who don't have Snowflake accounts by using [Reader Accounts](/user-guide/data-sharing-reader-create).
| Data Sharing Mechanism | Share With Whom? | Auto-fulfill Across Clouds? | Optionally Charge for Data? | Optionally Offer Data Publicly? | Get Consumer Usage Metrics? |
| --- | --- | --- | --- | --- | --- |
| [](#label-about-listings) | One or more accounts in any region | Yes | Yes | Yes | Yes |
| [](#label-about-direct-share) | One or more accounts in your region | No | No | No | No |
If you want to manage a group of accounts, and control who can publish and consume listings in that group, consider using a [](#label-about-data-exchange). ## Listing You can offer a listing privately to specific accounts, or publicly on the %sf-marketplace%. For more about the %sf-marketplace%, see [](/collaboration/collaboration-marketplace-about). After you accept the provider and consumer terms, you can start sharing and consuming data shared with you with a listing. For more information, see [About listings](https://other-docs.snowflake.com/en/collaboration/collaboration-listings-about). To learn more about sharing listings to or from [](#label-snowflake-editions-vps), see [](/collaboration/virtual-private-snowflake/about-vps-collaboration). ## Direct share Use a direct share to share data with one or more accounts in the same Snowflake region. You don't need to copy or move data shared with a direct share. If you want to convert a direct share with active consumers to a listing, see [Convert a direct share to a listing](https://other-docs.snowflake.com/en/collaboration/provider-listings-creating-publishing#convert-a-direct-share-to-a-private-listing). For more information, see [](/user-guide/data-sharing-gs). ## Data Exchange If creating listings that you offer privately to specific accounts isn't an option, you can use a data exchange to share data with a selected group of accounts that you invite. You must request that a data exchange be provisioned and configured for your account, then you can invite members to the exchange and specify whether they can consume data, provide data, or both. For more information, see [](/user-guide/data-exchange). ## Collaborating with shared data in a secure environment When you use listings, direct shares, and Data Exchange to share data with another party, they can directly access the data. 
If you want to share data with other parties, but want to control how that data is accessed, you can use a %samooha-clean-room% to collaborate. The provider who is sharing their data in a clean room defines what analyses can be run against the shared data, which allows the consumer to gather insights from the data without having unrestricted access to it. For more information, see [](/user-guide/cleanrooms/overview). --- title: Databases, Tables and Views - Overview source: https://docs.snowflake.com/en/guides-overview-db.md section: General --- # Databases, Tables and Views - Overview - [](/sql-reference/sql-ddl-summary) - [](/developer-guide/snowflake-python-api/snowflake-python-managing-databases) All data in Snowflake is maintained in databases. Each database consists of one or more schemas, which are logical groupings of database objects, such as tables and views. Snowflake does not place any hard limits on the number of databases, schemas (within a database), or objects (within a schema) you can create. Use the following pages to learn about tables and table types, views, design considerations and other related content.
[](/user-guide/tables-micro-partitions)
Introduction to *micro-partitions* and *data clustering*, two of the principal concepts utilized in Snowflake physical table structures.
[Temporary and Transient Tables](/user-guide/tables-temp-transient)
Snowflake supports creating temporary tables for storing non-permanent, transitory data such as ETL data, session-specific data, or other short-lived data.
[External Tables](/user-guide/tables-external-intro)
Snowflake supports the concept of an external table. External tables are read-only, and their files are stored in an external stage.
[Hybrid Tables](/user-guide/tables-hybrid)
Snowflake supports the concept of a hybrid table. Hybrid tables provide optimized performance for read and write operations in transactional and hybrid workloads.
[](/user-guide/tables-iceberg)
Snowflake supports the %iceberg-tm% open table format. Iceberg tables use data in external cloud storage and give you the option to use Snowflake as the Iceberg catalog, an external Iceberg catalog, or to create a table from files in object storage.
[Views](/user-guide/views-introduction)
A view allows the result of a query to be accessed as if it were a table. Views serve a variety of purposes, including combining, segregating, and protecting data.
[Secure Views](/user-guide/views-secure)
Snowflake supports the concept of a secure view. Secure views are specifically designed for data privacy, for example, to limit access to sensitive data that should not be exposed to all users of the underlying table(s).
[Materialized Views](/user-guide/views-materialized)
Materialized views are views precomputed from data derived from a query specification and stored for later use. Querying a materialized view is faster than executing a query against the base table of the view because the data is pre-computed.
[Table Design Best Practices](/user-guide/table-considerations)
Best practices, general guidelines, and important considerations when designing and managing tables.
[Cloning Best Practices](/user-guide/object-clone)
Best practices, general guidelines, and important considerations when cloning objects in Snowflake, particularly databases, schemas, and permanent tables.
[](/user-guide/tables-storage-considerations)
Best practices and guidelines for controlling data storage costs associated with Continuous Data Protection (CDP), particularly for tables.
--- title: Function and stored procedure reference source: https://docs.snowflake.com/en/sql-reference-functions.md section: General --- # Function and stored procedure reference - [](/sql-reference/external-functions) - [](/developer-guide/udf/udf-overview) These topics provide reference information for the system-defined functions and system-defined stored procedures. - [](/sql-reference/intro-summary-operators-functions) — combined summary of all system-defined functions. Can be used as a quick-reference. - [](/sql-reference/functions-all) — alphabetical list of all system-defined functions (scalar, aggregate, table, etc.). - [](/sql-reference/functions-aggregation) — functions that take multiple rows/values as input and return a single value. - [](/sql-reference/functions) — functions that take a single row/value as input and return a single value: - [](/sql-reference/expressions-byte-bit) - [](/sql-reference/expressions-conditional) - [](/sql-reference/functions-context) - [](/sql-reference/functions-conversion) - [](/sql-reference/functions-data-generation) - [](/sql-reference/functions-date-time) - [](/sql-reference/functions-differential-privacy) - [](/sql-reference/functions-encryption) - [](/sql-reference/functions-geospatial) - [](/sql-reference/functions-hash-scalar) - [](/sql-reference/functions-metadata) - [](/sql-reference/functions-notification) - [](/sql-reference/functions-numeric) - [](/sql-reference/functions-semistructured) - [](/sql-reference/functions-regexp) — regular expression (search) functions - [](/sql-reference/functions-string) - [](/sql-reference/functions-vector) - [](/sql-reference/functions-model-monitors) — functions that retrieve metrics from machine learning model monitors. - [](/sql-reference/functions-system) — functions that perform control operations or return system-level information. - [](/sql-reference/functions-table) — functions that return results in tabular format. 
- [](/sql-reference/functions-window) — functions that run analytic calculations, such as moving aggregations and rankings. - [](/sql-reference/functions-data-metric) — functions that enable data quality measurements for tables and views. - [](/sql-reference-stored-procedures) — stored procedures to facilitate using certain Snowflake features. --- title: Get started with Snowflake for users source: https://docs.snowflake.com/en/getting-started-for-users.md section: General --- # Get started with Snowflake for users These topics get you started with Snowflake:
[](/user-guide/setup)
Overview of getting an account and methods for accessing Snowflake.
[](/user-guide/connecting)
Overview of the different ways to connect to Snowflake.
[](/user-guide/intro-key-concepts)
Description of Snowflake architecture, key concepts, and features.
[](/user-guide/ui-snowsight-quick-tour)
Overview of %sf-web-interface%, Snowflake's web-based interface.
[](/user-guide/data-lifecycle)
Introduces the main operations and corresponding SQL commands for getting your data into Snowflake and then using it to perform queries and other SQL operations.
--- title: Key concepts for Snowflake administrators source: https://docs.snowflake.com/en/concepts-for-administrators.md section: General ---

# Key concepts for Snowflake administrators

These topics cover key concepts related to administering Snowflake.

## Cloud platforms and regions

These topics describe the cloud infrastructure on which Snowflake runs:
[](/user-guide/intro-cloud-platforms)
Describes the cloud computing platforms on which Snowflake is offered, which include Amazon Web Services (AWS), Google Cloud, and Microsoft Azure.
[](/user-guide/intro-regions)
Describes the different cloud platform regions in which Snowflake is offered. This topic helps you choose where your data is geographically stored and your compute resources are provisioned.
## Editions, releases, and features
[](/user-guide/intro-editions)
Describes the services and features that are included with each edition of Snowflake. This topic helps you choose the right edition for your organization.
[](/user-guide/intro-releases)
Describes the Snowflake release process and provides instructions for requesting 12-hour early access for Enterprise Edition and Business Critical Edition accounts, or 24-hour early access for Virtual Private Snowflake (VPS) accounts.
[](/user-guide/intro-supported-features)
Lists the key features of Snowflake to help you decide which features you want to use.
## Security and compliance
[](/user-guide/data-cdp)
Introduces the features that Snowflake provides for ensuring your data is protected, secure, and available.
[](/user-guide/intro-compliance)
Describes the major regulatory compliance standards Snowflake meets to ensure the highest levels of data assurance, security, and governance for data in Snowflake.
--- title: Load data into Snowflake source: https://docs.snowflake.com/en/guides-overview-loading-data.md section: General ---

# Load data into Snowflake

Data can be loaded into Snowflake in a number of ways. The following topics provide an overview of data loading concepts, tasks, tools, and techniques to quickly and easily load data into your Snowflake database.

- [](/user-guide/ecosystem-partner-connect)
- [](/user-guide/data-unload-overview)
- [](/sql-reference/sql-dml)
- [](/user-guide/tables-iceberg-load)
[](/user-guide/data-load-overview)
Options available to load data into Snowflake.
[](/user-guide/intro-summary-loading)
Reference of the supported features for using the [](/sql-reference/sql/copy-into-table) command to load data from files.
[](/user-guide/data-load-tutorials)
Learn how to load data by using step-by-step instructions in tutorials.
[](/user-guide/data-load-considerations)
Best practices, general guidelines, and important considerations for bulk data loading.
[](/user-guide/data-load-s3-compatible-storage)
Instructions for accessing data in S3-compatible storage.
[](/user-guide/data-load-web-ui)
Instructions for loading limited amounts of data using the web interface.
[](/user-guide/semistructured-intro)
Considerations for loading semi-structured data.
[](/user-guide/unstructured-intro)
Considerations for loading unstructured data.
[](/user-guide/data-load-local-file-system)
Instructions for loading data in bulk using the COPY command.
[](/user-guide/data-load-snowpipe-intro)
Instructions for loading data continuously using Snowpipe.
[](/user-guide/snowpipe-streaming/data-load-snowpipe-streaming-overview)
Instructions for loading data streams continuously using Snowpipe Streaming.
[](/user-guide/multi-location-resilience-data-pipelines)
Guidance for resilient Snowpipe and COPY INTO data pipelines across locations.
[](/user-guide/data-load-transform)
Instructions for transforming data while loading it into a table using the COPY INTO command.
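Transforming data during a load generally means wrapping a `SELECT` over the staged file columns inside `COPY INTO`. The sketch below formats such a statement; `build_copy_transform` is a hypothetical helper, and the table, stage, file format, and column positions are assumptions chosen for illustration.

```python
def build_copy_transform(table: str, stage: str,
                         column_map: dict[str, int],
                         file_format: str) -> str:
    """Return COPY INTO SQL text that selects and reorders staged columns.

    column_map maps each target column name to the 1-based position of the
    source column in the staged file (hypothetical helper, formats text only).
    """
    targets = ", ".join(column_map)
    sources = ", ".join(f"t.${position}" for position in column_map.values())
    return (
        f"COPY INTO {table} ({targets}) "
        f"FROM (SELECT {sources} FROM @{stage} t) "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}');"
    )


# Placeholder names: load four of the staged file's columns, skipping the rest.
copy_sql = build_copy_transform(
    table="home_sales",
    stage="my_stage",
    column_map={"city": 1, "zip": 2, "sale_date": 6, "price": 7},
    file_format="my_csv_format",
)
print(copy_sql)
```

Because the mapping is explicit, columns can be omitted or reordered during the load without reshaping the source files first.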
[](/user-guide/querying-stage)
Instructions on using standard SQL to query internal and external named stages.
[](/user-guide/querying-metadata)
Instructions on querying metadata in internal and external stages.
--- title: Managing Snowflake source: https://docs.snowflake.com/en/user-guide-manage.md section: General ---

# Managing Snowflake

These topics describe the tasks associated with using Snowflake.

- [](/user-guide/warehouses) — Key concepts and tasks for creating and using virtual warehouses to execute queries and perform DML operations, such as loading and unloading data:
  - [](/user-guide/warehouses-overview)
  - [](/user-guide/warehouses-multicluster)
  - [](/user-guide/warehouses-considerations)
  - [](/user-guide/warehouses-tasks)
  - [](/user-guide/query-acceleration-service)
  - [](/user-guide/warehouses-load-monitoring)
- [](/user-guide/databases) — Key concepts and tasks related to understanding and working with Snowflake databases and tables:
  - [](/user-guide/tables-micro-partitions)
  - [](/user-guide/tables-temp-transient)
  - [](/user-guide/tables-external-intro)
  - [](/user-guide/views-introduction)
  - [](/user-guide/views-secure)
  - [](/user-guide/views-materialized)
  - [](/user-guide/table-considerations)
  - [](/user-guide/object-clone)
  - [](/user-guide/tables-storage-considerations)
- [](/guides-overview-queries) — Key concepts and tasks for executing queries in Snowflake:
  - [](/user-guide/querying-joins)
  - [](/user-guide/join-elimination)
  - [](/user-guide/querying-subqueries)
  - [](/user-guide/queries-hierarchical)
  - [](/user-guide/queries-cte)
  - [](/user-guide/querying-semistructured)
  - [](/user-guide/functions-window-using)
  - [](/user-guide/match-recognize-introduction)
  - [](/user-guide/querying-sequences)
  - [](/user-guide/querying-persisted-results)
  - [](/user-guide/querying-distinct-counts)
  - [](/user-guide/querying-approximate-similarity)
  - [](/user-guide/querying-approximate-frequent-values)
  - [](/user-guide/querying-approximate-percentile-values)
  - [](/user-guide/ui-snowsight-query)
  - [](/user-guide/querying-cancel-statements)
- [](/sql-reference/data-types-datetime) — Reference information and examples for working with dates, times and timestamps, and time
zones in Snowflake: - [](/sql-reference/date-time-input-output) - [](/sql-reference/date-time-examples) - [](/user-guide/semistructured-intro) — Key concepts and tasks for working with JSON and other types of semi-structured data: - [](/user-guide/semistructured-intro) - [](/user-guide/semistructured-data-formats) - [](/user-guide/semistructured-considerations) - [](/user-guide/tutorials/json-basics-tutorial) - [](/user-guide/unstructured-intro) — Key concepts and tasks for working with unstructured data: - [](/user-guide/data-load-dirtables) - [](/user-guide/data-load-unstructured-rest-api) - [](/user-guide/unstructured-data-sharing) - [](/user-guide/unstructured-ts) - [](/sql-reference/data-types-text) — Reference information and examples for working with binary data in Snowflake: - [](/sql-reference/binary-input-output) - [](/sql-reference/binary-examples) - [](/user-guide/data-availability) — Key concepts and tasks for understanding how Snowflake maintains access to deleted and modified data, and also how Snowflake enables data recovery in the event of loss: - [](/user-guide/data-time-travel) - [](/user-guide/data-failsafe) - [](/user-guide/data-cdp-storage-costs) - [](/user-guide/data-pipelines-intro) — Key concepts and tasks for transforming and optimizing loaded data for analysis: - [](/user-guide/streams-intro) - [](/user-guide/tasks-intro) - [Introduction to dynamic tables](/user-guide/dynamic-tables/overview) - [](/user-guide/replication-intro) — Key concepts and tasks for replicating and failing over objects across multiple Snowflake accounts, as well as redirecting client connections, for business continuity and disaster recovery: - [](/user-guide/account-replication-intro) - [](/user-guide/client-redirect) - [](/user-guide/sample-data) — Key concepts and tasks for using the sample data sets provided with Snowflake: - [](/user-guide/sample-data-using) - [](/user-guide/sample-data-tpch) - [](/user-guide/sample-data-openweathermap) --- title: Managing 
Your Snowflake Account source: https://docs.snowflake.com/en/user-guide-admin.md section: General --- # Managing Your Snowflake Account - [](/guides-overview-secure) These topics describe the administrative concepts and tasks associated with managing your account in Snowflake. These topics are intended primarily for administrators (i.e. users with the ACCOUNTADMIN, SYSADMIN, or SECURITYADMIN roles). - [](/user-guide/admin-account-identifier) Detailed descriptions of the two unique account identifiers supported for connecting to Snowflake and using features that span multiple accounts. - [](/user-guide/admin-trial-account) Instructions for signing up for a trial account, adding a credit card to the account, and canceling the account. - [](/user-guide/admin-account-management) Instructions for setting account, session, and object parameters for your account. - [](/user-guide/admin-user-management) Instructions for creating and managing users in your account. - [](/release-notes/bcr-bundles/managing-behavior-change-releases) Instructions for enabling and disabling behavior change releases in your account. --- title: ML Functions source: https://docs.snowflake.com/en/guides-overview-ml-functions.md section: General --- # ML Functions These powerful analysis functions give you automated predictions and insights into your data using machine learning. Snowflake provides an appropriate type of model for each feature, so you don't have to be a machine learning expert to take advantage of them. All you need is your data. ## Time-Series Functions Use time-series functions to train a machine learning model on your time-series data to determine how a specified metric (for example, sales) varies over time and relative to other features of your data. The model then provides insights or predictions based on the trends detected in the data. - [Forecasting](/user-guide/ml-functions/forecasting) predicts future metric values from past trends in time-series data. 
- [Anomaly Detection](/user-guide/ml-functions/anomaly-detection) flags metric values that differ from typical expectations.

## Other Analysis Functions

These features don't require time-series data.

- [Classification](/user-guide/ml-functions/classification) sorts rows into two or more classes based on their most predictive features.
- [Top Insights](/user-guide/ml-functions/top-insights) helps you find dimensions and values that affect the metric in surprising ways.

## Cost Considerations

When you use ML functions, you incur storage and compute costs. These costs vary depending on the feature used and the quantity of data used in training and prediction.

The storage costs you incur reflect storage of the ML model instances created during the training step. To view the objects associated with your model instance, navigate to your [Account Usage views](#label-account-usage-views) (ACCOUNT_USAGE.TABLES and ACCOUNT_USAGE.STAGES). These objects appear with null database and schema columns. The `instance_id` column, however, will be populated, indicating that these objects are contained in a model instance. These objects are fully managed by the model instance, and you cannot access or delete them separately. To reduce storage costs associated with your models, delete unused or obsolete models.

See [](/user-guide/cost-understanding-compute) for general information on Snowflake compute costs.

## Limitations

Before you use ML functions, you must ensure [AUTOCOMMIT](#label-txn-autocommit) is enabled in your session. AUTOCOMMIT is enabled by default when you start a new Snowflake session.

## Using ML functions in Snowpark

`session.call` is not yet compatible with models created by ML functions. To call such a model in Snowpark, use `session.sql` instead, as shown here.
```python
session.sql('call my_model!FORECAST(...)').collect()
```

---
title: Optimizing performance in Snowflake
source: https://docs.snowflake.com/en/guides-overview-performance.md
section: General
---

# Optimizing performance in Snowflake

- [](/user-guide/cost-management-overview)

The following topics help guide efforts to improve the performance of Snowflake.
[](/user-guide/performance-query-exploring)
Gain insights into the historical performance of queries using the web interface or by writing queries against data in the ACCOUNT_USAGE schema.
[](/user-guide/performance-query-options)
Learn about options for optimizing Snowflake query performance.
[](/user-guide/performance-query-warehouse)
Learn about strategies to fine-tune computing power in order to improve the performance of a query or set of queries running on a warehouse, including enabling the Query Acceleration Service.
[](/user-guide/performance-query-storage)
Learn how storing similar data together, creating optimized data structures, and defining specialized data sets can improve the performance of queries. This topic is helpful when choosing among Automatic Clustering, the Search Optimization Service, and materialized views.
[](/user-guide/performance-explorer)
Learn how to use Performance Explorer in %sf-web-interface% to monitor interactive metrics for SQL workloads.
[](/user-guide/snowflake-optima)
Learn how Snowflake Optima continuously analyzes workload patterns and implements the most effective strategies automatically.
--- title: Privacy in Snowflake source: https://docs.snowflake.com/en/guides-overview-privacy.md section: General --- # Privacy in Snowflake Snowflake provides industry-leading features that maintain the privacy of individuals and sensitive data.
[Differential privacy](/user-guide/diff-privacy/differential-privacy-overview)
Protect the identity and information of entities against targeted privacy attacks. Data providers assign privacy policies to tables and views to protect their data with differential privacy.
[Aggregation policies](/user-guide/aggregation-policies)
Require queries to aggregate data in order to return results.
[Join policies](/user-guide/join-policies)
Require queries to join tables in order to return results.
[Preview Feature](/release-notes/preview-features): Open
Available to all accounts that are Enterprise Edition (or higher). To inquire about upgrading, please contact [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support).
[Projection policies](/user-guide/projection-policies)
Prevent queries from using a SELECT statement to project values from a column.
[Synthetic data](/user-guide/synthetic-data)
Programmatically create realistic datasets that closely mirror your original data. This allows you to safely represent sensitive, confidential, or restricted information across various workloads, such as testing and validation.
--- title: Query Data in Snowflake source: https://docs.snowflake.com/en/guides-overview-queries.md section: General --- # Query Data in Snowflake Snowflake supports standard SQL, including a subset of ANSI SQL:1999 and the SQL:2003 analytic extensions. Snowflake also supports common variations for a number of commands where those variations do not conflict with each other. You can use the search optimization service to improve query performance. For details, see [Search optimization service](/user-guide/search-optimization-service).
[](/user-guide/querying-joins)
A join combines rows from two tables to create a new combined row that can be used in the query. Learn join concepts, types of joins, and how to work with joins.
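The idea of combining rows from two tables can be sketched with a small runnable example. This is a minimal illustration that assumes Python's built-in `sqlite3` module as a local stand-in for a Snowflake session; the `customers` and `orders` tables and their contents are hypothetical, and only standard join syntax is used.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Inner join: only customers that have at least one matching order appear.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# Left outer join: every customer appears, even those without orders.
left_rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS order_count
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The inner join drops the customer with no orders, while the left outer join keeps that customer with a count of zero.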
[](/user-guide/querying-time-series-data)
Analyze time-series data, using SQL functionality designed for this purpose, such as the ASOF JOIN feature, date and time helper functions, aggregate functions for downsampling, and functions that support sliding window frames. Using ASOF JOIN, learn how to join tables on timestamp columns when their values closely follow each other, precede each other, or match exactly.
[Eliminate Redundant Joins](/user-guide/join-elimination)
A join on a key column can refer to tables that are not needed for the join. Such a join is referred to as a *redundant join*. Learn about redundant joins, and how to eliminate them to improve query performance.
[](/user-guide/querying-subqueries)
A subquery is a query within another query. Learn about subqueries and how to use them.
[](/user-guide/queries-hierarchical)
Relational databases often store hierarchical data by using different tables. Learn about querying hierarchical data using joins, common table expressions (CTEs), and CONNECT BY.
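As a rough illustration of walking a hierarchy with a recursive CTE, the following sketch again assumes Python's built-in `sqlite3` module as a local stand-in rather than a Snowflake session. The `employees` table is hypothetical, and CONNECT BY is not shown because SQLite does not support it.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Root', NULL), (2, 'Ann', 1), (3, 'Bob', 1), (4, 'Cy', 2);
""")

# The anchor member selects the top of the hierarchy; the recursive member
# repeatedly joins employees to the rows found so far, one level at a time.
hierarchy_rows = conn.execute("""
    WITH RECURSIVE chain (id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, chain.depth + 1
        FROM employees AS e
        JOIN chain ON e.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth, id
""").fetchall()

print(hierarchy_rows)
```

Each row carries its depth in the hierarchy, which makes it straightforward to indent or filter by level.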
[](/user-guide/queries-cte)
A CTE (common table expression) is a named subquery defined in a WITH clause, the result of which is effectively a table. Learn how to write and work with CTEs.
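A named subquery in a WITH clause can be sketched as follows. This is a minimal example that assumes Python's built-in `sqlite3` module as a local stand-in for a Snowflake session; the `sales` table and the threshold of 60 are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 100), ('east', 50), ('west', 30), ('west', 20);
""")

# region_totals is the CTE: it is defined once in the WITH clause and then
# queried like an ordinary table in the main SELECT.
cte_rows = conn.execute("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > 60
    ORDER BY region
""").fetchall()

print(cte_rows)
```

Naming the aggregation keeps the outer query short and lets the filter refer to `total` directly.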
[](/user-guide/querying-semistructured)
Semi-structured data represents arbitrary hierarchical data structures, which can be used to load and operate on data in semi-structured formats (e.g. JSON, Avro, ORC, Parquet, or XML). Learn how to use special operators and functions to query complex hierarchical data stored in a VARIANT.
[](/user-guide/querying-with-search-functions)
You can use full-text search to find character data (text) in specified columns from one or more tables, including fields in VARIANT, OBJECT, and ARRAY columns. Learn how to run queries that use full-text search.
[](/user-guide/querying-construct-at-runtime)
You can create programs that construct SQL statements dynamically at runtime. Learn about different options for constructing SQL at runtime.
[](/user-guide/functions-window-using)
Window functions operate on windows, which are groups of rows that are related in some way. Learn about windows, window functions, and how to use window functions to examine data.
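A running total per group is one common use of a window. The sketch below assumes Python's built-in `sqlite3` module (with SQLite 3.25 or later, which added window functions) as a local stand-in for a Snowflake session; the `daily_sales` table is hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; the table is hypothetical.
# Window functions require SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (region TEXT, day INTEGER, amount INTEGER);
    INSERT INTO daily_sales VALUES
        ('east', 1, 100), ('east', 2, 50), ('west', 1, 30), ('west', 2, 20);
""")

# PARTITION BY defines the window (one per region); ORDER BY inside OVER makes
# SUM accumulate row by row within that window instead of collapsing the rows.
window_rows = conn.execute("""
    SELECT region, day, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM daily_sales
    ORDER BY region, day
""").fetchall()

print(window_rows)
```

Unlike GROUP BY, the window function keeps every input row and adds the computed value alongside it.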
[](/user-guide/match-recognize-introduction)
In some cases, you might need to identify sequences of table rows that match a pattern. Learn about pattern matching, and how to use MATCH_RECOGNIZE to work with table rows matching patterns.
[](/user-guide/querying-sequences)
Sequences are used to generate unique numbers across sessions and statements, including concurrent statements. Learn what sequences are and how to use them.
[](/user-guide/querying-persisted-results)
When a query is executed, the result is persisted for a period of time. Learn how query results are persisted, how long persisted results are available, and how to use persisted query results to improve performance.
[](/user-guide/querying-distinct-counts)
Various methods exist to determine the count of distinct elements within a column. Learn methods to identify and report distinct elements in data.
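The exact method can be sketched with `COUNT(DISTINCT ...)`. This minimal example assumes Python's built-in `sqlite3` module as a local stand-in for a Snowflake session; the `visits` table is hypothetical, and approximate methods are not shown because SQLite does not provide them.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (user_id INTEGER);
    INSERT INTO visits VALUES (1), (2), (2), (3), (3), (3), (NULL);
""")

# COUNT(DISTINCT col) counts each non-NULL value once; COUNT(*) counts all rows.
distinct_users, total_rows = conn.execute(
    "SELECT COUNT(DISTINCT user_id), COUNT(*) FROM visits"
).fetchone()

print(distinct_users, total_rows)
```

The NULL row contributes to the total row count but not to the distinct count.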
[](/user-guide/querying-approximate-similarity)
Snowflake provides mechanisms to compare data sets for similarity. Learn how Snowflake determines similarity and how to compare multiple data sets for similarity.
[](/user-guide/querying-approximate-frequent-values)
Snowflake can examine data to determine how frequent values are within the data. Learn how frequency is determined and how to query data to determine data frequency using the APPROX_TOP_K family of functions.
[](/user-guide/querying-approximate-percentile-values)
Snowflake can estimate percentile values using an improved version of the t-Digest algorithm. Learn how to estimate percentile values using the APPROX_PERCENTILE family of functions.
[](/user-guide/ui-snowsight-activity)
Monitor the query activity in your account. Learn how to examine queries, using query profiles, to understand and improve performance.
[](/user-guide/query-insights)
Review the insights produced for a query. Learn how to improve the performance of a query.
[](/user-guide/query-hash)
To identify patterns and trends in queries, you can use the hash of the query text, which is included in the `query_hash` and `query_parameterized_hash` columns in selected Account Usage views and in the output of selected Information Schema table functions. Learn how to use the query hash in these columns to identify repeated queries and detect patterns and trends in queries.
[](/user-guide/querying-top-k-pruning-optimization)
Instead of scanning all eligible rows in SELECT statements that contain LIMIT and ORDER BY clauses, SELECT statements that use top-K pruning scan a subset of rows, which can improve performance. Learn how to use top-K pruning to improve the performance of SELECT statements that contain LIMIT and ORDER BY clauses.
[](/user-guide/querying-cancel-statements)
A running statement is typically canceled from the same interface that was used to start it. Learn how to use system functions to cancel a specific query or all currently executing queries.
--- title: Reference source: https://docs.snowflake.com/en/reference.md section: General --- # Reference Reference information on various areas of Snowflake.
[](/sql-reference-data-types)
Reference for SQL data types.
[](/sql-reference-commands)
Reference for SQL commands.
[](/sql-reference-functions)
Reference for SQL functions.
[](/sql-reference-classes)
Reference for SQL classes.
[](/sql-reference-snowflake-scripting)
Reference for [Snowflake Scripting](/developer-guide/snowflake-scripting/index) constructs.
[](/sql-reference)
Reference material on other subjects.
--- title: Securing Snowflake source: https://docs.snowflake.com/en/guides-overview-secure.md section: General --- # Securing Snowflake Snowflake provides industry-leading features that help ensure you can configure the highest levels of security for your account and users, as well as all the data you store in Snowflake. These topics are intended primarily for administrators (i.e. users with the ACCOUNTADMIN, SYSADMIN, or SECURITYADMIN roles). ## Authentication
[](/user-guide/authentication-policies)
Using authentication policies to restrict account and user authentication by client, authentication methods, and more.
[](/user-guide/security-mfa)
Using multi-factor authentication with Snowflake.
[Federated Authentication & SSO](/user-guide/admin-security-fed-auth-overview)
Topics related to federated authentication to Snowflake.
[](/user-guide/key-pair-auth)
Using key-pair authentication to Snowflake.
[](/user-guide/programmatic-access-tokens)
Generating and managing programmatic access tokens for authentication.
[OAuth](/user-guide/oauth-intro)
Topics related to using Snowflake OAuth and External OAuth to connect to Snowflake.
[](/user-guide/workload-identity-federation)
Preferred authentication method for service-to-service workloads.
[](/user-guide/api-authentication)
Configuring Snowflake to authenticate to external services.
## Network security
[](/user-guide/malicious-ip-protection)
Protecting your account from IP addresses that are known to be malicious.
[](/user-guide/network-policies)
Using network policies to restrict access to Snowflake.
[](/user-guide/network-rules)
Using network rules with other Snowflake features to restrict access to and from Snowflake.
## Private connectivity
[](/user-guide/private-connectivity-inbound)
Using private connectivity to access the Snowflake service, %sf-web-interface%, %sis%, internal stages, Snowflake managed storage volumes, and Snowpark Container Services.
[](/user-guide/private-connectivity-outbound)
Using private connectivity for external network locations, external functions, external stages, external tables, external volumes, and Snowpipe automation.
## Administration and authorization
[](/user-guide/trust-center/overview)
Using the Trust Center to evaluate and monitor your account for security risks.
[](/user-guide/session-policies)
Using session policies to manage your Snowflake session.
[SCIM](/user-guide/scim-intro)
Topics related to using SCIM to provision users and groups to Snowflake.
[Access Control](/user-guide/security-access-control-overview)
Topics related to role-based access control (RBAC) in Snowflake.
[End to End Encryption](/user-guide/security-encryption-end-to-end)
Using end-to-end encryption in Snowflake.
--- title: Snowflake AI and ML source: https://docs.snowflake.com/en/guides-overview-ai-features.md section: General --- # Snowflake AI and ML Snowflake offers two broad categories of powerful, intelligent features based on Artificial Intelligence (AI) and Machine Learning (ML). These features can help you do more with your data in less time than ever before. - **Snowflake Cortex** is a suite of AI features that use large language models (LLMs) to understand unstructured data, answer freeform questions, and provide intelligent assistance. This suite of Snowflake AI Features comprises: - [Cortex Agents](/user-guide/snowflake-cortex/cortex-agents) - [](/user-guide/snowflake-cortex/aisql) - [Cortex Analyst](/user-guide/snowflake-cortex/cortex-analyst) - [Cortex Fine-tuning](/user-guide/snowflake-cortex/cortex-finetuning) - [Cortex Search](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) - [%sf-intelligence%](/user-guide/snowflake-cortex/snowflake-cowork) - [Cortex Code](/user-guide/cortex-code/cortex-code) - [Cortex Code in Snowsight](/user-guide/cortex-code/cortex-code-snowsight) - [Cortex Code CLI](/user-guide/cortex-code/cortex-code-cli) - [Cortex Code Agent SDK](/user-guide/cortex-code-agent-sdk/cortex-code-agent-sdk) - [Cortex Code CLI Model Context Protocol (MCP) support](/user-guide/cortex-code/cortex-code-mcp) - [Cortex Code CLI Agent Client Protocol (ACP) support](/user-guide/cortex-code/cortex-code-acp) - [Cortex Code CLI plugins](/user-guide/cortex-code/cortex-code-plugins) - [Cortex AI Guardrails](/user-guide/snowflake-cortex/cortex-ai-guardrails) - **Snowflake ML** provides functionality for you to build your own models. - [ML Functions](/guides-overview-ml-functions) simplify the process of creating and using traditional machine learning models to detect patterns in your structured data. 
These powerful out-of-the-box analysis tools help time-strapped analysts, data engineers, and data scientists understand, predict, and classify data, without any programming. - For data scientists and developers, [Snowflake ML](/developer-guide/snowflake-ml/overview) lets you develop and operationalize custom models to solve your unique data challenges, while keeping your data inside Snowflake. Snowflake ML incorporates model development classes based on popular ML frameworks, along with ML Ops capabilities such as a feature store, a model registry, framework connectors, and immutable data snapshots. ## Use of Snowflake AI features Snowflake AI Features and their underlying models are designed with the following principles in mind: - **Full security.** Except as you elect, all AI models run inside of Snowflake's security and governance perimeter. Your data is not available to other customers or model developers. - **Data privacy.** Snowflake never uses your Customer Data to train models made available to our customer base. - **Control.** You have control over your team's use of Snowflake AI Features through familiar [role-based access control](/user-guide/security-access-control-overview). ## AI/ML model update process Snowflake is continually working to improve the quality of its offerings, including the models powering the Snowflake AI Features. This section describes how updates to those models fit into [Snowflake's Behavior Change](/release-notes/intro-bcr-releases) process. ## Model Update and Behavior Change Policy ### Overview Snowflake continuously updates the models that power Cortex AI features to improve quality, performance, and availability. These updates may introduce changes to model behavior, availability, or lifecycle status. This document describes how model changes are defined, how they are communicated, and how model lifecycle and deprecation are managed. 
### Model lifecycle Models in Cortex follow a defined lifecycle to communicate readiness and stability: - Private Preview - Public Preview - General Availability (GA) - Legacy - End of Life (EOL) Lifecycle status reflects the maturity and support level of a model. As models progress through these stages, their status will be reflected across customer-facing surfaces. Preview models are intended for evaluation and may change more frequently. GA models are considered stable and suitable for production use. ### Types of model changes A model update is considered a behavior change if it results in any of the following: - Changes to required syntax, including specifying a model or model version - Changes to the structure of model outputs - Deprecation of a model These changes may impact how customers interact with models and should be reviewed as part of normal governance processes. ### How changes are communicated Snowflake communicates model-related updates through the following mechanisms: - [Behavior Change Releases (BCRs)](/release-notes/intro-bcr-releases) — Used for changes that may require customer action or impact existing workflows - [What's New](/release-notes/new-features) — Used for improvements or additions that do not materially change how customers interact with models Model deprecations are communicated separately from bundled releases to provide clear and timely notification. ### Deprecation policy Snowflake periodically deprecates models to ensure customers have access to high-quality, well-supported options. 
For General Availability (GA) models: - Snowflake will make reasonable efforts to provide at least 60 days advance notice prior to deprecation For Preview models: - Deprecation timelines are not guaranteed and may occur with shorter notice During the deprecation period: - Customers are expected to migrate to alternative models before the deprecation date - After deprecation, models may no longer be available for use Lifecycle status will reflect deprecation through the transition to Legacy and ultimately End of Life. ### Legal notices - If you choose to use any of the Snowflake AI Features, your use is subject to our [Acceptable Use Policy](https://www.snowflake.com/legal/acceptable-use-policy/). - The outputs of Snowflake AI Features may be inaccurate, inappropriate, inefficient, or biased. Decisions based on such outputs, including those built into automatic pipelines, should have human oversight and review processes to ensure they are safe, accurate, and suitable for your intended use. - Your use of any Snowflake AI Feature that is identified as being powered by a third-party, open-source model is subject to any applicable license agreement and/or acceptable use policy set forth under the Offering-Specific Terms page available at [https://www.snowflake.com/legal/](https://www.snowflake.com/legal/). - For further information, see the [Snowflake AI Trust and Safety FAQ](https://www.snowflake.com/en/legal/snowflake-ai-trust-and-safety/). 
--- title: Snowflake data types source: https://docs.snowflake.com/en/data-types.md section: General --- # Snowflake data types - [](/sql-reference/functions-string) - [](/sql-reference/functions-date-time) - [](/sql-reference/functions-semistructured) - [](/sql-reference/functions-geospatial) - [](/sql-reference/functions-vector) - [](/sql-reference/functions-conversion) - [](/sql-reference/data-type-conversion) Snowflake supports most basic SQL data types (with some restrictions) for use in columns, local variables, expressions, parameters, and any other appropriate locations. You can also load unstructured data into Snowflake. For more information, see [Introduction to unstructured data](/user-guide/unstructured-intro). In some cases, data of one type can be converted to another type. For example, INTEGER data can be converted to FLOAT data. Some conversions are lossless, but others might lose information. The amount of loss depends upon the data types and the specific values. For example, converting a FLOAT value to an INTEGER value removes the digits after the decimal place. (The value is rounded to the nearest integer.) In some cases, the user must specify the desired conversion, such as when passing a VARCHAR value to the [TIME_SLICE](/sql-reference/functions/time_slice) function, which expects a TIMESTAMP or DATE argument. We call this explicit casting. In other cases, data types are converted automatically, such as when adding a float and an integer. We call this implicit casting (or coercion). In Snowflake, data types are automatically coerced whenever necessary and possible. For more information about explicit and implicit casting, see [Data type conversion](/sql-reference/data-type-conversion). 
For more information about Snowflake data types, see the following topics:

- [Summary of data types](/sql-reference/intro-summary-data-types)
- [Numeric data types](/sql-reference/data-types-numeric)
- [String & binary data types](/sql-reference/data-types-text)
- [Logical data types](/sql-reference/data-types-logical)
- [Date & time data types](/sql-reference/data-types-datetime)
- [Semi-structured data types](/sql-reference/data-types-semistructured)
- [Structured data types](/sql-reference/data-types-structured)
- [Unstructured data types](/sql-reference/data-types-unstructured)
- [Geospatial data types](/sql-reference/data-types-geospatial)
- [UUID data type](/sql-reference/data-types-uuid)
- [Vector data types](/sql-reference/data-types-vector)
- [User-defined types](/sql-reference/data-types-user-defined)
- [Unsupported data types](/sql-reference/data-types-unsupported)
- [Data type conversion](/sql-reference/data-type-conversion)

---
title: Snowflake Scripting reference
source: https://docs.snowflake.com/en/sql-reference-snowflake-scripting.md
section: General
---

# Snowflake Scripting reference

- [](/developer-guide/snowflake-scripting/index)
- [](/developer-guide/stored-procedure/stored-procedures-snowflake-scripting)
- [](/developer-guide/udf/sql/udf-sql-procedural-functions)

These topics provide reference information for the language elements supported in [Snowflake Scripting](/developer-guide/snowflake-scripting/index).

```sqlsyntax
-- Variable declaration
[ DECLARE ... ]
  ...
BEGIN
  ...
  -- Branching
  [ IF ... ]
  [ CASE ... ]

  -- Looping
  [ FOR ... ]
  [ WHILE ... ]
  [ REPEAT ... ]
  [ LOOP ... ]

  -- Loop termination (within a looping construct)
  [ BREAK ]
  [ CONTINUE ]

  -- Variable assignment
  [ LET ... ]

  -- Cursor management
  [ OPEN ... ]
  [ FETCH ... ]
  [ CLOSE ... ]

  -- Asynchronous child job management
  [ AWAIT ... ]
  [ CANCEL ... ]

  -- "No-op" (no-operation) statement (usually within a branch or exception)
  [ NULL ]

  -- Raising exceptions
  [ RAISE ... ]

  -- Returning a value
  [ RETURN ... ]

-- Exception handling
[ EXCEPTION ... ]
END;
```

**Next Topics:**

- [AWAIT](/sql-reference/snowflake-scripting/await)
- [BEGIN ... END](/sql-reference/snowflake-scripting/begin)
- [BREAK](/sql-reference/snowflake-scripting/break)
- [CANCEL](/sql-reference/snowflake-scripting/cancel)
- [CASE](/sql-reference/snowflake-scripting/case)
- [CLOSE](/sql-reference/snowflake-scripting/close)
- [CONTINUE](/sql-reference/snowflake-scripting/continue)
- [DECLARE](/sql-reference/snowflake-scripting/declare)
- [EXCEPTION](/sql-reference/snowflake-scripting/exception)
- [FETCH](/sql-reference/snowflake-scripting/fetch)
- [FOR](/sql-reference/snowflake-scripting/for)
- [IF](/sql-reference/snowflake-scripting/if)
- [LET](/sql-reference/snowflake-scripting/let)
- [LOOP](/sql-reference/snowflake-scripting/loop)
- [NULL](/sql-reference/snowflake-scripting/null)
- [OPEN](/sql-reference/snowflake-scripting/open)
- [RAISE](/sql-reference/snowflake-scripting/raise)
- [REPEAT](/sql-reference/snowflake-scripting/repeat)
- [RETURN](/sql-reference/snowflake-scripting/return)
- [WHILE](/sql-reference/snowflake-scripting/while)

---
title: SQL class reference
source: https://docs.snowflake.com/en/sql-reference-classes.md
section: General
---

# SQL class reference

These topics provide reference information for Snowflake [classes](/sql-reference/snowflake-db-classes).

Each class supports one or more of the following SQL operations:

- ALTER: Modifies the properties of an instance of a class.
- CREATE: Creates an instance of a class.
- DROP: Deletes an instance of a class.
- SHOW: Lists instances of a class.

An instance of a class can have one or more methods. A method is a stored procedure or function and can be called by using the instance name and method name, and arguments (if any) required by the method. For example, `CALL instance_name!method_name(...)`.
## Updating your search path You can add the schema for classes you use frequently to your search path to save typing and make your SQL statements more concise. For more information about updating your search path, see [](#label-update-search-path). ## Available classes Snowflake provides the following system-defined (built-in) classes.
[](/sql-reference/classes/anomaly_detection)
Allows you to detect outliers in your time series data.
[](/sql-reference/classes/anomaly_insights)
Allows you to detect outliers in your costs.
[](/sql-reference/classes/budget)
Allows you to monitor credit usage of supported objects.
[](/sql-reference/classes/classification)
Automatically sorts data into categories based on features in the data.
[](/sql-reference/classes/classification_profile)
Allows you to automatically classify sensitive data.
[](/sql-reference/classes/custom_classifier)
Allows you to define custom classifiers to extend your data classification capabilities.
[](/sql-reference/classes/forecast)
Represents a forecast model that produces a forecast for a single or multiple time series.
[](/sql-reference/classes/top-insights)
Allows you to determine the segments driving changes in a metric.
--- title: SQL command reference source: https://docs.snowflake.com/en/sql-reference-commands.md section: General --- # SQL command reference These topics provide reference information for all the Snowflake SQL commands (DDL, DML, and query syntax). - [](/sql-reference/constructs) — structure of SQL queries in Snowflake. - [](/sql-reference/operators) — arithmetic, logical, and other types of operators. - [](/sql-reference/sql-ddl-summary) — overview of DDL commands. - [](/sql-reference/sql-dml) — commands for performing DML operations, including: - Inserting, deleting, updating, and merging data in Snowflake tables. - Bulk copying data into and out of Snowflake tables. - Staging files for bulk copying. - [](/sql-reference/sql-all) — alphabetical list of all the commands. - Commands categorized by the type of objects and operations they control, including: - General account-level objects (accounts, users, roles, security policies, integrations, etc.) and operations (failover & recovery, etc.). - Session-based operations (session context, queries, variables, transactions, etc.). - Virtual warehouses (for loading data and performing queries) and resource monitors (for controlling credit usage). - Databases, schemas, tables, and other schema-level objects (views, sequences, etc.). - Snowflake extensions and application development (user-defined functions, stored procedures, scripting, etc.). - Objects for sharing data (shares, listings, etc.). - Objects for classifying, protecting, and governing data (masking policies, row-access policies, tags, etc.). 
--- title: SQL data types reference source: https://docs.snowflake.com/en/sql-reference-data-types.md section: General --- # SQL data types reference - [](/sql-reference/data-types-numeric) - [](/sql-reference/data-types-text) - [](/sql-reference/data-types-logical) - [](/sql-reference/data-types-datetime) - [](/sql-reference/data-types-semistructured) - [](/sql-reference/data-types-structured) - [](/sql-reference/data-types-unstructured) - [](/sql-reference/data-types-geospatial) - [](/sql-reference/data-types-uuid) - [](/sql-reference/data-types-vector) - [](/sql-reference/data-types-user-defined) - [](/sql-reference/data-types-unsupported) - [](/sql-reference/data-type-conversion) - [](/sql-reference/functions-conversion) - [Unstructured Data](/user-guide/unstructured-intro) Snowflake supports most basic SQL data types (with some restrictions) for use in columns, local variables, expressions, parameters, and any other appropriate locations. You can also load unstructured data into Snowflake. For more information, see [Introduction to unstructured data](/user-guide/unstructured-intro). In some cases, data of one type can be converted to another type. For example, INTEGER data can be converted to FLOAT data. Some conversions are lossless, but others might lose information. The amount of loss depends upon the data types and the specific values. For example, converting a FLOAT value to an INTEGER value removes the digits after the decimal place. (The value is rounded to the nearest integer.) In some cases, the user must specify the desired conversion, such as when passing a VARCHAR value to the [TIME_SLICE](/sql-reference/functions/time_slice) function, which expects a TIMESTAMP or DATE argument. We call this explicit casting. In other cases, data types are converted automatically, such as when adding a float and an integer. We call this implicit casting (or coercion). In Snowflake, data types are automatically coerced whenever necessary and possible. 
For more information about explicit and implicit casting, see [Data type conversion](/sql-reference/data-type-conversion).

For more information about Snowflake data types, see the following topics:

- [Summary of data types](/sql-reference/intro-summary-data-types)
- [Numeric data types](/sql-reference/data-types-numeric)
- [String & binary data types](/sql-reference/data-types-text)
- [Logical data types](/sql-reference/data-types-logical)
- [Date & time data types](/sql-reference/data-types-datetime)
- [Semi-structured data types](/sql-reference/data-types-semistructured)
- [Structured data types](/sql-reference/data-types-structured)
- [Unstructured data types](/sql-reference/data-types-unstructured)
- [Geospatial data types](/sql-reference/data-types-geospatial)
- [UUID data type](/sql-reference/data-types-uuid)
- [Vector data types](/sql-reference/data-types-vector)
- [User-defined types](/sql-reference/data-types-user-defined)
- [Unsupported data types](/sql-reference/data-types-unsupported)
- [Data type conversion](/sql-reference/data-type-conversion)

---
title: Stored procedures
source: https://docs.snowflake.com/en/sql-reference-stored-procedures.md
section: General
---

# Stored procedures

Snowflake provides stored procedures to facilitate using certain Snowflake features. To find the stored procedures that are associated with a particular Snowflake Class, see [](/sql-reference-classes).

Use [](/sql-reference/sql/call) to call a stored procedure. For example:

```sql
CALL SYSTEM$CLASSIFY('hr.tables.empl_info', null);
```

Snowflake supports the following stored procedures, grouped by feature:

| Feature | Stored procedure |
| --- | --- |
| [Cortex Powered Object Descriptions](/user-guide/sql-cortex-descriptions) | [](/sql-reference/stored-procedures/ai_generate_table_desc) |
| [Data classification](/user-guide/classify-intro) | [](/sql-reference/stored-procedures/associate_semantic_category_tags) [](/sql-reference/stored-procedures/system_classify) [](/sql-reference/stored-procedures/system_classify_schema) [](/sql-reference/stored-procedures/system_cancel_classify_schema) |
| [Data sharing and collaboration](/guides-overview-sharing) | [](/sql-reference/stored-procedures/system_request_listing_and_wait) |
| [Default event table](/developer-guide/logging-tracing/event-table-setting-up) | [](/sql-reference/stored-procedures/snowflake_telemetry_add_row_access_policy_on_events_view) [](/sql-reference/stored-procedures/snowflake_telemetry_drop_row_access_policy_on_events_view) |
| [Differential privacy](/user-guide/diff-privacy/differential-privacy-overview) | [](/sql-reference/stored-procedures/reset_privacy_budget) |
| [Network security](/user-guide/network-policy-advisor) | [](/sql-reference/stored-procedures/evaluate_candidate_network_policy) [](/sql-reference/stored-procedures/recommend_network_policy) |
| [Notifications](/user-guide/notifications/about-notifications) | [](/sql-reference/stored-procedures/system_send_snowflake_notification) [](/sql-reference/stored-procedures/system_send_email) |
| [Semantic views](/user-guide/views-semantic/overview) | [](/sql-reference/stored-procedures/system_create_semantic_view_from_yaml) |
| [Synthetic data](/user-guide/synthetic-data) | [](/sql-reference/stored-procedures/generate_synthetic_data) |
| [Trust Center](/user-guide/trust-center/overview) | [](/sql-reference/stored-procedures/register_extension) [](/sql-reference/stored-procedures/deregister_extension) |

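As a further sketch of calling one of the stored procedures listed above, the following statement sends an email with `SYSTEM$SEND_EMAIL`. The integration name, recipient, subject, and body are assumptions for illustration; the call succeeds only if an email notification integration with that name already exists and allows the recipient.

```sql
-- Hypothetical values throughout: my_email_int must be an existing email
-- notification integration, and the recipient must be allowed by it.
CALL SYSTEM$SEND_EMAIL(
  'my_email_int',                 -- notification integration name (assumed)
  'first.last@example.com',       -- comma-separated list of recipients
  'Email alert: load finished',   -- subject line
  'The nightly load completed.'   -- message body
);
```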
--- title: Tutorials and Other Resources source: https://docs.snowflake.com/en/other-resources.md section: General ---

# Tutorials and Other Resources

This topic provides links to assorted "how to" tutorials/labs and "best practices" for using Snowflake.

## Tutorials

Snowflake provides several tutorials for getting started. You need a Snowflake account to explore these tutorials. If you sign up for a trial account, the trial account has a user with the necessary roles (ACCOUNTADMIN and SYSADMIN) and the virtual warehouse (COMPUTE_WH) needed to explore these tutorials. If you use any other account, make sure that your user is granted these roles and that the account has a virtual warehouse.

For new users, we recommend you start with these tutorials:

- [](/user-guide/tutorials/snowflake-in-20minutes): A simple tutorial using SnowSQL, the Snowflake command-line client, to introduce key concepts and tasks.
- [Getting Started with Snowflake - Zero to Snowflake](https://quickstarts.snowflake.com/guide/getting_started_with_snowflake/index.html): A comprehensive tutorial that uses both SnowSQL and Snowsight and covers data loading, querying, working with semi-structured data, accessing historical data using Snowflake's Time Travel feature, sharing, and so on.
- [Getting Started with Python](https://quickstarts.snowflake.com/guide/getting_started_with_python/index.html): A tutorial in which you set up the Python Connector and then explore the basic operations you can do with it.

For tutorials on bulk loading, see:

- [Bulk Loading from a Local File System](/user-guide/tutorials/data-load-internal-tutorial)
- [Bulk Loading from Amazon S3](/user-guide/tutorials/data-load-external-tutorial)

In addition, you might explore the following pages that introduce important concepts about semi-structured data:

- [JSON Basics](/user-guide/tutorials/json-basics-tutorial)
- [Loading JSON Data into a Relational Table](/user-guide/tutorials/script-data-load-transform-json)
- [Loading and Unloading Parquet Data](/user-guide/tutorials/script-data-load-transform-parquet)

## Best Practices

Snowflake best practices are provided throughout the documentation.
The following are links to important practices related to Snowflake features:

- [Roles and Access Control](/user-guide/security-access-control-considerations)
- [Virtual Warehouses](/user-guide/warehouses-considerations)
- [Table Design](/user-guide/table-considerations)
- [Data Storage](/user-guide/tables-storage-considerations)
- [Data Loading](/user-guide/data-load-considerations)
- [Data Unloading](/user-guide/data-unload-considerations)
- [Semi-structured Data](/user-guide/semistructured-considerations)

## Sample Data Sets

The following benchmarking datasets are available for all Snowflake accounts:

- [TPC-DS](/user-guide/sample-data-tpcds)
- [TPC-H](/user-guide/sample-data-tpch)

In addition, [Snowflake Marketplace](https://app.snowflake.com/marketplace?pricing=free) offers additional data sets, provided by third parties, for use with Snowflake. For related documentation, refer to [Introduction to the Snowflake Marketplace](https://other-docs.snowflake.com/en/marketplace/intro.html).

--- title: Tutorials to get started with Snowflake source: https://docs.snowflake.com/en/learn-tutorials.md section: General ---

# Tutorials to get started with Snowflake

The tutorials in this topic provide hands-on examples that get you started with Snowflake. To explore these tutorials, you must have a Snowflake account and a user with the required roles and access to a virtual warehouse:

- If you have signed up for a [trial account](/user-guide/admin-trial-account), the trial account user has the required roles and a virtual warehouse that you can use for several of these tutorials.
- If you use another account to explore these tutorials, you must sign in as a user that has the required roles and that can use a virtual warehouse.

Each tutorial describes the prerequisites that must be met before completing its tasks, including the roles required for the user who performs the tasks. Several tutorials require the ACCOUNTADMIN and SYSADMIN roles.

Snowflake bills a minimal amount for the on-disk storage that you use for any sample data in these tutorials. Snowflake requires a [virtual warehouse](/user-guide/warehouses) to load the data and execute queries. A running virtual warehouse consumes Snowflake credits. After you finish a tutorial, you can drop objects that are created in the tutorial to minimize costs. If you are using a [30-day trial account](https://signup.snowflake.com/), which provides free credits, you won't incur any costs. The following sections contain links to tutorials that get you started with Snowflake tasks and features: - [Tutorial that introduces you to Snowflake](#tutorial-that-introduces-you-to-snowflake) - [Tutorials to get started with data engineering](#tutorials-to-get-started-with-data-engineering) - [Tutorial to get started with security](#tutorial-to-get-started-with-security) - [Other learning resources](#other-learning-resources) ## Tutorial that introduces you to Snowflake Snowflake provides the following tutorial to introduce you to key concepts and tasks:
[](/user-guide/tutorials/snowflake-in-20minutes)
Use SnowSQL, a Snowflake command-line client, to learn about key concepts and tasks.
## Tutorials to get started with data engineering

Snowflake provides the following tutorials to get you started with data engineering. These tutorials show you how to load data into a table by using the [COPY INTO <table>](/sql-reference/sql/copy-into-table) command. For information about other options for loading data, see [Overview of data loading](/user-guide/data-load-overview).

### Load data
[Load and query sample data using SQL](/user-guide/tutorials/tasty-bytes-sql-load)
Uses a fictitious food truck brand named Tasty Bytes to show you how to [load](/user-guide/data-load-overview) and query data in Snowflake using SQL. You can access a pre-loaded [Snowsight template](/user-guide/ui-snowsight/snowsight-templates) worksheet to complete these tasks.
[Load data from cloud storage: Amazon S3](/user-guide/tutorials/load-from-cloud-tutorial)
Shows you how to load data from an Amazon S3 bucket into Snowflake using SQL. You can access a pre-loaded Snowsight template worksheet to complete these tasks.
[Load data from cloud storage: Microsoft Azure](/user-guide/tutorials/load-from-cloud-tutorial-azure)
Shows you how to load data from Microsoft Azure cloud storage into Snowflake using SQL. You can access a pre-loaded Snowsight template worksheet to complete these tasks.
[Load data from cloud storage: Google Cloud Storage](/user-guide/tutorials/load-from-cloud-tutorial-gcs)
Shows you how to load data from Google Cloud Storage into Snowflake using SQL. You can access a pre-loaded Snowsight template worksheet to complete these tasks.
### Bulk load data
[Bulk load from a local file system using COPY](/user-guide/tutorials/data-load-internal-tutorial)
Describes how to [bulk load data](/user-guide/data-load-local-file-system) from files in your local file system into a table.
[Bulk load from Amazon S3 using COPY](/user-guide/tutorials/data-load-external-tutorial)
Describes how to bulk load data from files in an existing Amazon Simple Storage Service (Amazon S3) bucket into a table.
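The bulk loading flow that these tutorials describe can be sketched as follows. This is a minimal outline, not the tutorials' own script: the table, file format, stage, storage integration, bucket, and local path names are all hypothetical.

```sql
-- Hypothetical objects: mytable, my_csv_format, my_stage, my_s3_stage, my_s3_int.
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

CREATE OR REPLACE STAGE my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- Upload local files to the internal stage. PUT is a client-side command;
-- run it from SnowSQL or a driver. The local path is an assumption.
PUT file:///tmp/load/contacts*.csv @my_stage AUTO_COMPRESS = TRUE;

-- Load the staged (now gzip-compressed) files into the target table.
COPY INTO mytable
  FROM @my_stage
  PATTERN = '.*contacts.*[.]csv[.]gz'
  ON_ERROR = 'SKIP_FILE';

-- For files that already sit in an Amazon S3 bucket, point an external
-- stage at the bucket instead of uploading with PUT.
CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://mybucket/load/'
  STORAGE_INTEGRATION = my_s3_int
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

COPY INTO mytable FROM @my_s3_stage;
```

Defining the file format once and referencing it from each stage keeps the parsing options in one place.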
### Work with semi-structured data
[Learn the basics of using JSON with Snowflake](/user-guide/tutorials/json-basics-tutorial)
Describes the basics of using [JSON](#label-what-is-json) with Snowflake.
[Load JSON data into a relational table](/user-guide/tutorials/script-data-load-transform-json)
Uses a [COPY INTO <table>](/sql-reference/sql/copy-into-table) command with a SELECT statement to load individual elements in a staged JSON file into a table.
[Load and unload Parquet data](/user-guide/tutorials/script-data-load-transform-parquet)
Describes how you can upload [Parquet](#label-what-is-parquet) data by transforming elements of a staged Parquet file directly into table columns using the [COPY INTO <table>](/sql-reference/sql/copy-into-table) command. The tutorial also describes how you can use the [COPY INTO <location>](/sql-reference/sql/copy-into-location) command to unload table data into a Parquet file.
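The semi-structured loading and unloading patterns mentioned above can be sketched as follows. The tables, stage, file names, and element paths are hypothetical and only show the shape of the statements.

```sql
-- Hypothetical objects: home_sales, cities, my_stage, and the staged file names.

-- Load individual elements of a staged JSON file into relational columns
-- by using a SELECT statement inside COPY INTO <table>.
COPY INTO home_sales (city, zip, sale_date, price)
  FROM (
    SELECT $1:location.city::VARCHAR,
           $1:location.zip::VARCHAR,
           TO_TIMESTAMP_NTZ($1:sale_date),
           $1:price::NUMBER(12, 2)
    FROM @my_stage/sales.json.gz
  )
  FILE_FORMAT = (TYPE = JSON);

-- Load elements of a staged Parquet file directly into table columns.
COPY INTO cities (continent, country, city)
  FROM (
    SELECT $1:continent::VARCHAR,
           $1:country::VARCHAR,
           $1:city::VARCHAR
    FROM @my_stage/cities.parquet
  )
  FILE_FORMAT = (TYPE = PARQUET);

-- Unload table data into Parquet files with COPY INTO <location>.
COPY INTO @my_stage/unload/cities_
  FROM (SELECT continent, country, city FROM cities)
  FILE_FORMAT = (TYPE = PARQUET)
  HEADER = TRUE;  -- keep the table column names in the unloaded files
```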
## Tutorial to get started with security Snowflake provides the following tutorial to get you started with security:
[](/user-guide/tutorials/users-and-roles-tutorial)
Shows you how to create a [user](/user-guide/admin-user-management) and grant a role to it by using SQL commands. You can access a pre-loaded [Snowsight template](/user-guide/ui-snowsight/snowsight-templates) worksheet to complete these tasks.
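Creating a user and granting it a role can be sketched as follows. The user name, login name, and password are placeholders, and the roles used to run each statement are an assumption about how privileges are set up in your account.

```sql
-- Hypothetical values: snowman, snowstorm, and the password are placeholders.
USE ROLE USERADMIN;

CREATE USER snowman
  PASSWORD = '<choose-a-strong-password>'
  LOGIN_NAME = 'snowstorm'
  FIRST_NAME = 'Snow'
  LAST_NAME = 'Storm'
  MUST_CHANGE_PASSWORD = TRUE
  DEFAULT_WAREHOUSE = COMPUTE_WH;

-- Granting a role requires a role with the necessary privileges;
-- SECURITYADMIN is assumed here.
USE ROLE SECURITYADMIN;
GRANT ROLE SYSADMIN TO USER snowman;

-- Verify the grant.
SHOW GRANTS TO USER snowman;
```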
## Other learning resources

The following learning resources are also available:
[Tutorials](https://docs.snowflake.com/tutorials)
Explore a large repository of tutorials with hands-on examples that help you learn about Snowflake's features.
[Snowflake Education Services](https://learn.snowflake.com/en/)
Discover instructor-led classes, on-demand courses, and self-directed learning to get you started with Snowflake.
[Snowflake for Developers](https://www.snowflake.com/en/developers/guides/)
Discover product quickstarts, industry-specific use cases, administration best practices, and reference architectures from Snowflake experts and partners.
[Snowflake Developers YouTube Channel](https://www.youtube.com/@snowflakedevelopers)
Discover Snowflake product tips, demos, and tutorials.
--- title: Unload data from Snowflake source: https://docs.snowflake.com/en/guides-overview-unloading-data.md section: General --- # Unload data from Snowflake Snowflake supports bulk unloading of data from a database table into flat, delimited text files. The following topics detail the processes and procedures associated with unloading data.
[](/user-guide/data-unload-overview)
Introduction and overview of unloading data.
[](/user-guide/intro-summary-unloading)
Reference of the supported features for using the [](/sql-reference/sql/copy-into-location) command to unload data from Snowflake tables into flat files.
[](/user-guide/data-unload-considerations)
Best practices, general guidelines, and important considerations for unloading data.
[](/user-guide/data-unload-prepare)
Supported data file formats for unloading data.
[](/user-guide/data-unload-snowflake)
Instructions on using the COPY command to unload data from a table into an internal (i.e. Snowflake) stage.
[](/user-guide/data-unload-s3)
Instructions on using the COPY command to unload data from a table into an Amazon S3 bucket.
[](/user-guide/data-unload-gcs)
Instructions on using the COPY command to unload data from a table into a Google Cloud Storage bucket.
[](/user-guide/data-unload-azure)
Instructions on using the COPY command to unload data from a table into an Azure container.
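The unloading topics above all build on the same COPY INTO <location> pattern, sketched here for an internal stage and for Amazon S3. The table, stage, storage integration, bucket, and local path names are hypothetical.

```sql
-- Hypothetical objects: mytable, my_unload_stage, my_s3_int, and the bucket name.
CREATE OR REPLACE STAGE my_unload_stage;

-- Unload a table into an internal (Snowflake) stage as gzip-compressed CSV files.
COPY INTO @my_unload_stage/export/data_
  FROM mytable
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP FIELD_OPTIONALLY_ENCLOSED_BY = '"')
  OVERWRITE = TRUE;

-- Inspect the generated files, then download them. GET is a client-side
-- command; run it from SnowSQL or a driver. The local path is an assumption.
LIST @my_unload_stage/export/;
GET @my_unload_stage/export/ file:///tmp/export/;

-- Unload directly into an Amazon S3 bucket through a storage integration.
COPY INTO 's3://mybucket/export/'
  FROM mytable
  STORAGE_INTEGRATION = my_s3_int
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);
```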
--- title: Welcome to Snowflake Documentation source: https://docs.snowflake.com/en/index.md section: General --- # Welcome to Snowflake Documentation In these topics, you will find the information you need to access your Snowflake account and perform all the administrative and user tasks associated with using Snowflake. The documentation also provides conceptual overviews, tutorials, and a detailed reference for all supported SQL commands, functions, and operators. You can start by browsing the contents on the left or using the search box at the top to search across the documentation and other Snowflake resources. If you do not find the information you are looking for, please feel free to reach out to Snowflake Documentation or Snowflake Support using the buttons at the bottom of each page. ## [](/getting-started-for-users)
[](/user-guide/setup)
Overview of getting an account and methods for accessing Snowflake.
[](/user-guide/connecting)
Overview of the different ways to connect to Snowflake.
[](/user-guide/intro-key-concepts)
Description of Snowflake architecture, key concepts, and features.
[](/user-guide/ui-snowsight-quick-tour)
Overview of Snowsight, Snowflake's web-based interface.
[](/user-guide/data-lifecycle)
Introduces the main operations and corresponding SQL commands for getting your data into Snowflake and then using it to perform queries and other SQL operations.
## [](/other-resources) This topic provides links to assorted "how to" tutorials/labs and "best practices" for using Snowflake. ## [](/user-guide) - [](/user-guide/ui-snowsight) — Learn how to use %sf-web-interface% for your Snowflake operations: - [](/user-guide/ui-snowsight-quick-tour) - [](/user-guide/ui-snowsight-gs) - [](/user-guide/ui-snowsight-worksheets) - [](/user-guide/ui-snowsight/workspaces) - [](/user-guide/ui-snowsight/notebooks) - [](/user-guide/snowflake-copilot) - [](/user-guide/ui-snowsight-dashboards) - [](/user-guide/ui-snowsight-data) - [](/user-guide/ui-snowsight-activity) - [Evaluating and monitoring account security in the Trust Center](/user-guide/trust-center/overview) - [](/user-guide/ui-support) - [](/user-guide/ui-snowsight-contacts) - [](/user-guide/warehouses) — Key concepts and tasks for creating and using virtual warehouses to execute queries and perform DML operations, such as loading and unloading data: - [](/user-guide/warehouses-overview) - [](/user-guide/warehouses-multicluster) - [](/user-guide/warehouses-considerations) - [](/user-guide/warehouses-tasks) - [](/user-guide/query-acceleration-service) - [](/user-guide/warehouses-load-monitoring) - [](/user-guide/databases) — Key concepts and tasks related to understanding and working with Snowflake databases and tables: - [](/user-guide/tables-micro-partitions) - [](/user-guide/tables-temp-transient) - [](/user-guide/tables-external-intro) - [](/user-guide/views-introduction) - [](/user-guide/views-secure) - [](/user-guide/views-materialized) - [](/user-guide/table-considerations) - [](/user-guide/object-clone) - [](/user-guide/tables-storage-considerations) - [](/guides-overview-queries) — Key concepts and tasks for executing queries in Snowflake: - [](/user-guide/querying-joins) - [](/user-guide/join-elimination) - [](/user-guide/querying-subqueries) - [](/user-guide/queries-hierarchical) - [](/user-guide/queries-cte) - [](/user-guide/querying-semistructured) - 
[](/user-guide/functions-window-using) - [](/user-guide/match-recognize-introduction) - [](/user-guide/querying-sequences) - [](/user-guide/querying-persisted-results) - [](/user-guide/querying-distinct-counts) - [](/user-guide/querying-approximate-similarity) - [](/user-guide/querying-approximate-frequent-values) - [](/user-guide/querying-approximate-percentile-values) - [](/user-guide/ui-snowsight-query) - [](/user-guide/querying-cancel-statements) - [](/user-guide/semistructured-intro) — Key concepts and tasks for working with JSON and other types of semi-structured data: - [](/user-guide/semistructured-data-formats) - [](/user-guide/semistructured-considerations) - [](/user-guide/tutorials/json-basics-tutorial) - [](/user-guide/unstructured-intro) — Key concepts and tasks for working with unstructured data: - [](/user-guide/data-load-dirtables) - [](/user-guide/data-load-unstructured-rest-api) - [](/user-guide/unstructured-data-sharing) - [](/user-guide/unstructured-ts) - [](/user-guide/data-availability) — Key concepts and tasks for understanding how Snowflake maintains access to deleted and modified data, and also how Snowflake enables data recovery in the event of loss: - [](/user-guide/data-time-travel) - [](/user-guide/data-failsafe) - [](/user-guide/data-cdp-storage-costs) - [](/user-guide/data-pipelines-intro) — Key concepts and tasks for transforming and optimizing loaded data for analysis: - [](/user-guide/streams-intro) - [](/user-guide/tasks-intro) - [](/user-guide/replication-intro) — Key concepts and tasks for replicating and failing over databases across multiple Snowflake accounts, as well as redirecting client connections, for business continuity and disaster recovery:
Supported regions for feature
This feature is not available in the People's Republic of China.
- [](/user-guide/account-replication-intro) - [](/user-guide/client-redirect) - [](/user-guide/sample-data) — Key concepts and tasks for using the sample data sets provided with Snowflake: - [](/user-guide/sample-data-using) - [](/user-guide/sample-data-tpch) - [](/user-guide/sample-data-openweathermap) - [](/guides-overview-alerts) — Key concepts and tasks for sending email notifications in SQL (e.g. from a stored procedure, task, etc.) and setting up alerts to perform actions or send notifications when data in Snowflake meets certain conditions. - [](/user-guide/alerts) - [](/user-guide/notifications/about-notifications) - [](/user-guide/snowflake-postgres/about) — Create, manage, and use Postgres instances directly from Snowflake: - [](/user-guide/snowflake-postgres/postgres-create-instance) - [](/user-guide/snowflake-postgres/connecting-to-snowflakepg) - [](/user-guide/snowflake-postgres/postgres-roles) - [](/user-guide/snowflake-postgres/postgres-connection-pooling) - [](/user-guide/snowflake-postgres/postgres-maintenance) - [](/user-guide/snowflake-postgres/postgres-create-replica) - [](/user-guide/snowflake-postgres/high-availability) - [](/user-guide/snowflake-postgres/postgres-cost) - [](/user-guide/snowflake-postgres/insights) - [](/user-guide/snowflake-postgres/postgres-logging) - [](/user-guide/snowflake-postgres/postgres-cortex-code) - [](/user-guide/snowflake-postgres/postgres-network) - [](/user-guide/snowflake-postgres/postgres-instance-sizes) - [](/user-guide/snowflake-postgres/postgres-extensions) - [](/user-guide/snowflake-postgres/postgres-server-settings) ## [](/user-guide-admin) - [](/user-guide/admin-account-identifier) Detailed descriptions of the two unique account identifiers supported for connecting to Snowflake and using features that span multiple accounts. - [](/user-guide/admin-trial-account) Instructions for signing up for a trial account, adding a credit card to the account, and canceling the account. 
- [](/user-guide/admin-account-management) Instructions for setting account, session, and object parameters for your account. - [](/user-guide/admin-user-management) Instructions for creating and managing users in your account. - [](/release-notes/bcr-bundles/managing-behavior-change-releases) Instructions for enabling and disabling behavior change releases in your account. ## [](/sql-reference) - [](/sql-reference/parameters) — parameters that can be used to control system behavior at the account, user, session, and object level. - [References](/sql-reference/references) — use references to authorize access on objects for owner's rights stored procedures, applications, and classes. - [](/sql-reference/ternary-logic) — information about the behavior of NULL in Boolean expressions and with comparison operators. - [](/sql-reference/collation) — information about sorting and other character-set-dependent operations on text strings. - [](/sql-reference/sql-format-models) — formats for specifying conversion of numeric and date/time values to and from text strings. - [](/sql-reference/identifiers) — rules for defining and using object identifiers, including resolving object names used in SQL statements: - [](/sql-reference/identifiers-syntax) - [](/sql-reference/identifier-literal) - [](/sql-reference/name-resolution) - [](/sql-reference/constraints) — concepts and reference information for defining and maintaining unique, primary key, and foreign key constraints in tables: - [](/sql-reference/constraints-overview) - [](/sql-reference/constraints-create) - [](/sql-reference/constraints-alter) - [](/sql-reference/constraints-drop) - [](/sql-reference/session-variables) — concepts and reference for defining and using variables in sessions. - [](/sql-reference/transactions) — concepts and reference for using transactions with SQL statements. - [](/sql-reference/literals-table) — concepts and reference for using table literals instead of a single scalar value in queries. 
- [](/sql-reference/snowflake-db) — reference for the SNOWFLAKE shared database, which is provided by Snowflake for querying/reporting on your organization, account, data sharing, and other object usage. - [](/sql-reference/info-schema) — concepts and reference for the Snowflake Information Schema, which consists of a set of metadata views and historical table functions for querying/reporting on objects in Snowflake. - [](/sql-reference/metadata) — concepts and reference for metadata fields in Snowflake. ## [](/sql-reference-commands) - [](/sql-reference/constructs) — structure of SQL queries in Snowflake. - [](/sql-reference/operators) — arithmetic, logical, and other types of operators. - [](/sql-reference/sql-ddl-summary) — overview of DDL commands. - [](/sql-reference/sql-dml) — commands for performing DML operations, including: - Inserting, deleting, updating, and merging data in Snowflake tables. - Bulk copying data into and out of Snowflake tables. - Staging files for bulk copying. - [](/sql-reference/sql-all) — alphabetical list of all the commands. - Commands categorized by the type of objects and operations they control, including: - General account-level objects (accounts, users, roles, security policies, integrations, etc.) and operations (failover & recovery, etc.). - Session-based operations (session context, queries, variables, transactions, etc.). - Virtual warehouses (for loading data and performing queries) and resource monitors (for controlling credit usage). - Databases, schemas, tables, and other schema-level objects (views, sequences, etc.). - Snowflake extensions and application development (user-defined functions, stored procedures, scripting, etc.). - Objects for sharing data (shares, listings, etc.). - Objects for classifying, protecting, and governing data (masking policies, row-access policies, tags, etc.). 
## [](/sql-reference-functions) - [](/sql-reference/intro-summary-operators-functions) — combined summary of all system-defined functions. Can be used as a quick-reference. - [](/sql-reference/functions-all) — alphabetical list of all system-defined functions (scalar, aggregate, table, etc.). - [](/sql-reference/functions-aggregation) — functions that take multiple rows/values as input and return a single value. - [](/sql-reference/functions) — functions that take a single row/value as input and return a single value: - [](/sql-reference/expressions-byte-bit) - [](/sql-reference/expressions-conditional) - [](/sql-reference/functions-context) - [](/sql-reference/functions-conversion) - [](/sql-reference/functions-data-generation) - [](/sql-reference/functions-date-time) - [](/sql-reference/functions-differential-privacy) - [](/sql-reference/functions-encryption) - [](/sql-reference/functions-geospatial) - [](/sql-reference/functions-hash-scalar) - [](/sql-reference/functions-metadata) - [](/sql-reference/functions-notification) - [](/sql-reference/functions-numeric) - [](/sql-reference/functions-semistructured) - [](/sql-reference/functions-regexp) — regular expression (search) functions - [](/sql-reference/functions-string) - [](/sql-reference/functions-vector) - [](/sql-reference/functions-model-monitors) — functions that retrieve metrics from machine learning model monitors. - [](/sql-reference/functions-system) — functions that perform control operations or return system-level information. - [](/sql-reference/functions-table) — functions that return results in tabular format. - [](/sql-reference/functions-window) — functions that run analytic calculations, such as moving aggregations and rankings. - [](/sql-reference/functions-data-metric) — functions that enable data quality measurements for tables and views. - [](/sql-reference-stored-procedures) — stored procedures to facilitate using certain Snowflake features. 
## [](/sql-reference-snowflake-scripting) - [AWAIT](/sql-reference/snowflake-scripting/await) - [BEGIN ... END](/sql-reference/snowflake-scripting/begin) - [BREAK](/sql-reference/snowflake-scripting/break) - [CANCEL](/sql-reference/snowflake-scripting/cancel) - [CASE](/sql-reference/snowflake-scripting/case) - [CLOSE](/sql-reference/snowflake-scripting/close) - [CONTINUE](/sql-reference/snowflake-scripting/continue) - [DECLARE](/sql-reference/snowflake-scripting/declare) - [EXCEPTION](/sql-reference/snowflake-scripting/exception) - [FETCH](/sql-reference/snowflake-scripting/fetch) - [FOR](/sql-reference/snowflake-scripting/for) - [IF](/sql-reference/snowflake-scripting/if) - [LET](/sql-reference/snowflake-scripting/let) - [LOOP](/sql-reference/snowflake-scripting/loop) - [NULL](/sql-reference/snowflake-scripting/null) - [OPEN](/sql-reference/snowflake-scripting/open) - [RAISE](/sql-reference/snowflake-scripting/raise) - [REPEAT](/sql-reference/snowflake-scripting/repeat) - [RETURN](/sql-reference/snowflake-scripting/return) - [WHILE](/sql-reference/snowflake-scripting/while) ## [](/appendices) - [](/sql-reference/conventions) Notational conventions used in the Snowflake documentation. - [](/sql-reference/reserved-keywords) List of words reserved for Snowflake SQL. --- title: Working with organizations and accounts source: https://docs.snowflake.com/en/guides-overview-manage.md section: General --- # Working with organizations and accounts The following topics describe how to manage Snowflake organizations and accounts. ## Organizations
[](/user-guide/organizations)
Learn about organizations, which link the accounts owned by your business entity. You can find the name of your organization, list the accounts in your organization, and change the name of your organization.
[](/user-guide/organization-administrators)
Learn about the system roles that administrators use to perform organization-level tasks.
[](/user-guide/organization-users)
Learn about using organization users for users who need access to multiple accounts within the organization.
[](/user-guide/organizations-manage-accounts)
Manage the lifecycle of an account such as creating it and deleting it. Also, manage the general characteristics of an account like its Snowflake edition.
[](/user-guide/organizations-connect)
Connect to accounts in your organization through SnowSQL, connectors, drivers, and Snowsight.
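Finding the organization name and listing its accounts can be sketched as follows. The use of the ORGADMIN role is an assumption; your organization might use a different organization administrator role.

```sql
-- Return the organization and account names for the current session.
SELECT CURRENT_ORGANIZATION_NAME() AS organization_name,
       CURRENT_ACCOUNT_NAME()      AS account_name;

-- List the accounts in the organization. This requires an organization
-- administrator role; ORGADMIN is assumed here.
USE ROLE ORGADMIN;
SHOW ACCOUNTS;
```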
## Organization accounts
[](/user-guide/organization-accounts)
Learn how organization administrators of multi-account organizations use an organization account. Also, use premium views in the ORGANIZATION_USAGE schema to track usage across the organization.
## Accounts
[](/user-guide/admin-account-identifier)
Learn how to use account identifiers to specify the account that you are using, for example, when you connect to the account or use Snowsight.
[](/user-guide/admin-trial-account)
Sign up for a trial account, convert that account to a paid account, and cancel the trial account.
[](/user-guide/admin-account-management)
View and alter parameters for your account.
[](/user-guide/admin-user-management)
Create, modify, view, and drop users in your account.
[](/release-notes/bcr-bundles/managing-behavior-change-releases)
Enable, disable, and check the status of behavior changes.
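Several of the account tasks above map to short SQL statements. The following sketch assumes the ACCOUNTADMIN role; the parameter, its value, and the bundle name `2024_08` are placeholders chosen for illustration.

```sql
-- View an account-level parameter, then change it.
-- TIMEZONE and 'UTC' are example values only.
USE ROLE ACCOUNTADMIN;
SHOW PARAMETERS LIKE 'TIMEZONE' IN ACCOUNT;
ALTER ACCOUNT SET TIMEZONE = 'UTC';

-- List the users in the account.
SHOW USERS;

-- Check the status of a behavior change bundle, then enable it.
-- The bundle name is a placeholder.
SELECT SYSTEM$BEHAVIOR_CHANGE_BUNDLE_STATUS('2024_08');
SELECT SYSTEM$ENABLE_BEHAVIOR_CHANGE_BUNDLE('2024_08');
```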
## Loading & Unloading Data Stages, COPY INTO, Snowpipe, file formats, and connectors for ingesting and exporting data. --- title: AbortQueryJob 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/abortqueryjob.md section: Loading & Unloading Data --- # AbortQueryJob 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-salesforce-processors-nar ## Description Aborts a Query Job in Salesforce using the Bulk API 2.0. ## Tags abort, bulk, job, preview, query, salesforce ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties

| Property | Description |
| --- | --- |
| Job ID | The ID of the job for which the status is checked. |
| Salesforce Client | Salesforce Client to interact with the APIs. |

## Relationships

| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed to this relationship if the Query Job could not be aborted but the operation might be retried. |
| failure | A FlowFile is routed to this relationship if the Query Job could not be aborted. |
| success | If the Query Job has been successfully aborted, the FlowFile is routed to this relationship. |

## See also

- [com.snowflake.openflow.runtime.processors.salesforce.DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob)
- [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult)
- [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobStatus](/user-guide/data-integration/openflow/processors/getqueryjobstatus)
- [com.snowflake.openflow.runtime.processors.salesforce.SubmitQueryJob](/user-guide/data-integration/openflow/processors/submitqueryjob)

--- title: About Openflow source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/about.md section: Loading & Unloading Data ---

# About Openflow

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

- [](/user-guide/data-integration/openflow/about-byoc)
- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/processors/index)
- [](/user-guide/data-integration/openflow/controllers/index)
- [](/user-guide/data-integration/openflow/version-history)
- [](/user-guide/data-engineering/row-timestamps)

Snowflake Openflow is an integration service that connects any data source and any destination with hundreds of processors supporting structured and unstructured text, images, audio, video, and sensor data. Built on [Apache NiFi](https://nifi.apache.org/), Openflow lets you run a fully managed service in your own cloud for complete control. The Openflow platform is currently available for deployment in customers' own VPCs in both AWS and Snowpark Container Services.

For operational safeguards (backing up flows, avoiding data loss when you remove deployments or runtimes), see [Manage Openflow](/user-guide/data-integration/openflow/manage). This topic describes the key features, benefits, architecture, workflow, and use cases of Openflow. ## Key features and benefits
Open and extensible
An extensible managed service that's powered by Apache NiFi, enabling you to build and extend processors from any data source to any destination.
Unified data integration platform
Openflow enables data engineers to handle complex, bi-directional data extraction and loading through a fully managed service that can be deployed inside your own VPC or within your Snowflake deployment.
Enterprise-ready
Openflow offers out-of-the-box security, compliance, and observability and maintainability hooks for data integration.
High speed ingestion of all types of data
One unified platform lets you handle structured and unstructured data, in both batch and streaming modes, from your data source to Snowflake at virtually any scale.
Continuous ingestion of multimodal data for AI processing
Near real-time ingestion of unstructured data, so you can immediately chat with data coming from sources such as SharePoint and Google Drive.
## Openflow deployment types Openflow is supported in both the Bring Your Own Cloud (BYOC) and Snowpark Container Services (SPCS) versions.
%ofsfspcs%
%ofsfspcs%, using [](/developer-guide/snowpark-container-services/overview) (SPCS), provides a streamlined and integrated solution for connectivity. Because SPCS is a self-contained service within Snowflake, it's easy to deploy and manage. SPCS offers a convenient and cost-effective environment for running your data flows. A key advantage of %ofsfspcs% is its native integration with Snowflake's security model, which allows for seamless authentication, authorization, network security and simplified operations. When configuring %ofsfspcs-plural%, follow the process as outlined in [Setup Openflow - Snowflake Deployment](/user-guide/data-integration/openflow/setup-openflow-spcs).
Openflow - Bring Your Own Cloud
Openflow - Bring Your Own Cloud (BYOC) provides a connectivity solution that you can use to connect public and private systems securely and handle sensitive data preprocessing locally, within the secure bounds of your organization's cloud environment. BYOC refers to a deployment option where the Openflow data processing engine, or data plane, runs within your own cloud environment while Snowflake manages the overall Openflow service and control plane. When configuring BYOC deployments, follow the process as outlined in [](/user-guide/data-integration/openflow/setup-openflow-byoc).
## Use cases Use Openflow if you want to fetch data from any source and put it in any destination with minimal management, coupled with Snowflake's built-in data security and governance. Openflow use cases include: - Ingest data from unstructured data sources, such as Google Drive and Box, and make it ready for chat in your AI assistants with Snowflake Cortex or use the data for your own custom processing. - Replicate the change data capture (CDC) of database tables into Snowflake for comprehensive, centralized reporting. - Ingest real-time events from streaming services, such as Apache Kafka, into Snowflake for near real-time analytics. - Ingest data from SaaS platforms, such as LinkedIn Ads, to Snowflake for reporting, analytics, and insights. - Create an Openflow dataflow using Snowflake and NiFi [processors](/user-guide/data-integration/openflow/processors/index) and [controller services](/user-guide/data-integration/openflow/controllers/index). ## Security Openflow uses industry-leading security features that help ensure you have the highest levels of security for your account, and users, and all the data you store in Snowflake. Some key aspects include:
Authentication
- Runtimes use [Snowflake Managed Token](#label-openflow-snowflake-managed-token) as the default and recommended authentication method. - Snowflake Managed Token works consistently across SPCS and BYOC deployment types. - BYOC deployments can alternatively use key-pair authentication for explicit credential management.
Authorization
- Openflow supports fine-grained roles for RBAC. - ACCOUNTADMIN grants the privileges required to create deployments and runtimes.
Encryption in-transit
- Openflow connectors support TLS protocol, using standard Snowflake clients for data ingestion. - All the communications between the Openflow deployments and Openflow control plane are encrypted using TLS protocol.
Secrets management (BYOC)
- Integration with AWS Secrets Manager or HashiCorp Vault. For more information, see [Encrypted Passwords in Configuration Files](https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#encrypt-config_tool).
Private link support
- Openflow connectors are compatible with reading and writing data to Snowflake using inbound AWS PrivateLink.
Tri-Secret Secure support
- Openflow connectors are compatible with [Tri-Secret Secure](/user-guide/security-encryption-tss) for writing data to Snowflake.
## Snowflake Managed Token authentication Snowflake Managed Token is the recommended and default authentication method for Openflow runtimes to connect to Snowflake. This authentication method works consistently across both [Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/about-spcs) and [BYOC deployments](/user-guide/data-integration/openflow/about-byoc). Snowflake Managed Token provides a unified and simplified experience for configuring Snowflake connectivity. ### Key benefits
Simplified configuration
Snowflake Managed Token eliminates the need to generate, store, and rotate long-lived credentials such as key pairs. The token is automatically managed by Snowflake, reducing operational overhead.
Unified across deployment types
Whether you deploy Openflow in %spcs% (SPCS) or Bring Your Own Cloud (BYOC), you configure authentication the same way using the `SNOWFLAKE_MANAGED` authentication strategy.
Enhanced security
Tokens are short-lived and automatically refreshed, minimizing the risk associated with credential exposure.
### How it works When you configure a connector or processor to connect to Snowflake, select `SNOWFLAKE_MANAGED` as the **Snowflake Authentication Strategy**. The runtime automatically obtains and manages the token used to authenticate to Snowflake on your behalf. The behavior of Snowflake Managed Token varies based on your deployment type:
%ofsfspcs-plural%
When running in a Snowflake-managed deployment, the runtime uses [SPCS session tokens](/developer-guide/snowpark-container-services/overview) provided natively by the SPCS environment. These tokens are available at runtime and require no additional configuration.
BYOC deployments
When running in a BYOC deployment, the runtime uses [workload identity federation](/user-guide/workload-identity-federation) to authenticate to Snowflake. The runtime automatically exchanges its cloud provider identity (for example, an AWS IAM role) for a Snowflake token. To use Snowflake Managed Token in BYOC deployments, you must first configure [runtime roles](#label-deployment-byoc-setup-runtime-role) for your deployment.
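The deployment-dependent behavior described above can be summarized in a small lookup. The following Python sketch is illustrative only: the `DeploymentType` enum and the `token_source` helper are hypothetical, and the returned strings are descriptions, not configuration values. The strategy names `SNOWFLAKE_MANAGED` and `KEY_PAIR` are the ones this topic uses; treating `KEY_PAIR` as unavailable outside BYOC is an assumption based on this topic documenting it only for BYOC deployments.

```python
from enum import Enum


class DeploymentType(Enum):
    SPCS = "Openflow - Snowflake Deployment"
    BYOC = "Bring Your Own Cloud"


# Strategy names as they appear in this topic.
SNOWFLAKE_MANAGED = "SNOWFLAKE_MANAGED"
KEY_PAIR = "KEY_PAIR"


def token_source(deployment: DeploymentType, strategy: str = SNOWFLAKE_MANAGED) -> str:
    """Describe where the runtime's Snowflake credential comes from."""
    if strategy == SNOWFLAKE_MANAGED:
        if deployment is DeploymentType.SPCS:
            return "SPCS session token"
        return "workload identity federation"
    if strategy == KEY_PAIR:
        if deployment is DeploymentType.BYOC:
            return "user-managed key pair"
        # Assumption: this topic documents KEY_PAIR only for BYOC deployments.
        raise ValueError("KEY_PAIR is only described for BYOC deployments")
    raise ValueError(f"unknown strategy: {strategy}")
```

The point of the sketch is that the configuration value stays the same across deployment types; only the mechanism behind it differs.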
### When to use Snowflake Managed Token Use Snowflake Managed Token for: - All new connector configurations in both SPCS and BYOC deployments. - Migrations from key-pair authentication to the simplified, managed authentication model. - Scenarios where you want to avoid managing key pairs or other long-lived credentials. ### Alternative authentication methods While Snowflake Managed Token is recommended, BYOC deployments also support key-pair authentication (`KEY_PAIR`) for cases where you require explicit credential management. For more information about key-pair authentication, see [](/user-guide/key-pair-auth). For information about the underlying authentication mechanisms, see the following: - [](/user-guide/workload-identity-federation): Information about the authentication mechanism used in BYOC deployments. - [](/developer-guide/snowpark-container-services/working-with-services): Information about how SPCS services authenticate to Snowflake. ## Architecture The following diagram illustrates the architecture of Openflow: ![Openflow architecture](/static/images/connectivity/openflow-architecture.png) The deployment agent installs and bootstraps the Openflow deployment infrastructure in your VPC and regularly syncs container images from the Snowflake system image registry. Openflow components include:
Deployments
A deployment is where your data flows execute, within individual runtimes. You will often have multiple runtimes to isolate different projects or teams, or for SDLC reasons, all associated with a single deployment. Deployments come in two types: [Bring Your Own Cloud (BYOC)](/user-guide/data-integration/openflow/about-byoc) and [Openflow - Snowflake](/user-guide/data-integration/openflow/about-spcs).
Control plane
The control plane is a layer containing all components used to manage and observe Openflow runtimes. This includes the Openflow service and API, which users interact with via the Openflow canvas or through interaction with Openflow APIs. On %ofsfspcs-plural%, the Control Plane consists of Snowflake-owned public cloud infrastructure and services as well as the control plane application itself.
BYOC deployments
BYOC deployments are deployments acting as containers for runtimes that are deployed in *your* cloud environment. They incur charges based on their compute, infrastructure, and storage use. See [](/user-guide/data-integration/openflow/cost-byoc) for more information.
%ofsfspcs-plural%
Openflow - Snowflake Deployments are containers for runtimes and are deployed using a [compute pool](/developer-guide/snowpark-container-services/working-with-compute-pool). They incur utilization charges based on their uptime and usage of compute. See [](/user-guide/data-integration/openflow/cost-spcs) for more information.
Runtime
Runtimes host data pipelines, with the framework providing security, simplicity, and scalability. You can deploy Openflow runtimes in your VPC using Openflow. You can deploy Openflow connectors to your runtimes, and also build completely new pipelines using Openflow processors and controller services.
%ofsfspcs% Runtime
Openflow - Snowflake Deployment Runtimes are deployed as a [Snowpark Container Services](/developer-guide/snowpark-container-services/overview) service to an %ofsfspcs% deployment, which is represented by an underlying compute pool. Customers request a Runtime through the deployment, which executes the request to the service on behalf of the user. Once the Runtime is created, customers access it via a web browser at the URL generated for that underlying service.
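To make the relationship between the components above concrete, the following Python sketch models a deployment that owns several runtimes, each of which can host connectors. All class, field, and method names are hypothetical and exist only for illustration; they do not correspond to an Openflow API. Requiring unique runtime names within a deployment is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class DeploymentKind(Enum):
    BYOC = "byoc"
    SNOWFLAKE = "snowflake"


@dataclass
class Runtime:
    """Hosts data pipelines, for example installed connectors."""
    name: str
    connectors: list[str] = field(default_factory=list)


@dataclass
class Deployment:
    """Groups the runtimes in which data flows execute."""
    name: str
    kind: DeploymentKind
    runtimes: dict[str, Runtime] = field(default_factory=dict)

    def add_runtime(self, name: str) -> Runtime:
        # Assumption: runtime names are unique within a deployment.
        if name in self.runtimes:
            raise ValueError(f"runtime {name!r} already exists")
        runtime = Runtime(name)
        self.runtimes[name] = runtime
        return runtime
```

For example, one deployment could hold a `marketing` runtime and a `finance` runtime so that the two teams' pipelines stay isolated from each other.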
--- title: About Openflow - Snowflake Deployments source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/about-spcs.md section: Loading & Unloading Data --- # About %ofsfspcs-plural% This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/setup-openflow-spcs) - [](/user-guide/data-integration/openflow/cost-spcs) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/processors/index) - [](/user-guide/data-integration/openflow/controllers/index) %ofsfspcs% run on [Snowpark Container Services (SPCS)](/developer-guide/snowpark-container-services/overview) and provide a streamlined and integrated solution for data integration and connectivity across interoperable storage like Iceberg and Snowflake native storage. As a fully self-contained service within Snowflake, it's easy to deploy and manage, offering a convenient and cost-effective environment for running your data flows. A key advantage is its native integration with Snowflake's security model, which allows seamless authentication, authorization, and network security, and simplified operations. Although customers can have both BYOC and Snowflake Deployments, the following use cases are well suited to Snowflake Deployments: - Incorporating full-fidelity data in the bronze layer: Landing raw data from various sources directly into Snowflake and using Openflow Snowflake Deployments to extract and load. - Enriching data: Running pipelines to enrich tables that already exist inside Snowflake. - From ingest to insight in one place: Building applications where the entire data lifecycle (ingest, process and serve) happens within the Snowflake ecosystem. 
- Transforming raw data to insights with AI: Ingesting unstructured data and then, for instance, using %sf-intelligence% to search and understand it better, all in concert with users' other structured data. - Employing reverse ETL: Closing the loop on insight generation by sharing with external operational systems via APIs, messaging infrastructure, and more. ## Understanding Snowflake roles and External Access Integrations %ofsfspcs-plural% must be able to interact with data sources and destinations that are typically outside Snowflake. In addition, these deployments must be able to communicate with and access Snowflake itself. Snowflake roles and external access integrations provide this support. ### What is a Snowflake role? A Snowflake role is a traditional Snowflake role, associated with a specific Openflow Runtime, and used for the following tasks: - Grant access to external access integrations (EAIs). These EAIs specify rules that allow the runtime to access the data sources and destinations from within Snowflake itself. - Grant access to Snowflake resources. - Grant access to resources that are connector-specific. Snowflake roles are linked to Openflow session tokens, avoiding the need for customers to create separate service users and key pairs for authentication to Snowflake. ### What is an External Access Integration (EAI) within Openflow? An [External access integration](/developer-guide/external-network-access/external-network-access-overview) (EAI) is a Snowflake object designed to provide secure access to external resources, like source systems from which Openflow connectors pull external data. Openflow Snowflake Deployments use EAIs and network rules together to define the endpoints an Openflow connector can read from or write to. Data engineers define and configure EAIs and Snowflake roles specific to a given connector and its underlying runtime. ## Typical %ofsfspcs% workflow The following sections describe %ofsfspcs% concepts and workflows.
| User persona | Task |
| --- | --- |
| Snowflake administrator | Configures core Snowflake and external access integrations. See [](/user-guide/data-integration/openflow/setup-openflow-spcs). Creates a set of deployments in Snowflake. The Openflow UI is used to manage deployments and runtime creation and maintenance. The Openflow UI allows users to create, upgrade, and delete runtimes in all deployments. |
| Data engineer (pipeline author, responsible for data ingestion) | Works with a Snowflake administrator to configure required allow listed domains such that %ofsfspcs% can access the external data sources. Creates Snowflake roles, external integrations, and other objects that can later be used by runtimes. Uses the runtime canvas to build completely new flows or to configure deployed connectors. Creates a completely new flow or uses an existing connector as-is or as a starting point to customize. Connectors are a simple way to solve for a specific integration use case, and less technical users can deploy them without assistance from a data engineer. |
| Data engineer (pipeline operator) | Configures flow parameters and runs the flow. |
| Data engineer (responsible for transformation to silver and gold layers) | Responsible for transforming data from the bronze layer that was populated by the pipeline to silver and gold layers for analytics. |
| Business user | Makes use of gold layer objects for analytics. |
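The allowlisting and external access integration tasks listed for the administrator and data engineer above usually come down to three SQL statements: a network rule, an external access integration that references it, and a grant to the runtime's Snowflake role. The following Python sketch only assembles those statements as strings. The `eai_statements` helper and every object name and host in the usage example are hypothetical, identifiers are inserted without quoting or validation, and the setup topic remains the authoritative source for the statements a given connector needs.

```python
def eai_statements(rule: str, integration: str, role: str, hosts: list[str]) -> list[str]:
    """Build SQL that allowlists `hosts` and makes them usable by `role`.

    Names are interpolated as-is, so pass only trusted identifiers.
    """
    value_list = ", ".join(f"'{host}'" for host in hosts)
    return [
        # Egress rule listing the host:port endpoints the runtime may reach.
        f"CREATE NETWORK RULE {rule} MODE = EGRESS TYPE = HOST_PORT "
        f"VALUE_LIST = ({value_list})",
        # Integration that bundles the rule.
        f"CREATE EXTERNAL ACCESS INTEGRATION {integration} "
        f"ALLOWED_NETWORK_RULES = ({rule}) ENABLED = TRUE",
        # Let the runtime's Snowflake role use the integration.
        f"GRANT USAGE ON INTEGRATION {integration} TO ROLE {role}",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in eai_statements(
        "ads_api_rule", "ads_api_eai", "openflow_ads_role", ["api.example.com:443"]
    ):
        print(statement + ";")
```

Keeping one integration per connector, as the sketch suggests, makes it easier to see which endpoints each runtime role is allowed to reach.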
## Limitations - %ofsfspcs% is not supported in trial accounts. - Only a single %ofsfspcs% is supported per account. However, an account can have many %ofsfspcs% runtimes, each with a separate role and network access, which allows users to separate workloads. - Users with a default role of ACCOUNTADMIN can't log in to %ofsfspcs% runtimes and will get an error message when attempting to do so. - Customers requiring private connectivity will need to configure [outbound PrivateLink](/user-guide/private-connectivity-outbound). PrivateLink is available to [](#label-snowflake-editions-business-critical) only. ### Next steps [](/user-guide/data-integration/openflow/setup-openflow-spcs) --- title: About Openflow Connector for Amazon Ads source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/amazon-ads/about.md section: Loading & Unloading Data --- # About %amazonads% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/amazon-ads/setup) This topic describes the basic concepts of %amazonads%, its workflow, and limitations. 
The %amazonads% automatically ingests [Amazon Ads](https://advertising.amazon.com/) data into your Snowflake account by using Amazon Ads [Reporting API V3](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/overview). Reporting API enables you to configure custom reports with selected [report types](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview), [columns](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/columns), filters and other groupings. Use this connector if you're looking to do the following: - Bring data from Amazon Ads for Ad performance statistics and insights ## Workflow 1. An **Amazon Ads administrator** gets access to Reporting API by following the [onboarding instructions](https://advertising.amazon.com/API/docs/en-us/guides/onboarding/overview), [generates a refresh token](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-access-token) and [retrieves the client ID and client secret](https://advertising.amazon.com/API/docs/en-us/guides/onboarding/create-lwa-app#retrieve-your-security-credentials). 2. A **Snowflake account administrator** performs the following: 1. Installs the connector. 2. Configures the connector with the required parameters, for example refresh token, report configuration, and database and schema names. 3. Runs the connector flow. The connector does the following: 1. Fetches the report specified in the connector configuration. 2. Creates a temporary table and puts the report chunks in it. 3. Creates a table in the provided destination schema. 4. Synchronizes data from the temporary table to the destination table. 5. Removes the temporary table. 3. **Marketing users** with Snowflake access can view and perform operations on the data downloaded from Amazon Ads to destination tables. ## Limitations - The connector supports incremental ingestion only when the `Report Time Increment` parameter is set to the daily value. 
- Modification of the report definition when the processors are running might lead to data inconsistencies. To ensure consistency, stop the processors and clear the queues before updating the configuration. - If the Amazon Ads API [rate limit](https://advertising.amazon.com/API/docs/en-us/reference/concepts/rate-limiting) is reached, the data doesn't get ingested despite the connector attempting to pull data from the source system. ## Next steps [](/user-guide/data-integration/openflow/connectors/amazon-ads/setup) --- title: About Openflow Connector for Box source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/box/about.md section: Loading & Unloading Data --- # About %box% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/box/setup) This topic describes the basic concepts of %box%, its workflow, and limitations. The %box% connects a Box enterprise with Snowflake. 
Use this connector to do the following: - Ingest Box content for your own custom processing in Snowflake - Ingest Box content and make it ready for chat in your AI assistants with Snowflake Cortex - Use Box AI to extract metadata from Box content for enrichment in Snowflake - Add enriched metadata from Snowflake to content in Box ## Workflow 1. A **Box developer** creates a Box Platform app and submits it for authorization. 2. A **Box administrator** authorizes the app. 3. The **Box developer** then performs the following tasks: 1. Shares a Box folder with the app service account. 2. Shares a Platform app configuration JSON file and a folder ID with a Snowflake account administrator. 4. A **Snowflake account administrator** performs the following tasks: 1. Installs the connector. 2. Configures the connector with Snowflake connection details and the data provided by the Box developer. 3. Runs the connector flow. The connector does the following: 1. Creates the required tables, stages, and a Cortex Search service in the specified Snowflake schema. 2. Fetches Box file content and permissions from the folder specified in the connector configuration. 3. Runs parsing and chunking on the fetched documents, and saves them in Snowflake tables. The saved chunks are automatically indexed by the Cortex Search service. 5. A **Chatbot developer** uses the Cortex Search service to build a chatbot application. ## Limitations - [Cortex Parse Document limitations and requirements](#label-parse-document-requirements) - [Cortex Search limitations](#label-cortex-search-overview-limitations) - Changes caused by moving folders out of the specified root folder aren't captured during incremental ingestion. - The connector ingests only the supported file types and ignores others. These limitations apply to the predefined connector flow. If the flow is customized and doesn't use some or all of the predefined components, then these limitations may not apply. 
## Next steps [](/user-guide/data-integration/openflow/connectors/box/setup) --- title: About Openflow Connector for Google Ads source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-ads/about.md section: Loading & Unloading Data --- # About %gadsof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/google-ads/setup) This topic describes the basic concepts of Openflow Connector for Google Ads, steps to set it up, and limitations. Google Ads is an online advertising platform where advertisers can create and run ads to promote their products or services. Through Google Ads, you can create online ads to reach people exactly when they're interested in offered products and services. The Openflow Connector for Google Ads: - Automatically ingests Google Ads data into your Snowflake account. - Downloads data using the [Google Ads API](https://cloud.google.com/endpoints/docs/openapi/enable-api). - Lets you configure custom reports with chosen attributes, [metrics](https://developers.google.com/google-ads/api/fields/v17/metrics), and [segments](https://support.google.com/google-ads/answer/2454072). 
Use this connector if you're looking to do the following: - Import metrics from Google Ads for performance tracking and optimization ## Use cases ### Run the connector in different ingestion modes The connector can ingest data in two modes: snapshot and incremental. Snapshot mode is the default and stays active as long as the **segments.date** segment is not selected. It creates a table in the provided destination schema and, on each scheduled run, appends the newest data from Google Ads. To configure incremental ingestion, include the segment named **segments.date** in the Report Segments parameter; other segments can still be present. In incremental mode, the date range of each run overlaps with the data fetched previously. The overlap is caused by the conversion window: each run must request historical data for a number of days equal to the conversion window. For example, if the conversion window is set to 14 days and ingestion happens every day, there are 13 days of overlap. ### Reconfigure currently running connector The report configuration can be changed while the processor is running. To do so, go to GetGoogleAdsReportContext and change the desired parameters. If you change only the Report Attributes, Metrics, or Segments parameters, the current destination table is removed and a new one with the updated schema is created, so be aware that already downloaded data is deleted. If you change the Resource Name or Account Client ID, a new table is created and the old destination table is not dropped. Modifying the Schedule or Conversion window does not affect the data already fetched into the destination table. If you change the Start Date, the connector performs a single ingestion from that date to the current date and then proceeds normally in incremental mode. Any data already downloaded for the period between the new Start Date and the current date is replaced after the change. 
Data before the new Start Date will not be affected. #### Rate Limiting Restrictions [Google Ads API limits](https://developers.google.com/google-ads/api/docs/access-levels) govern how many requests can be made within a given time frame. If your flow exceeds the allowed quota, syncs may slow down or fail with an error. This mostly occurs when your access token makes a higher number of requests than the source typically allows. In such cases, we recommend applying for a higher access quota (where applicable) or reducing the sync frequency. ## Limitations - Filtering is not supported. Instead, data can be filtered after ingestion. - Custom column ingestion is not supported. - When segmenting reports, rows in which all selected metrics are zero are always excluded. - Attributed resource ingestion is not supported. Instead, multiple reports can be joined after ingestion. - There can be only one report for each selected resource name and client ID pair. - Modification of the report definition when processors are running may lead to data inconsistencies. To ensure consistency, stop the processors and clear the queues before updating the configuration. ## Next steps [](/user-guide/data-integration/openflow/connectors/google-ads/setup) --- title: About Openflow Connector for Google Drive source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-drive/about.md section: Loading & Unloading Data --- # About %gdof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. 
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/google-drive/setup) The %gdof% connects a Google Workspace Shared Drive and Snowflake to ingest files and user permissions and keeps them up to date. %gdof% also supports the Cortex Search service and can make ingested files ready for conversational analysis for use in AI Assistants using SQL, Python or REST APIs. Use this connector if you're looking to do the following: - Ingest Google Drive content for your own custom processing in Snowflake - Ingest Google Drive content and make it ready for chat in your AI assistants with Snowflake Cortex ## Limitations 1. [Cortex Parse Document limitations and requirements](#label-parse-document-requirements). 2. [Cortex Search limitations](#label-cortex-search-overview-limitations). 3. Changes caused by moving or renaming folders aren't captured during incremental ingestion. 4. The connector supports only explicit Google Permissions for Users and Groups. It does not currently support authentication models for links shared with Anyone. 5. The connector ingests only the supported file types and ignores others. Please note, the limitations are listed for the predefined versioned flow. If the flow was customized, and it doesn't use some of the predefined components, the limitations related to these components won't apply. 
## Next steps [](/user-guide/data-integration/openflow/connectors/google-drive/setup) --- title: About Openflow Connector for Google Sheets source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-sheets/about.md section: Loading & Unloading Data --- # About %sheets% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/google-sheets/setup) This topic describes the basic concepts of %sheets%, its workflow, and limitations. The %sheets% enables the ingestion of Google Sheets data into Snowflake. It uses the Google Sheets API to fetch data and persist that data in a table dedicated to a given range from a sheet. The connector creates the destination table in the database and the schema provided in the configuration. Use this connector if you're looking to do the following: - Load data from Google sheets into Snowflake tables for reporting, analytics and insights ## Workflow 1. A **Google Cloud administrator** creates a service account and a key as described in [Service account credentials](https://developers.google.com/workspace/guides/create-credentials#service-account). 2. 
A **Google Sheets user** creates a Google Sheets spreadsheet and shares it with the service account. The first row of data represents the column names in the destination table that the connector will create. It cannot contain actual data. If a column contains multiple data types, the connector selects the least restrictive type. 3. A **Snowflake account administrator** configures the connector as follows: 1. Installs the connector. 2. Creates a Snowflake warehouse, destination database, destination schema, and key. 3. Specifies the required parameters for the connector, such as Snowflake Warehouse, Destination Database, Snowflake Key, and Spreadsheet ID. 4. Runs the connector flow. The connector performs the following tasks when run in Openflow: 1. Retrieves the data from a specified spreadsheet. 2. Creates or updates the destination table to reflect the schema of data from Google Sheets. If the destination table already exists, it is truncated. 3. Inserts the data into the destination table. ## Limitations - The connector saves numeric values from a sheet only as INT or DOUBLE types. Because of this, small rounding errors may occur in the least significant digits if sheets contain floating point numbers. The connector currently doesn't support higher precision. - Incremental load is not supported. The connector uses the truncate and load ingestion strategy. ## Next steps [](/user-guide/data-integration/openflow/connectors/google-sheets/setup) --- title: About Openflow Connector for HubSpot source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/hubspot/about.md section: Loading & Unloading Data --- # About %hubspot% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/hubspot/setup) This topic describes the basic concepts of %hubspot%, its workflow, and limitations. The %hubspot% ingests HubSpot data into Snowflake. It uses the HubSpot API to retrieve data, which is then stored in a Snowflake table. Data ingestion happens in the following two phases: 1. Initial load, where all data is retrieved during the first API call. 2. Incremental load, which merges the updates and new data into the destination table and uses timestamps from previous calls to limit the result to the records that were updated since the last data load. For more information about HubSpot private apps, see [Private apps](https://developers.hubspot.com/docs/guides/apps/private-apps/overview). Use this connector if you're looking to do the following: - Get HubSpot CRM data into Snowflake for reporting, analytics, and insights ## Workflow 1. A HubSpot administrator performs the following tasks: 1. Generates an API token within the HubSpot instance with the scopes required for the API requests that the connector will make. This token is used by the connector for authentication. 2. Defines the criteria used to search for objects, such as the `Object Types` and `Updated After (optional)` fields. 2. A Snowflake account administrator performs the following tasks: 1. Installs the connector. 2.
Configures the connector parameters: - Provides the HubSpot private app API token. - Defines the criteria for the objects being ingested by providing filters. - Sets the desired database and schema names within Snowflake. 3. Runs the connector flow. Upon execution, the connector does the following: 1. Creates an API call to fetch objects from the configured HubSpot instance. 2. Extracts the relevant data. 3. Creates the configured destination table in the Snowflake database if the API call returned at least one result. 4. Loads raw data into the specified Snowflake table and creates a processed view on top of the raw data. ## Limitations - When multiple object types are defined, filtering by 'Updated After' applies to all object types defined in the parameter context. - Currently, the connector supports basic authentication using a HubSpot private app and API token. This means that the connector is only able to ingest data that is accessible to the owner of the API token. - The processors are designed to work on the primary node only with one thread. - The number of calls your private app can make is based on your account subscription. To learn more about HubSpot private app limits, see [Private app limits](https://developers.hubspot.com/docs/guides/apps/private-apps/overview#private-app-limits). ## Next steps [](/user-guide/data-integration/openflow/connectors/hubspot/setup) --- title: About Openflow Connector for Jira Cloud source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/jira-cloud/about.md section: Loading & Unloading Data --- # About %jira% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). 
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) This topic describes the basic concepts of %jira%, its workflow, and limitations. The %jira% ingests data from multiple Atlassian Jira Cloud entities into Snowflake. It consists of two separate flows: - **Core flow** — uses the [Jira Cloud REST API](https://developer.atlassian.com/cloud/jira/platform/rest/v3/intro/#about) to retrieve issues, projects, comments, changelogs, worklogs, users, deleted issues, votes, watchers, remote links, and issue security schemes. - **Agile flow** — uses the [Jira Agile REST API](https://developer.atlassian.com/cloud/jira/software/rest/intro/#about) to retrieve boards, sprints, board-sprint mappings, board-project mappings, and board-issue mappings. Both flows store data in dedicated Snowflake tables with explicit column schemas. The two flows can write to the same Snowflake destination schema, since they create tables with different names. 
Use this connector if you're looking to do the following: - Centralize Jira data in Snowflake for cross-team visibility and deeper insights into engineering, support, and project workflows - Ingest a broad set of Jira entities into separate, query-ready Snowflake tables, with a selectable subset of optional tables - Extract Jira issues with per-project parallel ingestion for faster data loads - Track deleted issues via Jira audit log polling - Optionally ingest Jira Agile data using the separate agile flow If you previously deployed an earlier version of the Jira Cloud connector, see [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) for a step-by-step migration guide. ## Destination tables The connector creates the following tables in the configured Snowflake destination schema. Most tables have a fixed column schema defined by the connector. The `ISSUE` table is the exception: its columns are driven by the `Issue Fields` configuration and may include custom fields from your Jira instance. See [](#label-jira-core-issue-fields) for details. ### Core flow tables The `ISSUE`, `PROJECT`, `USER`, and `FIELD` tables are always created. The remaining tables are created only when the corresponding table name is listed in the `Enabled Tables` parameter (or, for `DELETED_ISSUE`, when delete tracking is enabled). See [](#label-jira-core-ingestion-parameters) for details.
| Table | Enabled by | Contents |
| --- | --- | --- |
| ISSUE | Always | One row per Jira issue. The set of columns is driven by the `Issue Fields` configuration and may include custom fields. |
| PROJECT | Always | One row per Jira project visible to the API token owner. |
| USER | Always | Jira users encountered during ingestion. |
| FIELD | Always | Metadata for Jira issue fields used to drive the dynamic `ISSUE` schema. |
| CHANGELOG | `CHANGELOG` | Issue field change history, one row per changelog entry. |
| COMMENT | `COMMENT` | Comments attached to issues, one row per comment. |
| ISSUE_REMOTE_LINK | `ISSUE_REMOTE_LINK` | Remote links attached to issues. |
| ISSUE_SECURITY_SCHEME | `ISSUE_SECURITY_SCHEME` | Issue-level security schemes and levels defined in the Jira instance. |
| ISSUE_VOTE | `ISSUE_VOTE` | Per-issue vote records. |
| ISSUE_WATCHER | `ISSUE_WATCHER` | Per-issue watcher records. |
| PERMISSION | `PERMISSION` | Global and project permission definitions. |
| PROJECT_COMPONENT | `PROJECT_COMPONENT` | Components defined in each project. |
| PROJECT_VERSION | `PROJECT_VERSION` | Release versions defined in each project. |
| USER_GROUP | `USER_GROUP` | Group memberships per user. |
| WORKLOG | `WORKLOG` | Time tracking entries on issues. |
| DELETED_ISSUE | `Deletes Fetch Strategy = AUDIT` | Issues deleted from Jira, tracked via audit log. |
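The gating rules in the table above can be sketched as a small function. This is only an illustration under stated assumptions, not the connector's implementation: the function name `resolve_core_tables` and the way the parameter values are passed in are hypothetical. Only the table names, the `Enabled Tables` parameter, and the `Deletes Fetch Strategy = AUDIT` condition come from this topic.

```python
# Hypothetical sketch of the core flow's table gating; not connector code.

# Tables the core flow always creates.
ALWAYS_CREATED = ["ISSUE", "PROJECT", "USER", "FIELD"]

# Optional tables that can be listed in the `Enabled Tables` parameter.
OPTIONAL_TABLES = {
    "CHANGELOG", "COMMENT", "ISSUE_REMOTE_LINK", "ISSUE_SECURITY_SCHEME",
    "ISSUE_VOTE", "ISSUE_WATCHER", "PERMISSION", "PROJECT_COMPONENT",
    "PROJECT_VERSION", "USER_GROUP", "WORKLOG",
}


def resolve_core_tables(enabled_tables, deletes_fetch_strategy=None):
    """Return the destination tables the core flow would create."""
    unknown = set(enabled_tables) - OPTIONAL_TABLES
    if unknown:
        raise ValueError(f"Not an optional core flow table: {sorted(unknown)}")

    tables = list(ALWAYS_CREATED)
    tables += sorted(set(enabled_tables))
    # DELETED_ISSUE is gated by delete tracking, not by `Enabled Tables`.
    if deletes_fetch_strategy == "AUDIT":
        tables.append("DELETED_ISSUE")
    return tables


print(resolve_core_tables(["WORKLOG", "COMMENT"], deletes_fetch_strategy="AUDIT"))
```

Keeping `DELETED_ISSUE` out of the optional set mirrors the table: it is the only core flow table controlled by a separate setting.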
### Agile flow tables The following tables are created by the agile flow. To populate these tables, install and run the agile flow separately from the core flow. The `BOARD` table is always created. The remaining tables are gated by the agile flow's own `Enabled Tables` parameter.
| Table | Enabled by | Contents |
| --- | --- | --- |
| BOARD | Always | Agile boards visible to the API token owner. |
| SPRINT | `SPRINT` | Sprints across all ingested boards. |
| BOARD_SPRINT | `SPRINT` | Board-to-sprint mappings. |
| BOARD_PROJECT | `BOARD_PROJECT` | Board-to-project mappings. |
| BOARD_ISSUE | `BOARD_ISSUE` | Board-to-issue mappings. |
### Connector-managed columns In addition to the columns derived from the Jira API response, the connector adds the following metadata columns. `_SNOWFLAKE_INSERTED_AT` and `_SNOWFLAKE_UPDATED_AT` are added to every destination table. `_SNOWFLAKE_DELETED` is added only to tables that track soft deletes. To see which tables have it, inspect the destination tables in Snowflake.
| Column | Type | Purpose |
| --- | --- | --- |
| `_SNOWFLAKE_INSERTED_AT` | `TIMESTAMP_NTZ` | When the row was first inserted by the connector. |
| `_SNOWFLAKE_UPDATED_AT` | `TIMESTAMP_NTZ` | When the row was last updated by the connector. |
| `_SNOWFLAKE_DELETED` | `BOOLEAN` | `TRUE` when the source record is no longer present in the corresponding Jira API response (for example, an issue deleted in Jira, or a comment removed from an issue). The row remains in the destination table. Filter on `_SNOWFLAKE_DELETED = FALSE` to exclude soft-deleted records. |
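The soft-delete filter can be tried locally with a short sketch. It makes several assumptions: SQLite (through Python's standard `sqlite3` module) stands in for Snowflake so that the example runs anywhere, the `ID` and `BODY` columns and the sample rows are invented, and the flag is stored as `0`/`1` because SQLite has no `BOOLEAN` type. Only the `_SNOWFLAKE_*` column names come from this topic.

```python
import sqlite3

# SQLite stands in for Snowflake here so the example runs locally.
# The table layout and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE "COMMENT" (
        ID TEXT,
        BODY TEXT,
        _SNOWFLAKE_INSERTED_AT TEXT,
        _SNOWFLAKE_UPDATED_AT TEXT,
        _SNOWFLAKE_DELETED INTEGER  -- 0 = FALSE, 1 = TRUE in this stand-in
    )
    """
)
conn.executemany(
    'INSERT INTO "COMMENT" VALUES (?, ?, ?, ?, ?)',
    [
        ("1", "Still present in Jira", "2025-01-01 00:00:00", "2025-01-02 00:00:00", 0),
        ("2", "Removed from the issue", "2025-01-01 00:00:00", "2025-01-03 00:00:00", 1),
    ],
)

# In Snowflake the predicate would be `_SNOWFLAKE_DELETED = FALSE`.
active_ids = [
    row[0]
    for row in conn.execute(
        'SELECT ID FROM "COMMENT" WHERE _SNOWFLAKE_DELETED = 0 ORDER BY ID'
    )
]
print(active_ids)
```

The soft-deleted row stays in the table, so audit-style queries can still read it by dropping the predicate.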
## Workflow 1. A **Jira Cloud administrator** performs the following tasks: 1. Generates an API token within the Jira instance. This token is used by the connector for authentication. Both tokens with scopes and tokens without scopes are supported, although tokens with scopes are recommended for fine-grained access control. The required scopes depend on which features are enabled. See [](#label-jira-core-api-scopes) for details. 2. Optionally, if delete tracking is required, ensures the API token owner has the **Administer Jira** global permission for access to the audit log endpoint. 2. A **Snowflake account administrator** performs the following tasks: 1. Installs the core flow, the agile flow, or both, depending on which entities are needed. 2. Configures each flow: 1. Provides the Jira API token and email address. 2. Specifies the Jira instance URL. 3. For the core flow, optionally filters ingestion to specific projects using `Project Keys Filter` and configures the issue fields to ingest. 4. Sets the database and schema names in the Snowflake account. 3. Runs the flow in the Openflow canvas. Upon execution: - The **core flow** discovers projects and registers them in the ingestion state service, fetches issues in parallel across projects along with the per-issue tables listed in `Enabled Tables` (and optionally deleted issues), and fetches worklogs, users, user groups, permissions, project components, project versions, and issue security schemes on independent schedules. - The **agile flow** fetches boards, sprints, board-project mappings, board-sprint mappings, and board-issue mappings. 3. **Snowflake business users** can then query the destination tables directly with standard SQL, without needing to flatten JSON. ## Openflow requirements - The minimum runtime size is `Small`. When you have many tables listed in `Enabled Tables`, more processors run concurrently and the default Small runtime thread budget may become a bottleneck. 
In that case, move to a `Medium` runtime (or larger). - The connector supports multi-node Openflow runtimes. Each flow's state service is cluster-aware, and the flow connections use load balancing where appropriate so that work is distributed across available nodes. If you want to run on multiple nodes, configure a static cluster size by setting **Min nodes** to the target node count rather than relying on autoscaling. The connector doesn't generate enough sustained load on the runtime to trigger the runtime to scale up additional nodes on its own. - For Jira instances with many projects, a multi-node runtime is recommended. Per-project work is distributed across nodes, so adding nodes increases the number of projects the connector processes in parallel. Use the project count as a rough guide when sizing **Min nodes**. - The connector is primarily limited by Jira API rate limits rather than runtime compute capacity. Increasing the runtime size beyond `Medium`, or adding more nodes than the API rate budget can sustain, is unlikely to improve ingestion speed. - The core flow and agile flow can run on the same or separate Openflow runtimes. If you run both flows on the same runtime, `Small` isn't sufficient — use at least `Medium` (or larger, depending on the load). ## Limitations - Basic authentication using an email and API token is the only supported authorization method. The connector can only ingest data accessible to the owner of the API token. - Delete tracking via the `AUDIT` strategy requires the API token owner to have the **Administer Jira** global permission. The Jira audit log has limited retention (typically 6 months for Jira Premium, less for Free or Standard plans). If the connector is paused for longer than the retention period, delete events can be missed. - Schema evolution for the `ISSUE` table is additive only. New columns can be added, but column type changes or removals aren't supported. 
If a Jira custom field type changes, the connector may require redeployment. - The `ISSUE` table schema is dynamic and depends on the `Issue Fields` configuration. Fields not included in the resolved field set aren't loaded, and there is no raw JSON fallback. - Narrowing `Project Keys Filter` to remove a project doesn't delete that project's rows from the destination tables. Rows that were previously ingested remain in place and are no longer updated. To remove orphaned rows after a filter change, manually delete them from the destination tables. - Agile data (boards, sprints, board mappings) is fully re-fetched on every scheduled run of the agile flow. For Jira instances with many boards, this may result in increased API usage. - Each connector instance can be associated with only one Jira Cloud site. ## Next steps - [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core) to install the core flow. - [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile) to install the agile flow. - [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) if you're moving from a previous version of the Jira Cloud connector. --- title: About Openflow Connector for LinkedIn Ads source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/linkedin-ads/about.md section: Loading & Unloading Data --- # About %linkedinads% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. 
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/linkedin-ads/setup) This topic describes the basic concepts, workflow, and limitations of %linkedinads%. The %linkedinads% enables you to ingest LinkedIn Ads metrics into Snowflake. This connector uses the [Reporting API](https://learn.microsoft.com/en-us/linkedin/marketing/integrations/ads-reporting/ads-reporting?view=li-lms-2025-02&tabs=http) to fetch data. The connector persists data in a table dedicated to a given report. Each report can be configured to contain metrics, pivots, and facets chosen by the user. The connector creates the destination table in the database and the schema provided in the configuration. Use this connector if you're looking to do the following: - Import campaign performance data from LinkedIn Ads to Snowflake for reporting, analytics and insights ## Workflow 1. A **LinkedIn Ads user** obtains credentials required to connect to LinkedIn Ads API. 2. A **Snowflake account administrator** performs the following tasks: 1. Installs the connector. 2. Configures the connector with the required parameters. 3. Runs the connector. The following happens when the connector is run in Openflow: 1.
Retrieves the data based on the specified configuration.
If the Time Granularity parameter is set to `DAILY`, then the connector downloads only the data for a calculated timeframe. In other cases, the connector downloads all the data from the start date to the current time.
2. Creates a temporary table and inserts the downloaded data into it. 3. Recreates or updates the destination table to reflect the schema of data from LinkedIn Ads. If you change the schema, the connector drops the destination table and recreates it with a new schema. If `DAILY` time granularity is chosen in the Time Granularity parameter, then outdated data is deleted from the destination table. 4. Inserts the data into the destination table with an additional insertion timestamp. 5. Drops the temporary table. ## Limitations - All metrics of type BigDecimal are saved as Strings. [Conversion functions](/sql-reference/functions-conversion) allow you to convert values manually to numeric types with chosen scale and precision. - Some metrics and pivots return values that are IDs. The connector does not use the [URN resolution](https://learn.microsoft.com/en-us/linkedin/marketing/integrations/ads-reporting/ads-reporting?view=li-lms-2025-02&tabs=http#urn-resolution). - The connector uses the [Authorization Code Flow](https://learn.microsoft.com/en-us/linkedin/shared/authentication/authorization-code-flow?context=linkedin%2Fcontext&tabs=HTTPS1) because the [Client Credentials Flow](https://learn.microsoft.com/en-us/linkedin/shared/authentication/client-credentials-flow?context=linkedin%2Fcontext&tabs=HTTPS1) is not available for Marketing API. This means that the refresh token must be refreshed manually every year. --- title: About Openflow Connector for Meta Ads source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/meta-ads/about.md section: Loading & Unloading Data --- # About %metaads% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). 
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/meta-ads/setup) This topic describes the basic concepts of %metaads%, its workflow, and limitations. [Meta Ads](https://www.facebook.com/business/ads) is an online advertising platform, which you can use to create and run ads to promote your products or services on Meta products, such as Facebook and Instagram. The %metaads% automatically ingests Meta Ads data into your Snowflake account by using [Meta Ads Insights API](https://developers.facebook.com/docs/marketing-api/insights). Insights API enables you to configure custom reports with selected fields, [breakdowns](https://developers.facebook.com/docs/marketing-api/insights/breakdowns), and other aggregations. Use this connector if you're looking to do the following: - Bring Meta Ads data to unify and analyze your marketing performance ## Workflow 1. A **Meta Ads administrator** performs the following: 1. [Creates a Meta Ads app](https://developers.facebook.com/docs/development/create-an-app/). 2. [Enables Marketing API](https://developers.facebook.com/docs/marketing-api/get-started). 3. [Acquires a long-lived token](https://developers.facebook.com/docs/facebook-login/guides/access-tokens/get-long-lived/). 2. A **Snowflake account administrator** performs the following: 1. Installs the connector. 2. Configures the connector with the required parameters, for example long-lived token, report configuration, and database and schema names. 3. Runs the connector flow. The connector does the following: 1. 
Fetches the report specified in the connector configuration. 2. Creates a temporary table and puts the report chunks in it. 3. Creates a table in the provided destination schema. 4. Synchronizes data from the temporary table to the destination table. 5. Removes the temporary table. 3. **Marketing users** with Snowflake access can view and perform operations on the data downloaded from Meta Ads to destination tables. ## Limitations - The connector supports incremental ingestion only for the daily value of the `Report Time Increment` parameter. - Modifying the report definition while the processors are running might lead to data inconsistencies. To ensure consistency, stop the processors and clear the queues before updating the configuration. - If the Meta Ads API [rate limit](https://developers.facebook.com/docs/graph-api/overview/rate-limiting/#ads-insights) is reached, the data doesn't get ingested even though the connector continues attempting to pull data from the source system. To increase the rate limit, [change the app access type](https://developers.facebook.com/docs/marketing-api/overview/rate-limiting) from `Standard access` to `Advanced access` of the Ads Management Standard Access, and enable the `ads_read` and `ads_management` [permissions](https://developers.facebook.com/docs/permissions/). - Data can be fetched only from the past 37 months, as defined by Meta Ads. ## Next steps [](/user-guide/data-integration/openflow/connectors/meta-ads/setup) --- title: About Openflow Connector for Microsoft Dataverse source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/dataverse/about.md section: Loading & Unloading Data --- # About %dataverse% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/dataverse/setup) The Openflow Connector for Microsoft Dataverse connects Microsoft Dataverse storage to Snowflake, ingests Microsoft Dataverse tables, and keeps them up to date on the Snowflake side. The connector replicates the selected tables into the database and schema that the user specifies in the Snowflake account. Use this connector if you're looking to do the following: - Integrate data from Microsoft Power Platform and Dynamics 365 applications with Snowflake for holistic business insights ## Rate limiting restrictions [Microsoft Dataverse API limits](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/api-limits?tabs=sdk#how-service-protection-api-limits-are-enforced) govern how many requests can be made within a given time frame. If your flow exceeds the allowed quota, syncs may slow down or fail with an error. This mostly occurs when your access token makes a higher number of requests than the source typically allows. In such cases, we recommend applying for a higher access quota (where applicable) or reducing the sync frequency. ### Limitations - Only tables with change tracking enabled can be replicated. - The schema of destination tables is discovered from the database metadata through REST APIs.
Whenever new columns are added to the table, they appear in the destination table. Changes and removals of columns are not reflected in the destination table. - All [limitations of Microsoft Dataverse Web API](https://learn.microsoft.com/en-us/power-apps/maker/data-platform/api-limits-overview) apply. - The supported set of column types is limited to the types supported by [Snowpipe Streaming](#label-snowpipe-streaming-supported-java-data-types). - Each instance of the connector supports a single schedule. If you need multiple schedules, then you need to install multiple instances of the connector. - Empty tables are not replicated. - Removal of a table is not replicated. If a table was replicated previously and is then removed from the source, it remains in the destination schema. - Delta tokens used for change tracking expire after 7 days of inactivity by default. If the connector is not run for more than 7 days, the delta token expires and the connector must perform a full resync of the affected tables. This duration is controlled by the `ExpireChangeTrackingInDays` setting in the Microsoft Dataverse organization configuration. ### Next steps [](/user-guide/data-integration/openflow/connectors/dataverse/setup) --- title: About Openflow Connector for MySQL source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/mysql/about.md section: Loading & Unloading Data --- # About %mysql% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/mysql/setup) - [](/user-guide/data-integration/openflow/connectors/mysql/data-mapping) This topic describes the basic concepts of %mysql%, its workflow, and limitations. ## About the %mysql% The %mysql% connects a MySQL database instance to Snowflake and replicates data from selected tables in near real-time or on a specified schedule. The connector also creates a log of all data changes, which is available along with the current state of the replicated tables. The connector also supports MariaDB as a source database. ## Use cases Use this connector if you're looking to do the following: - CDC replication of MySQL or MariaDB tables into Snowflake for comprehensive, centralized reporting ## Supported MySQL versions The following table lists the tested and officially supported MySQL versions.
|  | 8.0 | 8.4 |
| --- | --- | --- |
| [Standard MySQL](https://www.mysql.com/) | Yes | Yes |
| [AWS RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_MySQL.html) | Yes | Yes |
| [Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraMySQLReleaseNotes/Welcome.html) | Yes, as Version 3 | Not applicable. Aurora 8.4 isn't currently supported. |
| [GCP Cloud SQL](https://cloud.google.com/sql/mysql?hl=en) | Yes | Yes |
| [Azure Database](https://azure.microsoft.com/en-us/products/mysql/) | Yes | Yes |
| [Percona Server](https://www.percona.com/software/mysql-database/percona-server) | Yes | Yes |
## Supported MariaDB versions The following table lists the tested and officially supported MariaDB versions.
|  | 11.4 or later |
| --- | --- |
| [Standard MariaDB](https://mariadb.org/) | Yes |
| [AWS RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_MariaDB.html) | Not applicable. AWS RDS for MariaDB isn't currently supported. |
## Openflow requirements - The runtime size must be at least Medium. For guidance on choosing a size and resizing the runtime later, see [Runtime sizing](#label-mysql-runtime-sizing). - The connector doesn't support multi-node Openflow runtimes. Configure the runtime for this connector with **Min nodes** and **Max nodes** set to `1`. ## Limitations - The connector supports MySQL version 8 or later and MariaDB version 11.4 or later. - The connector supports only username and password authentication with MySQL or MariaDB. - Only database tables that have a primary key or a NOT NULL unique index can be replicated. When no primary key is defined, MySQL's InnoDB storage engine automatically promotes the first NOT NULL unique index to act as the primary key, and the connector uses it as the replication key. - The connector doesn't replicate tables with data that exceeds [Snowflake's type limitations](/sql-reference/intro-summary-data-types). - The connector doesn't replicate columns of types GEOMETRY, GEOMETRYCOLLECTION, LINESTRING, MULTILINESTRING, MULTIPOINT, MULTIPOLYGON, POINT, and POLYGON. - The connector has the [Group Replication Limitations of MySQL](https://dev.mysql.com/doc/refman/8.4/en/group-replication-limitations.html#group-replication-limitations-transaction-size). This means that a single transaction must fit into a binary log message of size no more than 4 GB. - The connector doesn't support replicating tables from a reader instance in Amazon Aurora as Aurora reader instances don't maintain their own binary logs. - The connector supports source table schema changes with the exception of changing primary key definitions and changing the precision or the scale of a numeric column. - For `DATE` and `DATETIME` types in MySQL or MariaDB, any values that contain a zero month or day are mapped to the Unix epoch ('1970-01-01' or '1970-01-01T00:00'). Date zero ('0000-00-00') is also mapped to the Unix epoch. 
Values with a zero year are converted to year one (for example, '0000-05-30 7:59:59' becomes '0001-05-30T7:59:59'). The remaining date and time components are unchanged. - For `TIMESTAMP` types in MySQL or MariaDB, the value '0000-00-00 00:00:00' is mapped to the Unix epoch ('1970-01-01T00:00Z'). - The connector doesn't capture cascade delete operations (ON DELETE CASCADE). Foreign key cascade deletions are executed internally by InnoDB (the storage engine used by MySQL and MariaDB) and aren't recorded in the binary log, resulting in incomplete replication of dependent table deletions to Snowflake. Limitations affecting certain table columns can be bypassed by excluding these specific columns from replication. ## Workflow 1. A **MySQL or MariaDB database administrator** performs the following tasks: - Configures MySQL or MariaDB replication settings. - Creates credentials for the connector. - (Optionally) Provides the SSL certificate. 2. A **Snowflake account administrator** performs the following tasks: 1. Creates a service user for the connector, a warehouse for the connector, and a destination database for the replicated data. 2. Installs the connector. 3. Specifies the required parameters for the flow template. 4. Runs the flow. The connector performs the following tasks when run in Openflow: 1. Creates a schema for journal tables. 2. Creates the schemas and destination tables matching the source tables configured for replication. 3. Starts replicating the tables. For details on the replication process, see [How tables are replicated](#how-tables-are-replicated). ## How the connector works The following sections describe how the connector works in various scenarios, including replication, changes in schema, and data retention. ### Data replication The name of the destination schema is determined by the `Destination Schema Pattern` parameter. For more information, see [](#label-of-mysql-destination-parameters). 
By default, the destination schema name matches the source database name, so the fully qualified name of a destination table is: `..` ### How tables are replicated The tables are replicated in the following stages: 1. Schema introspection: The connector discovers the columns in the source table, including the column names and types, then validates them against Snowflake's and the connector's [Limitations](#limitations). Validation failures cause this stage to fail, and the cycle completes. After successful completion of this stage, the connector creates an empty destination table. 2. Snapshot load: The connector copies all data available in the source table into the destination table. If this stage fails, then no more data is replicated. After successful completion, the data from the source table is available in the destination table. 3. Incremental load: The connector tracks changes in the source table and applies those changes to the destination table. This process continues until the table is removed from replication. Failure at this stage permanently stops replication of the source table until the issue is resolved. This connector can be configured to immediately start replicating incremental changes for newly added tables, bypassing the snapshot load phase. This option is often useful when reinstalling the connector in an account where previously replicated data exists and you want to continue replication without having to re-snapshot tables. For details on bypassing the snapshot load and using the incremental load process, see [Incremental replication](/user-guide/data-integration/openflow/connectors/mysql/incremental-replication). Interim failures, such as connection errors, do not prevent tables from being replicated. Permanent failures, such as unsupported data types, do prevent tables from being replicated. If a permanent failure prevents a table from being replicated, remove the table from the list of replicated tables. 
After you address the problem that caused the failure, you can add the table back to the list of replicated tables. ### Schema changes The connector picks up schema changes on source tables during replication, except for the cases listed under Limitations. When a column is added on the source, the connector adds it to the destination table and starts replicating it on the next poll. Existing rows in the destination table aren't backfilled with values for the new column. When a column is dropped on the source, the connector doesn't drop the corresponding column on the destination. Instead, it renames the column with a suffix (by default, `__SNOWFLAKE_DELETED`) to preserve existing data. If the same column is later re-added on the source, the connector adds it as a new column, so the destination table ends up with both the new column and the soft-deleted one (for example, `A` and `A__SNOWFLAKE_DELETED`). If that same column is then dropped a second time, replication for the affected table fails because the soft-deleted column name is already taken. To recover, restart replication for the affected table: see [Restart table replication](#label-of-mysql-restart-table-replication). The same soft-delete mechanism applies when you change a table's Column Filter JSON. For details, see [Replicate a subset of columns in a table](#label-mysql-connector-replication-subset-of-columns). ### Oversized values The connector doesn't replicate individual values larger than 16 MB. By default, processing such a value marks the associated table permanently failed. To change this behavior, modify the **Oversized Value Strategy** destination parameter. # Understanding data retention The connector follows a data retention philosophy where customer data is never automatically deleted. You maintain full ownership and control over your replicated data, and the connector preserves historical information rather than permanently removing it. 
This approach has the following implications: - Rows deleted from the source table are soft-deleted in the destination table rather than physically removed. - Columns dropped from the source table are renamed in the destination table rather than dropped. - Journal tables are retained indefinitely and are not automatically cleaned up. ## Destination table metadata columns Each destination table includes the following metadata columns that track replication information:
| Column name | Type | Description |
| --- | --- | --- |
| `_SNOWFLAKE_INSERTED_AT` | TIMESTAMP_NTZ | The timestamp when the row was originally inserted into the destination table. |
| `_SNOWFLAKE_UPDATED_AT` | TIMESTAMP_NTZ | The timestamp when the row was last updated in the destination table. |
| `_SNOWFLAKE_DELETED` | BOOLEAN | Indicates whether the row was deleted from the source table. When `true`, the row has been soft-deleted and no longer exists in the source. |
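To make the role of these columns concrete, the following Python sketch models a destination table in memory and shows how the three metadata columns might change as a source row is inserted, updated, and then deleted. It is an illustration only, not the connector's implementation: the `apply_change` and `active_rows` helpers and the dictionary layout are hypothetical, and the choices to set `_SNOWFLAKE_UPDATED_AT` on insert and to refresh it on delete are assumptions.

```python
from datetime import datetime, timezone


def apply_change(table, op, key, values=None, now=None):
    """Apply one source change to an in-memory model of a destination table.

    `table` maps a replication key to a row dictionary. This is a hypothetical
    model for illustration; it does not call Snowflake or the connector.
    """
    # Naive UTC timestamp, loosely mirroring a TIMESTAMP_NTZ value.
    now = now or datetime.now(timezone.utc).replace(tzinfo=None)
    if op == "insert":
        table[key] = {
            **(values or {}),
            "_SNOWFLAKE_INSERTED_AT": now,
            "_SNOWFLAKE_UPDATED_AT": now,  # assumption: set on insert as well
            "_SNOWFLAKE_DELETED": False,
        }
    elif op == "update":
        table[key].update(values or {})
        table[key]["_SNOWFLAKE_UPDATED_AT"] = now
    elif op == "delete":
        # Soft delete: the row stays in the table and is only flagged.
        table[key]["_SNOWFLAKE_DELETED"] = True
        table[key]["_SNOWFLAKE_UPDATED_AT"] = now  # assumption
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return table


def active_rows(table):
    """Return only the rows that still exist in the source."""
    return {k: r for k, r in table.items() if not r["_SNOWFLAKE_DELETED"]}
```

In this model, a deleted row remains in `table` with `_SNOWFLAKE_DELETED` set to `True`, and `active_rows` plays the role of a filter on that column.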
## Soft-deleted rows When a row is deleted from the source table, the connector does not physically remove it from the destination table. Instead, the row is marked as deleted by setting the `_SNOWFLAKE_DELETED` metadata column to `true`. This approach allows you to: - Retain historical data for auditing or compliance purposes. - Query deleted records when needed. - Decide when and how to permanently remove data based on your requirements. To query only active (non-deleted) rows, filter on the `_SNOWFLAKE_DELETED` column: ```sql SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = FALSE; ``` To query deleted rows: ```sql SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = TRUE; ``` ## Dropped columns When a column is dropped from the source table, the connector does not drop the corresponding column from the destination table. Instead, the column is renamed by appending the `__SNOWFLAKE_DELETED` suffix to preserve historical values. For example, if a column named `EMAIL` is dropped from the source table, it is renamed to `EMAIL__SNOWFLAKE_DELETED` in the destination table. Rows that existed before the column was dropped retain their original values, while rows added after the drop have `NULL` in this column. You can still query historical values from the renamed column: ```sql SELECT EMAIL__SNOWFLAKE_DELETED FROM my_table; ``` ## Renamed columns Due to limitations in CDC (Change Data Capture) mechanisms, the connector cannot distinguish between a column being renamed and a column being dropped followed by a new column being added. As a result, when you rename a column in the source table, the connector treats this as two separate operations: dropping the original column and adding a new column with the new name. For example, if you rename a column from `A` to `B` in the source table, the destination table will contain: - `A__SNOWFLAKE_DELETED`: Contains values from before the rename. Rows added after the rename have `NULL` in this column. 
- `B`: Contains values from after the rename. Rows that existed before the rename have `NULL` in this column. ### Querying renamed columns To retrieve data from both the original and renamed columns as a single unified column, use a `COALESCE` or `CASE` expression: ```sql SELECT COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B FROM my_table; ``` Alternatively, using a `CASE` expression: ```sql SELECT CASE WHEN B IS NOT NULL THEN B ELSE A__SNOWFLAKE_DELETED END AS A_RENAMED_TO_B FROM my_table; ``` ### Creating a view for renamed columns Rather than manually modifying the destination table, you can create a view that presents the renamed column as a single unified column. This approach is recommended because it preserves the original data and avoids potential issues with ongoing replication. ```sql CREATE VIEW my_table_unified AS SELECT *, COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B FROM my_table; ``` Manually modifying the destination table structure (such as dropping or renaming columns) is not recommended, as it may interfere with ongoing replication and cause data inconsistencies. ## Journal tables During incremental replication, changes from the source database are first written to journal tables before being merged into the destination tables. The connector does not automatically remove data from journal tables, as this data may be useful for auditing, debugging, or reprocessing purposes. Journal tables are created in the same schema as their corresponding destination tables and follow this naming convention: `_JOURNAL__` Where: - `` is the name of the destination table. - `` is the creation timestamp in Unix epoch format (seconds since January 1, 1970), ensuring uniqueness. - `` starts at 1 and increments whenever the destination table schema changes, either due to schema changes in the source table or modifications to column filters. 
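As a rough illustration of the naming convention, the following Python sketch splits a journal table name into its parts. It assumes that names take the form of the destination table name, the literal `_JOURNAL_`, an epoch-seconds timestamp, and a generation number, joined by underscores. The `parse_journal_table_name` helper, the `JournalName` type, and the sample name are hypothetical and not part of the connector.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class JournalName(NamedTuple):
    table: str            # destination table name
    created_at: datetime  # creation time decoded from the epoch seconds
    generation: int       # increments when the destination schema changes


# Assumed layout: <table>_JOURNAL_<epoch seconds>_<generation>
_JOURNAL_PATTERN = re.compile(r"^(?P<table>.+)_JOURNAL_(?P<ts>\d+)_(?P<gen>\d+)$")


def parse_journal_table_name(name: str) -> JournalName:
    """Split an unqualified journal table name into its components."""
    match = _JOURNAL_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a journal table name: {name!r}")
    return JournalName(
        table=match.group("table"),
        created_at=datetime.fromtimestamp(int(match.group("ts")), tz=timezone.utc),
        generation=int(match.group("gen")),
    )


parsed = parse_journal_table_name("ORDERS_JOURNAL_1705320000_1")
print(parsed.table, parsed.created_at.isoformat(), parsed.generation)
```

A helper like this could be used to group journal tables by destination table, for example when reviewing which journal tables belong to tables that are no longer replicated.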
For example, if your destination table is `SALES.ORDERS`, the journal table might be named `SALES.ORDERS_JOURNAL_1705320000_1`. Do not drop journal tables while replication is in progress. Removing an active journal table may cause data loss or replication failures. Only drop journal tables after the corresponding source table has been fully removed from replication. ### Managing journal table storage If you need to manage storage costs by removing old journal data, you can create a Snowflake task that periodically cleans up journal tables for tables that are no longer being replicated. Before implementing journal cleanup, verify that: - The corresponding source tables have been fully removed from replication. - You no longer need the journal data for auditing or processing purposes. For information on creating and managing tasks for automated cleanup, see [Introduction to tasks](/user-guide/tasks-intro). ## Next steps Review [](/user-guide/data-integration/openflow/connectors/mysql/data-mapping) to understand how the connector maps data types to Snowflake data types. Review [](/user-guide/data-integration/openflow/connectors/mysql/setup) to set up the connector. --- title: About Openflow Connector for Oracle source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/about.md section: Loading & Unloading Data --- # About %oracleofc% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. 
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) - [](/user-guide/data-integration/openflow/connectors/oracle/data-mapping) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks) This topic describes the basic concepts of %oracleofc%, its workflow, and limitations. ## About the %oracleofc% The %oracleofc% connects an Oracle database instance to Snowflake and replicates data from selected tables in near real-time or on a specified schedule. The connector also creates a log of all data changes, which is available along with the current state of the replicated tables. ## Use cases The connector supports the following use case: - Replicate Oracle database tables into Snowflake for comprehensive, centralized reporting. ## Licensing models and critical constraints The %oracleofc% supports two distinct licensing models. You must select the correct model before installation. Failure to select the correct model might result in deployment failure or unintended financial commitments. For detailed licensing terms, comparison, and configuration instructions, see [Oracle XStream licensing](#label-oracle-xstream-licensing). The connector is technically compatible with Oracle Database Standard Edition (SE/SE2). However, Oracle documentation states that "a license to Oracle Database Enterprise Edition is a prerequisite to license and use Oracle XStream." 
Before deploying the connector against a Standard Edition database, verify your Oracle license agreement to ensure that your use of XStream is permitted. You're solely responsible for compliance with your Oracle license terms. ### 1. Embedded License (Snowflake-provided) Snowflake provides the Oracle XStream license to you directly for a fee. This model allows you to consume XStream replication without a direct contract with Oracle. For more information, see [Embedded license details](#label-oracle-embedded-license-details) and the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).
| Term | Details |
| --- | --- |
| Billing | License and Support & Maintenance (S&M) fees are drawn from your Snowflake Capacity. |
| Commitment | Activation initiates a non-cancelable 36-month term (after the 60-day trial). |
| Lifecycle | **Post-term (36+ months)**: After the initial 36-month term, the license fee drops to $0, but the S&M fee continues annually.<br>**Lock-out risk**: If you opt-out of S&M renewal, the connector will be permanently locked when S&M coverage ends. Unlocking the connector requires purchasing a new Embedded License, which triggers a new 36-month commitment at full price. |
| Management UI | All license actions (Start/Cancel Trial, Monitor Usage, Opt-out) are performed by the ORGADMIN in %sf-web-interface% under **Admin** %raa% **Terms** %raa% **Openflow for Oracle**. For step-by-step instructions, see [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms). |
| Restrictions | The following customers are ineligible:<br>- Public sector entities.<br>- Customers purchasing Snowflake through the GCP Marketplace.<br>- Customers contracted with Snowflake through a third-party reseller. |
### 2. Independent license (Bring Your Own License - BYOL) You provide your own Oracle license that includes XStream entitlements (for example, Oracle GoldenGate license). For more information, see [Independent license (BYOL) details](#label-oracle-byol-license-details).
| Term | Details |
| --- | --- |
| Billing | No additional licensing fees from Snowflake. Standard storage and compute costs (for example, Openflow Compute) will apply. |
| Compliance | You are solely responsible for compliance with your Oracle license. |
| Usage | Mandatory for public sector, GCP Marketplace, and reseller customers. |
## Choosing an Oracle XStream licensing model The %oracleofc% requires a paid license for Oracle XStream services. Two licensing models are available: - Embedded Oracle License - Independent Oracle License (Bring Your Own License - BYOL) Use the following table to determine the appropriate model for your organization.
| Consideration | Embedded License | Independent License (BYOL) |
| --- | --- | --- |
| Who is it for? | Customers who need to license Oracle XStream technology and want to purchase it directly through their Snowflake agreement. | Customers who already have an Oracle GoldenGate license or another Oracle agreement that provides entitlement for XStream. |
| Billing | Billed through Snowflake based on the number of processor cores on your source Oracle DB. Involves a non-cancelable 36-month commitment. Also billed for support and maintenance services. Additionally, standard storage and compute costs (for example, Openflow Compute) will apply. | No additional licensing or support and maintenance fees for Oracle XStream services from Snowflake. You are responsible for all licensing and compliance directly with Oracle. Standard storage and compute costs (for example, Openflow Compute) will apply. |
| Configuration | Requires you to input your Oracle DB's CPU core count and a processor multiplier factor in the connector parameters. | Does not require you to provide CPU core information to Snowflake. |
| Trial period | Includes a 60-day free trial for up to 16 licensed cores. Billing commences automatically on the 61st day. | No trial period is offered through Snowflake. Your use is subject to your existing Oracle agreement. |
## Embedded license details By choosing this option, you are procuring the right to use Oracle XStream technology with the connector through Snowflake. Be aware of the following key terms: ### Billing Oracle XStream services are billed monthly and drawn from your Snowflake capacity balance. The fee has two components: a license fee and a Support & Maintenance (S&M) fee. The license fee is calculated based on the number of processor cores in your source Oracle database, multiplied by the Oracle Processor Core Factor. ### Commitment (The "Day 61" Rule) The first 60 days are free for up to 16 licensed cores. However, activating the connector beyond the 60-day trial initiates a non-cancelable 36-month billing term ("Initial Term"). - **Automatic Conversion**: Billing commences automatically on Day 61. To avoid charges, you must cancel the trial in the **Admin** %raa% **Terms** %raa% **Openflow for Oracle** dashboard before Day 60. - **Lock-in**: If your Snowflake agreement is terminated during this Initial Term, the entire remaining balance for the Initial Term becomes due immediately. ### Post-term renewal and penalties After the Initial Term, the license fee becomes $0, but the Support & Maintenance (S&M) fee continues. - **Opt-out Consequence**: You can opt out of S&M renewal through the dashboard in **Admin** %raa% **Terms** %raa% **Openflow for Oracle**. However, if S&M coverage stops, the connector processors are locked. To resume operations, you must purchase a new Embedded License, which resets the 36-month full-price commitment. ### Requirements You are responsible for accurately reporting the number of processor cores and the correct core factor in the connector configuration. This information must be kept current if your source database hardware changes. ### Restrictions This option isn't available for: - Public sector entities (for example, government and education entities). - Customers purchasing Snowflake through the GCP Marketplace. 
- Customers contracted with Snowflake through a third-party reseller (for example, CDW, Optiv). ### Configuration To configure the Embedded License: - Review and accept the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/) terms presented in the UI. - Select the **Embedded License** type. - Enter the CPU core count details for your source Oracle database: **Total Cores** (the total number of physical cores on the source database server) and **Core Factor** (the Oracle processor core factor, for example, 0.5 for Intel processors). Consult the Oracle Processor Core Factor Table for the correct value. ## Independent license (BYOL) details This option is for customers who have already licensed the necessary Oracle technology. ### Requirements You are solely responsible for ensuring that your use of the connector complies with the terms of your existing Oracle license agreement. Snowflake doesn't validate or audit your Oracle entitlements. ### Configuration To configure the Independent License (BYOL): - Review and accept the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/) terms presented in the UI. - Select the **Independent License** type. When configuring the connector, proceed without entering any core count or billing-related information. ## Openflow requirements The following Openflow runtime requirements apply to the %oracleofc%: - The runtime size must be at least **Medium**. For guidance on choosing a size and resizing the runtime later, see [Runtime sizing](#label-oracle-runtime-sizing). - The connector doesn't support multi-node Openflow runtimes. Configure the runtime for this connector with **Min nodes** and **Max nodes** set to `1`. 
## Supported Oracle versions and platforms The following Oracle database versions and platforms are supported: - Oracle database versions 12cR1 and later - On-premises servers - Oracle Exadata - OCI VM/Bare Metal - AWS Custom RDS for Oracle - AWS Standard Single-tenant RDS for Oracle ## Limitations The following limitations apply to the %oracleofc%: - AWS Standard Multi-tenant RDS for Oracle isn't supported. - Oracle Autonomous Databases (ATP/ADW) aren't supported. - Oracle SaaS offerings such as Oracle Fusion Cloud Applications and NetSuite aren't supported. - The connector requires Openflow deployment version 0.55.0 or later for BYOC. - The Openflow runtime must be created after the required Openflow deployment version is installed. - Each replicated table must have a primary key, a qualifying unique constraint, a qualifying unique index, or a user-declared logical key. For more information, see [](#label-oracle-replication-key-selection). - Schema changes (such as ALTER TABLE statements that add or drop columns) aren't supported while re-reading the redo logs from the earliest position. If any table's schema was altered between the earliest available SCN and the current position, that table should be removed from replication and re-added with a fresh snapshot instead. - The connector doesn't detect at runtime when you drop or modify the primary key, unique constraint, or unique index that it uses as the replication key. This limitation also applies to renaming a replication-key column. After any such change, restart replication for the affected table: see [](#label-of-oracle-restart-table-replication). - When a logical-key value changes on the source, the connector doesn't soft-delete the old row, which results in duplicate active rows in the destination. For more information, see [](#label-oracle-logical-key-value-change). 
## How the connector works The following sections describe how the connector works in different contexts, including replication, schema changes, and data retention. ### How tables are replicated The name of the destination schema is determined by the `Destination Schema Pattern` parameter. For more information, see [](#label-oracle-snowflake-destination-parameters). By default, the destination schema name is the source database name and source schema name joined by an underscore, so the fully qualified name of a destination table is: `._.` The tables are replicated in the following stages: 1. Schema introspection: The connector discovers the columns in the source table, including the column names and types, then validates them against Snowflake's and the connector's [Limitations](#limitations). Validation failures cause this stage to fail, and the cycle completes. After successful completion of this stage, the connector creates an empty destination table in Snowflake. 2. Snapshot load: The connector copies all data available in the source table into the destination table. If this stage fails, then no more data is replicated. After successful completion, the data from the source table is available in the destination table. 3. Incremental load: The connector tracks changes in the source table and applies those changes to the destination table. This process continues until the table is removed from replication. Failure at this stage permanently stops replication of the source table until the issue is resolved. ### Schema changes The connector picks up schema changes on source tables during replication, except for the cases listed under Limitations. When a column is added on the source, the connector adds it to the destination table and starts replicating it on the next poll. Existing rows in the destination table aren't backfilled with values for the new column. 
When a column is dropped on the source, the connector doesn't drop the corresponding column on the destination. Instead, it renames the column with a suffix (by default, `__SNOWFLAKE_DELETED`) to preserve existing data. If the same column is later re-added on the source, the connector adds it as a new column, so the destination table ends up with both the new column and the soft-deleted one (for example, `A` and `A__SNOWFLAKE_DELETED`). If that same column is then dropped a second time, replication for the affected table fails because the soft-deleted column name is already taken. To recover, restart replication for the affected table: see [Restart table replication](#label-of-oracle-restart-table-replication). The same soft-delete mechanism applies when you change a table's Column Filter JSON. For details, see [Replicate a subset of columns in a table](#label-oracle-connector-replication-subset-of-columns). ### How the connector chooses a replication key The connector uses one column or set of columns from each source table as the replication key. The replication key uniquely identifies a row, drives the MERGE operation that applies CDC changes to the destination, and orders rows during the snapshot load. For each table, the connector resolves the replication key in this order: 1. **User-declared logical key.** If the connector is configured with a Table Key Configuration Service that lists the table, the connector uses those columns as the replication key, overriding any primary key, unique constraint, or unique index on the table. For more information, see [](#label-oracle-logical-key). 2. **Primary key.** The columns of the table's enabled primary key constraint. 3. **Unique constraint or unique index.** If the table has no primary key, the connector looks for a qualifying unique constraint or unique index, as described in [](#label-oracle-replication-key-uk-criteria). 4. **None.** If no qualifying key is found, the connector can't replicate the table. 
To resolve this, either add a primary key to the table, modify an existing constraint or index so it qualifies (see the following criteria), or declare a logical key on columns that uniquely identify rows. For diagnostic steps, see [](#label-oracle-no-replication-key) in the troubleshooting topic. #### Qualifying unique constraints and unique indexes The connector evaluates a unique constraint as a candidate replication key only when: - The constraint type is `UNIQUE` and `STATUS = ENABLED` in `ALL_CONSTRAINTS`. - The constraint isn't initially deferred (`DEFERRED = IMMEDIATE`). Constraints declared `DEFERRABLE INITIALLY IMMEDIATE` qualify; `DEFERRABLE INITIALLY DEFERRED` doesn't. - All columns covered by the constraint are `NOT NULL`. The connector evaluates a unique index as a candidate replication key only when: - `UNIQUENESS = UNIQUE` and the index isn't `UNUSABLE` in `ALL_INDEXES`. - `INDEX_TYPE = NORMAL` (a standard B-tree index). Bitmap and function-based unique indexes are excluded. - The index isn't the implementation of a primary or unique constraint (constraint-backed indexes are evaluated through the constraint, not separately). - All columns covered by the index are `NOT NULL`. `LOB`, `CLOB`, `NCLOB`, `LONG`, and similar large-object columns can't appear in unique constraints or unique indexes in Oracle, so they never qualify as replication-key columns. #### Tiebreakers When more than one candidate qualifies, the connector picks one deterministically using the following preferences, in order: 1. Among all candidates, a unique constraint is preferred over a unique index. 2. Among candidates of the same type, the candidate with the fewest columns is preferred. 3. Among candidates with the same column count, the candidate with the most numeric columns is preferred. The connector counts the following Oracle types as numeric: `NUMBER`, `INTEGER`, `INT`, `SMALLINT`, `FLOAT`, `DOUBLE`, `BINARY_FLOAT`, `BINARY_DOUBLE`. 4. 
If a tie remains, the candidate with the lowest constraint or index name in alphabetical order is selected. If you want a specific column set used regardless of the tiebreaker outcome, declare it as a logical key. For more information, see [](#label-oracle-logical-key). #### Replication key examples The following table has no primary key but has a unique constraint on a `NOT NULL` column. The constraint qualifies, and the connector replicates the table using the constraint as the replication key: ```sql CREATE TABLE customers ( email VARCHAR2(255) NOT NULL, name VARCHAR2(100), created_at TIMESTAMP DEFAULT SYSTIMESTAMP, CONSTRAINT uk_customers_email UNIQUE (email) ); ``` The following table has no primary key and no unique constraint, but a unique B-tree index on a `NOT NULL` column. The index qualifies, and the connector replicates the table using the index as the replication key: ```sql CREATE TABLE sessions ( session_id VARCHAR2(64) NOT NULL, user_id NUMBER, created_at TIMESTAMP ); CREATE UNIQUE INDEX idx_sessions_id ON sessions (session_id); ``` The following table doesn't qualify for automatic replication-key selection: the unique-constraint column is nullable. To replicate this table, add a `NOT NULL` constraint, replace the column with one that's `NOT NULL`, or specify a logical key: ```sql CREATE TABLE products ( sku VARCHAR2(50), name VARCHAR2(100), CONSTRAINT uk_products_sku UNIQUE (sku) ); ``` ### Changes to a replication key value When a source update changes the replication key value of an existing row, the connector can't update the destination row in place because the row's identity changes. Instead, it splits the source update into two operations on the destination table: 1. The destination row keyed by the **old** value is soft-deleted: its `_SNOWFLAKE_DELETED` metadata column is set to `TRUE`. 2. A new destination row is inserted keyed by the **new** value, with the updated payload and `_SNOWFLAKE_DELETED` set to `FALSE`. 
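The two steps above can be sketched in Python as follows. This is an illustrative model rather than the connector's code: the `split_key_update` and `apply_ops` helpers and the shape of the operation dictionaries are hypothetical, and the destination table is modeled as a plain dictionary keyed by the replication key.

```python
def split_key_update(old_key, new_key, payload):
    """Translate one source UPDATE into destination operations.

    If the replication key is unchanged, the row is updated in place.
    If the key changed, the update is split into a soft delete of the
    old row followed by an insert of a new row (hypothetical model).
    """
    if old_key == new_key:
        return [{"op": "update", "key": new_key, "values": dict(payload)}]
    return [
        {"op": "soft_delete", "key": old_key},
        {"op": "insert", "key": new_key,
         "values": {**payload, "_SNOWFLAKE_DELETED": False}},
    ]


def apply_ops(table, ops):
    """Apply the operations to an in-memory destination table."""
    for op in ops:
        if op["op"] == "soft_delete":
            # The old row is kept and only flagged as deleted.
            table[op["key"]]["_SNOWFLAKE_DELETED"] = True
        elif op["op"] == "insert":
            table[op["key"]] = dict(op["values"])
        else:
            table[op["key"]].update(op["values"])
    return table


dest = {"a@example.com": {"name": "Ann", "_SNOWFLAKE_DELETED": False}}
ops = split_key_update("a@example.com", "ann@example.com", {"name": "Ann"})
apply_ops(dest, ops)
```

After these operations, `dest` holds both the old row, flagged as deleted, and the new row under the new key value.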
The destination table therefore contains two rows after the change: the original row, soft-deleted, and a new row under the new key value. To query only current rows, filter on `_SNOWFLAKE_DELETED = FALSE`. This behavior applies when the replication key is a primary key or an auto-detected unique constraint or unique index. User-declared logical keys behave differently: see the limitation described in [](#label-oracle-logical-key-value-change). ### Oversized values The connector doesn't replicate individual values larger than 16 MB. By default, processing such a value marks the associated table permanently failed. To change this behavior, modify the **Oversized Value Strategy** destination parameter. # Understanding data retention The connector follows a data retention philosophy where customer data is never automatically deleted. You maintain full ownership and control over your replicated data, and the connector preserves historical information rather than permanently removing it. This approach has the following implications: - Rows deleted from the source table are soft-deleted in the destination table rather than physically removed. - Columns dropped from the source table are renamed in the destination table rather than dropped. - Journal tables are retained indefinitely and are not automatically cleaned up. ## Destination table metadata columns Each destination table includes the following metadata columns that track replication information:
| Column name | Type | Description |
| --- | --- | --- |
| `_SNOWFLAKE_INSERTED_AT` | TIMESTAMP_NTZ | The timestamp when the row was originally inserted into the destination table. |
| `_SNOWFLAKE_UPDATED_AT` | TIMESTAMP_NTZ | The timestamp when the row was last updated in the destination table. |
| `_SNOWFLAKE_DELETED` | BOOLEAN | Indicates whether the row was deleted from the source table. When `true`, the row has been soft-deleted and no longer exists in the source. |
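The metadata columns can be used like any other column in a query. The following is a minimal sketch, not a prescribed pattern: it lists rows that first arrived in the destination table during the last 24 hours and are still present in the source. The table name `my_table` is a placeholder, and the comparison assumes the metadata timestamps are recorded in UTC, which this topic doesn't state.

```sql
-- Rows first inserted into the destination table in the last 24 hours
-- that have not been soft-deleted.
-- Assumption: _SNOWFLAKE_INSERTED_AT is recorded in UTC. SYSDATE() returns
-- the current UTC time as TIMESTAMP_NTZ, so the two values are comparable.
SELECT *
FROM my_table
WHERE _SNOWFLAKE_INSERTED_AT >= DATEADD(hour, -24, SYSDATE())
  AND _SNOWFLAKE_DELETED = FALSE
ORDER BY _SNOWFLAKE_INSERTED_AT DESC;
```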
## Soft-deleted rows When a row is deleted from the source table, the connector does not physically remove it from the destination table. Instead, the row is marked as deleted by setting the `_SNOWFLAKE_DELETED` metadata column to `true`. This approach allows you to: - Retain historical data for auditing or compliance purposes. - Query deleted records when needed. - Decide when and how to permanently remove data based on your requirements. To query only active (non-deleted) rows, filter on the `_SNOWFLAKE_DELETED` column: ```sql SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = FALSE; ``` To query deleted rows: ```sql SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = TRUE; ``` ## Dropped columns When a column is dropped from the source table, the connector does not drop the corresponding column from the destination table. Instead, the column is renamed by appending the `__SNOWFLAKE_DELETED` suffix to preserve historical values. For example, if a column named `EMAIL` is dropped from the source table, it is renamed to `EMAIL__SNOWFLAKE_DELETED` in the destination table. Rows that existed before the column was dropped retain their original values, while rows added after the drop have `NULL` in this column. You can still query historical values from the renamed column: ```sql SELECT EMAIL__SNOWFLAKE_DELETED FROM my_table; ``` ## Renamed columns Due to limitations in CDC (Change Data Capture) mechanisms, the connector cannot distinguish between a column being renamed and a column being dropped followed by a new column being added. As a result, when you rename a column in the source table, the connector treats this as two separate operations: dropping the original column and adding a new column with the new name. For example, if you rename a column from `A` to `B` in the source table, the destination table will contain: - `A__SNOWFLAKE_DELETED`: Contains values from before the rename. Rows added after the rename have `NULL` in this column. 
- `B`: Contains values from after the rename. Rows that existed before the rename have `NULL` in this column.

### Querying renamed columns

To retrieve data from both the original and renamed columns as a single unified column, use a `COALESCE` or `CASE` expression:

```sql
SELECT
  COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B
FROM my_table;
```

Alternatively, use a `CASE` expression:

```sql
SELECT
  CASE
    WHEN B IS NOT NULL THEN B
    ELSE A__SNOWFLAKE_DELETED
  END AS A_RENAMED_TO_B
FROM my_table;
```

### Creating a view for renamed columns

Rather than manually modifying the destination table, you can create a view that presents the renamed column as a single unified column. This approach is recommended because it preserves the original data and avoids potential issues with ongoing replication.

```sql
CREATE VIEW my_table_unified AS
SELECT
  *,
  COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B
FROM my_table;
```

Manually modifying the destination table structure (such as dropping or renaming columns) is not recommended, as it may interfere with ongoing replication and cause data inconsistencies.

## Journal tables

During incremental replication, changes from the source database are first written to journal tables before being merged into the destination tables. The connector does not automatically remove data from journal tables, as this data may be useful for auditing, debugging, or reprocessing purposes.

Journal tables are created in the same schema as their corresponding destination tables and follow this naming convention:

`<TABLE_NAME>_JOURNAL_<TIMESTAMP>_<SCHEMA_GENERATION>`

Where:

- `<TABLE_NAME>` is the name of the destination table.
- `<TIMESTAMP>` is the creation timestamp in Unix epoch format (seconds since January 1, 1970), ensuring uniqueness.
- `<SCHEMA_GENERATION>` starts at 1 and increments whenever the destination table schema changes, either due to schema changes in the source table or modifications to column filters.
For example, if your destination table is `SALES.ORDERS`, the journal table might be named `SALES.ORDERS_JOURNAL_1705320000_1`. Do not drop journal tables while replication is in progress. Removing an active journal table may cause data loss or replication failures. Only drop journal tables after the corresponding source table has been fully removed from replication. ### Managing journal table storage If you need to manage storage costs by removing old journal data, you can create a Snowflake task that periodically cleans up journal tables for tables that are no longer being replicated. Before implementing journal cleanup, verify that: - The corresponding source tables have been fully removed from replication. - You no longer need the journal data for auditing or processing purposes. For information on creating and managing tasks for automated cleanup, see [Introduction to tasks](/user-guide/tasks-intro). ## Next steps After reviewing this topic, consider the following next steps: - Review [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) to enable the connector, accept the Oracle XStream terms, and configure your licensing model. - Review [](/user-guide/data-integration/openflow/connectors/oracle/data-mapping) to understand how the connector maps data types to Snowflake data types. - Review [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks) to set up the connector. --- title: About Openflow Connector for PostgreSQL source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/postgres/about.md section: Loading & Unloading Data --- # About %postgresql% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). 
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/postgres/setup)
- [](/user-guide/data-integration/openflow/connectors/postgres/data-mapping)

This topic describes the basic concepts, workflow, and limitations of the %postgresql%.

## About the %postgresql%

The %postgresql% connects a PostgreSQL database instance to Snowflake and replicates data from selected tables in near real time or on a schedule. The connector also creates a log of all data changes, which is available alongside the current state of the replicated tables.

## Use cases

Use this connector if you're looking to do the following:

- Replicate PostgreSQL data to Snowflake using change data capture (CDC) for comprehensive, centralized reporting.

## Supported PostgreSQL versions

The connector supports the following PostgreSQL versions and platforms.

**Supported PostgreSQL versions**
11 12 13 14 15 16 17 18 [Standard](https://www.postgresql.org/) Yes Yes Yes Yes Yes Yes Yes Yes [AWS RDS](https://docs.aws.amazon.com/AmazonRDS/latest/PostgreSQLReleaseNotes/Welcome.html) Yes Yes Yes Yes Yes Yes Yes Yes [Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQLReleaseNotes/Welcome.html) Yes Yes Yes Yes Yes Yes Yes [GCP Cloud SQL](https://cloud.google.com/sql/docs/postgres/) Yes Yes Yes Yes Yes Yes Yes [Azure Database](https://learn.microsoft.com/en-us/azure/postgresql/) Yes Yes Yes Yes Yes Yes Yes
## Openflow requirements - The runtime size must be at least Medium. For guidance on choosing a size and resizing the runtime later, see [Runtime sizing](#label-postgres-runtime-sizing). - The connector doesn't support multi-node Openflow runtimes. Configure the runtime for this connector with **Min nodes** and **Max nodes** set to `1`. ## Limitations - The connector supports PostgreSQL version 11 or later. - The connector supports only username and password authentication with PostgreSQL. - The connector doesn't replicate tables with data that exceeds [Snowflake's type limitations](/sql-reference/intro-summary-data-types). An exception to this rule is date & time data type columns that contain out-of-range values. For more information, see [](#label-supported-pg-date-time-data-type-value). - The connector requires every replicated table to have a supported identity key configuration: either a primary key with replica identity `DEFAULT`, or a unique index with replica identity `USING INDEX` (see [](#label-postgres-configure-replica-identity-using-index)). Tables with neither configuration can replicate INSERT operations only — UPDATE and DELETE require an identity key. - The connector supports source table schema changes with the exception of changing primary key definitions, changing the precision, or the scale of a numeric column. - When using incremental replication without snapshots, if a row was inserted before incremental replication started, any subsequent update to that row will produce a destination row with missing values in VARCHAR, VARIANT, BINARY, and ARRAY columns. Limitations affecting certain table columns can be bypassed by excluding these specific columns from replication. ## Workflow 1. A **Database administrator** configures PostgreSQL replication settings, creates a publication, and credentials for the connector. Optionally, they deliver the SSL certificate. 2. A **Snowflake account administrator** performs the following tasks: 1. 
Creates a service user for the connector, a warehouse for the connector, and a destination database to replicate into. 2. Installs the connector. 3. Specifies the required parameters for the flow template. 4. Runs the flow. The connector performs the following tasks when run in Openflow: 1. Creates a schema for journal tables. 2. Creates the schemas and destination tables matching the source tables configured for replication. 3. Starts replication following the table replication lifecycle. ## How the connector works The following sections describe how the connector works in various scenarios, including replication, changes in schema, and data retention. ### Data replication The name of the destination schema is determined by the `Destination Schema Pattern` parameter. For more information, see [](#label-of-postgres-destination-parameters). By default, the destination schema name matches the source schema name, so the fully qualified name of a destination table is: `..` ### How tables are replicated The tables are replicated in the following stages: 1. Schema introspection: The connector discovers the columns in the source table, their names, types, then validates them against Snowflake's and the connector's limitations. Validation failures cause this stage to fail, and the cycle completes. After successful completion of Schema Introspection, the connector creates an empty destination table. 2. Snapshot load: The connector copies all data available in the source table into the destination table. Failure of this stage finishes the cycle, and no more data is replicated. After successful completion, the whole set of data from the source table is available in the destination table. 3. Incremental load: The connector keeps tracking changes in the source table, and copying them into the destination table. This continues until the table is removed from replication. Failure at this stage permanently stops replication of the source table, until the issue is removed. 
This connector can be configured to immediately start replicating incremental changes for newly added tables, bypassing the snapshot load phase. This option is often useful when reinstalling the connector in an account where previously replicated data exists and you want to continue replication without having to re-snapshot tables. For details on the bypassing snapshot load and using the incremental load process, see [Incremental replication](/user-guide/data-integration/openflow/connectors/postgres/incremental-replication). Interim failures, such as connection errors, do not prevent tables from being replicated. Permanent failures, such as unsupported data types, do prevent tables from being replicated. If a permanent failure prevents a table from being replicated, remove the table from the list of replicated tables. After you address the problem that caused the failure, you can add the table back to the list of replicated tables. ### Schema changes The connector picks up schema changes on source tables during replication, except for the cases listed under Limitations. When a column is added on the source, the connector adds it to the destination table and starts replicating it on the next poll. Existing rows in the destination table aren't backfilled with values for the new column. When a column is dropped on the source, the connector doesn't drop the corresponding column on the destination. Instead, it renames the column with a suffix (by default, `__SNOWFLAKE_DELETED`) to preserve existing data. If the same column is later re-added on the source, the connector adds it as a new column, so the destination table ends up with both the new column and the soft-deleted one (for example, `A` and `A__SNOWFLAKE_DELETED`). If that same column is then dropped a second time, replication for the affected table fails because the soft-deleted column name is already taken. 
To recover, restart replication for the affected table: see [Restart table replication](#label-of-postgres-restart-table-replication). The same soft-delete mechanism applies when you change a table's Column Filter JSON. For details, see [Replicate a subset of columns in a table](#label-postgres-connector-replication-subset-of-columns). ### Oversized values The connector doesn't replicate individual values larger than 16 MB. By default, processing such a value marks the associated table permanently failed. To change this behavior, modify the **Oversized Value Strategy** destination parameter. ### TOASTed value support The connector supports replicating tables with [TOAST values](https://www.postgresql.org/docs/current/storage-toast.html) for columns of types: `array`, `bytea`, `json`, `jsonb`, `text`, `varchar`, `xml`. Whenever the connector encounters a TOASTed value in the CDC stream, it substitutes a default placeholder of `__previous_value_unchanged`, formatted for the given column type, and stores it in the journal table. The `MERGE` query then accounts for placeholder values, so that the destination table always contains the last non-TOASTed value. ### Out of range value support The connector supports replicating tables with columns of types `date`, `timestamp`, and `timestamptz` that contain out-of-range values. If the connector encounters an out-of-range value in the CDC stream, it substitutes a default placeholder based on the type of the column. **Placeholder values for out-of-range values**
| Column type | Placeholder value |
| --- | --- |
| `date` | `-9999-01-01` through `9999-12-31`. |
| `timestamp` | `0001-01-01 00:00:00` through `9999-12-31 23:59:59.999999999`. |
| `timestamptz` | `0001-01-01 00:00:00+00` through `9999-12-31 23:59:59.999999999+00`. |
`-Infinity` and `Infinity` values are also replaced with the respective placeholders for all three types. # Understanding data retention The connector follows a data retention philosophy where customer data is never automatically deleted. You maintain full ownership and control over your replicated data, and the connector preserves historical information rather than permanently removing it. This approach has the following implications: - Rows deleted from the source table are soft-deleted in the destination table rather than physically removed. - Columns dropped from the source table are renamed in the destination table rather than dropped. - Journal tables are retained indefinitely and are not automatically cleaned up. ## Destination table metadata columns Each destination table includes the following metadata columns that track replication information:
| Column name | Type | Description |
| --- | --- | --- |
| `_SNOWFLAKE_INSERTED_AT` | TIMESTAMP_NTZ | The timestamp when the row was originally inserted into the destination table. |
| `_SNOWFLAKE_UPDATED_AT` | TIMESTAMP_NTZ | The timestamp when the row was last updated in the destination table. |
| `_SNOWFLAKE_DELETED` | BOOLEAN | Indicates whether the row was deleted from the source table. When `true`, the row has been soft-deleted and no longer exists in the source. |
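Because these columns are ordinary table columns, they can also drive simple health checks. The following sketch, which uses the placeholder table name `my_table`, summarizes how many rows are active and how many are soft-deleted, along with the most recent change recorded in each group:

```sql
-- Count active and soft-deleted rows and show the latest change in each group.
SELECT
  _SNOWFLAKE_DELETED,
  COUNT(*) AS row_count,
  MAX(_SNOWFLAKE_UPDATED_AT) AS last_change
FROM my_table
GROUP BY _SNOWFLAKE_DELETED;
```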
## Soft-deleted rows When a row is deleted from the source table, the connector does not physically remove it from the destination table. Instead, the row is marked as deleted by setting the `_SNOWFLAKE_DELETED` metadata column to `true`. This approach allows you to: - Retain historical data for auditing or compliance purposes. - Query deleted records when needed. - Decide when and how to permanently remove data based on your requirements. To query only active (non-deleted) rows, filter on the `_SNOWFLAKE_DELETED` column: ```sql SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = FALSE; ``` To query deleted rows: ```sql SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = TRUE; ``` ## Dropped columns When a column is dropped from the source table, the connector does not drop the corresponding column from the destination table. Instead, the column is renamed by appending the `__SNOWFLAKE_DELETED` suffix to preserve historical values. For example, if a column named `EMAIL` is dropped from the source table, it is renamed to `EMAIL__SNOWFLAKE_DELETED` in the destination table. Rows that existed before the column was dropped retain their original values, while rows added after the drop have `NULL` in this column. You can still query historical values from the renamed column: ```sql SELECT EMAIL__SNOWFLAKE_DELETED FROM my_table; ``` ## Renamed columns Due to limitations in CDC (Change Data Capture) mechanisms, the connector cannot distinguish between a column being renamed and a column being dropped followed by a new column being added. As a result, when you rename a column in the source table, the connector treats this as two separate operations: dropping the original column and adding a new column with the new name. For example, if you rename a column from `A` to `B` in the source table, the destination table will contain: - `A__SNOWFLAKE_DELETED`: Contains values from before the rename. Rows added after the rename have `NULL` in this column. 
- `B`: Contains values from after the rename. Rows that existed before the rename have `NULL` in this column.

### Querying renamed columns

To retrieve data from both the original and renamed columns as a single unified column, use a `COALESCE` or `CASE` expression:

```sql
SELECT
  COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B
FROM my_table;
```

Alternatively, use a `CASE` expression:

```sql
SELECT
  CASE
    WHEN B IS NOT NULL THEN B
    ELSE A__SNOWFLAKE_DELETED
  END AS A_RENAMED_TO_B
FROM my_table;
```

### Creating a view for renamed columns

Rather than manually modifying the destination table, you can create a view that presents the renamed column as a single unified column. This approach is recommended because it preserves the original data and avoids potential issues with ongoing replication.

```sql
CREATE VIEW my_table_unified AS
SELECT
  *,
  COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B
FROM my_table;
```

Manually modifying the destination table structure (such as dropping or renaming columns) is not recommended, as it may interfere with ongoing replication and cause data inconsistencies.

## Journal tables

During incremental replication, changes from the source database are first written to journal tables before being merged into the destination tables. The connector does not automatically remove data from journal tables, as this data may be useful for auditing, debugging, or reprocessing purposes.

Journal tables are created in the same schema as their corresponding destination tables and follow this naming convention:

`<TABLE_NAME>_JOURNAL_<TIMESTAMP>_<SCHEMA_GENERATION>`

Where:

- `<TABLE_NAME>` is the name of the destination table.
- `<TIMESTAMP>` is the creation timestamp in Unix epoch format (seconds since January 1, 1970), ensuring uniqueness.
- `<SCHEMA_GENERATION>` starts at 1 and increments whenever the destination table schema changes, either due to schema changes in the source table or modifications to column filters.
For example, if your destination table is `SALES.ORDERS`, the journal table might be named `SALES.ORDERS_JOURNAL_1705320000_1`. Do not drop journal tables while replication is in progress. Removing an active journal table may cause data loss or replication failures. Only drop journal tables after the corresponding source table has been fully removed from replication. ### Managing journal table storage If you need to manage storage costs by removing old journal data, you can create a Snowflake task that periodically cleans up journal tables for tables that are no longer being replicated. Before implementing journal cleanup, verify that: - The corresponding source tables have been fully removed from replication. - You no longer need the journal data for auditing or processing purposes. For information on creating and managing tasks for automated cleanup, see [Introduction to tasks](/user-guide/tasks-intro). ## Next steps Review [](/user-guide/data-integration/openflow/connectors/postgres/data-mapping) to understand how the connector maps data types to Snowflake data types. Review [](/user-guide/data-integration/openflow/connectors/postgres/setup) to set up the connector. --- title: About Openflow Connector for SharePoint source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sharepoint/about.md section: Loading & Unloading Data --- # About %sharepointof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. 
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/sharepoint/setup) This topic describes the basic concepts of Openflow Connector for SharePoint, its use cases and limitations. The Openflow Connector for SharePoint connects a Microsoft 365 SharePoint site and Snowflake to ingest files and user permissions and keeps them up to date. Openflow Connector for SharePoint also supports the Cortex Search service and can make ingested files ready for conversational analysis for use in AI Assistants using SQL, Python or REST APIs. ## Variants of the %sharepointof% The %sharepointof% contains four variants which allow you to, optionally, index data into Snowflake Cortex Search and include document metadata (ACLs).
| Variant | Description |
| --- | --- |
| Microsoft SharePoint (Cortex Search, document ACLs) | Indexes files and their permissions (ACLs) into Snowflake Cortex Search. |
| Microsoft SharePoint (Cortex Search, no document ACLs) | Indexes files without their permissions (ACLs) into Snowflake Cortex Search. |
| Microsoft SharePoint (Simple Ingest, document ACLs) | Ingests files and their permissions (ACLs) into a Snowflake stage. |
| Microsoft SharePoint (Simple Ingest, no document ACLs) | Ingests files without their permissions (ACLs) into a Snowflake stage. |
These variants appear as separate connectors in Marketplace. When installing the connector, choose the variant that meets your requirements.

## Rate limiting restrictions

[SharePoint API limits](https://learn.microsoft.com/en-us/sharepoint/dev/embedded/development/limits-calling#api-rate-limits) govern how many requests can be made within a given time frame. If your flow exceeds the allowed quota, syncs may slow down or fail with an error. This mostly occurs when your access token makes more requests than the source allows. In such cases, Snowflake recommends applying for a higher access quota (where applicable) or reducing the sync frequency.

### Limitations

- [](#label-parse-document-requirements).
- [](#label-cortex-search-overview-limitations).
- Changes caused by moving or renaming folders aren't captured during incremental ingestion.
- The connector ingests only the supported file types and ignores others.

### Next steps

[](/user-guide/data-integration/openflow/connectors/sharepoint/setup)

---
title: About Openflow Connector for Slack
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/slack/about.md
section: Loading & Unloading Data
---

# About %slackof%

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/slack/setup)

This topic describes the basic concepts of Openflow Connector for Slack, the steps to set it up, and its limitations.

The Openflow Connector for Slack connects a Slack workspace to Snowflake to ingest Slack messages, reactions, file attachments, and channel memberships (ACLs). The connector also supports the Cortex Search service and can make ingested Slack content ready for conversational analysis in AI assistants using SQL, Python, or REST APIs.

Use this connector if you're looking to do the following:

- Pull Slack messages and metadata into Snowflake for searchable, organization-wide insights.
- Ingest Slack content and make it ready for chat in your AI assistants with Snowflake Cortex.

## Limitations

- The connector captures historical file attachments, reactions, and messages, but only after the Slack App is added to a conversation or channel.
- If a user edits or deletes an existing message, the change is captured in Snowflake at the next refresh interval.

## Workflow

1. The **Slack admin** creates a Slack App as described later, then installs the App in the channels or conversations to ingest messages from. The Slack admin provides the Bot token and App token from the Slack App to the Snowflake account admin.
2. The **Snowflake account admin** performs the following tasks:
   1. Installs the connector.
   2. Specifies the required parameters for the flow template, for example, the Bot token, the App token, and the database and schema names.
   3. Runs the flow.

   When run in Openflow, the flow does the following:

   1. Creates a database, a schema, the necessary tables, and an external access integration in Snowflake on behalf of the admin. It also creates a Cortex Search service and wires up chunks, ACLs, and metadata. By default, these objects are accessible only to the Snowflake account admin role.
   2. Fetches the specified conversations, metadata, and ACLs from the Slack channels. An ACL is defined as the snapshot list of user IDs and emails that are members of each channel being ingested.
   3. Chunks the ingested conversation messages.
   4. Puts the chunked conversation messages, along with metadata and ACLs, into Snowflake tables.
3. An **IT developer** in the customer's organization creates a bespoke Chat App and passes the user identity (the user's email address registered on Slack) as a filter when invoking the Cortex Search REST API with the end user's question.
4. **End users** of the Chat App in the customer's organization see responses from Cortex Search restricted to chunks from conversations they have access to in the Slack channel based on ACLs, along with a link to the source conversation.

### Considerations

- By default, any user with the Snowflake account admin role can view the raw ingested messages and conversations and the tables created by the flow template.
- The user with the Snowflake account admin role decides who can access the internal stage and tables through Snowflake roles.
- The user with the Snowflake account admin role decides who can query the Cortex Search service through Snowflake roles.

### Next steps

[](/user-guide/data-integration/openflow/connectors/slack/setup)

---
title: About Openflow Connector for Snowflake to Kafka
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/snowflake-to-kafka/about.md
section: Loading & Unloading Data
---

# About %sf-kafka%

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/snowflake-to-kafka/setup) This topic describes the basic concepts of %sf-kafka% and limitations. The connector consumes a Snowflake stream and sends consumed CDC records to a Kafka topic. A Snowflake Stream object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data. This process is referred to as change data capture (CDC). Use this connector if you're looking to do the following: - Replicate Snowflake tables to Apache Kafka using CDC for real-time insights distribution and event-driven architectures ## Workflow Depending on the configuration of the Kafka broker, which is going to be receiving the CDC data, the workflow may differ slightly. 1. A Snowflake account administrator performs the following tasks: 1. Creates or identifies the Snowflake stream that is going to be the source of the CDC data. 2. Designates a warehouse to be used by the connector. 3. Configures or identifies the Snowflake user used by the connector and a role for this user. The user must have appropriate permissions to the source Snowflake stream. At a minimum, the user needs USAGE privilege on the database and schema containing the Snowflake stream, and SELECT privilege on the stream and the stream's underlying table or view object. 2. A Kafka administrator performs the following tasks. 1. 
Creates or identifies a Kafka broker and topic that is going to be the destination for the CDC captured from the Snowflake stream. 2. Sets up the authentication mechanism for the Kafka broker, which is going to be used by the connector. 3. A data engineer performs the following tasks: 1. Installs and configures the connector. 2. Provides Snowflake credentials and configuration. 3. Provides Kafka credentials and configuration. 4. Provides connector parameters. ## Stream metadata columns Stream metadata columns `METADATA$ROW_ID`, `METADATA$ISUPDATE`, and `METADATA$ACTION` are sent to the Kafka topic. The names of these columns are modified before they are sent to Kafka. In the JSON message payload that is sent, they become `METADATA_ROW_ID`, `METADATA_ISUPDATE`, and `METADATA_ACTION`. For more information, see [](#label-stream-metadata-columns). ## Limitations - A single connector can only capture CDCs from one Snowflake stream. - Messages are sent without a schema. - Schema evolution is not supported. ## Next steps [](/user-guide/data-integration/openflow/connectors/snowflake-to-kafka/setup) --- title: About Openflow Connector for SQL Server source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sql-server/about.md section: Loading & Unloading Data --- # About %sqlserver% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). 
- [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/sql-server/setup) - [](/user-guide/data-integration/openflow/connectors/sql-server/data-mapping) This topic describes the basic concepts, workflow, and limitations of the %sqlserver%. ## About the %sqlserver% The %sqlserver% connects a SQL Server database instance to Snowflake and replicates data from selected tables in near real-time or on schedule. The connector uses SQL Server [Change Tracking](https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-tracking-sql-server) to detect and apply changes to replicated tables. Change data is recorded in journal tables alongside the current state of the replicated tables. ## Use cases Use this connector if you're looking to do the following: - Synchronization of SQL Server data with Snowflake for comprehensive, centralized reporting. ## Supported SQL Server versions The following SQL Server database versions and platforms are supported: - [Microsoft SQL Server 2022](https://www.microsoft.com/sql-server) - Microsoft SQL Server 2019 - Microsoft SQL Server 2017 - Microsoft SQL Server 2016 - [Azure SQL Database](https://learn.microsoft.com/azure/azure-sql/database/?view=azuresql) - [Azure SQL Managed Instance](https://learn.microsoft.com/azure/azure-sql/managed-instance/?view=azuresql) - [AWS RDS for SQL Server](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html) - Google Cloud SQL for SQL Server The connector relies on SQL Server Change Tracking, which is available starting with SQL Server 2008. Earlier versions don't support this feature and are incompatible with the connector. ## Openflow requirements - The runtime size must be at least Medium. 
For guidance on choosing a size and resizing the runtime later, see [Runtime sizing](#label-sql-server-runtime-sizing). - The connector doesn't support multi-node Openflow runtimes. Configure the runtime for this connector with **Min nodes** and **Max nodes** set to `1`. ## Limitations - You can't run multiple connectors of the same type in a single runtime instance. - The connector supports only username and password authentication with SQL Server. - The connector only replicates database tables that contain primary keys. - The connector doesn't update existing records in the Snowflake database when a new NOT NULL column with a default value is added to one of the source databases. - The connector doesn't update existing records in the Snowflake database when a new column is added to the included list in the Column Filter JSON. - The connector supports source table schema changes, except for changing primary key definitions, changing the precision, or the scale of a numeric column. - The connector doesn't support the truncate table operation. You can bypass limitations affecting certain table columns by excluding these specific columns from replication. ## Workflow The following workflow outlines the steps to set up and run the %sqlserver%: 1. A SQL Server database administrator performs the following tasks: 1. Configures SQL Server replication settings and enables change tracking on the databases and tables being replicated. 2. Creates credentials for the connector. 3. (Optional) Provides the SSL certificate to connect to the SQL Server instance over SSL. 2. A Snowflake account administrator performs the following tasks: 1. Creates a service user for the connector, a destination database to store replicated data, and a warehouse for the connector. 2. Installs the connector. 3. Specifies the required parameters for the connector flow definition. 4. Runs the flow. The connector does the following when run in Openflow: 1. 
Creates the schemas and destination tables matching the source tables configured for replication. 2. Begins replication according to the table replication lifecycle. For more information, see [](#label-of-sql-server-how-tables-are-replicated). ## How the connector works The following sections describe how the connector works in various scenarios, including replication, changes in schema, and data retention. ### Change tracking behavior The connector uses SQL Server [Change Tracking](https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-tracking-sql-server) (CT) to detect changes in the source tables. Change Tracking reports the net effect of changes between polling intervals. If a row is updated multiple times between two consecutive polls, the connector sees only the most recent version of that row. Intermediate states aren't preserved. This makes the connector suitable for **data synchronization** use cases, where the goal is to keep the destination table in sync with the source. It isn't suitable for **audit or history** use cases where every intermediate change to a row must be captured. ### Data replication The connector supports replicating tables from multiple SQL Server databases in a single SQL Server instance. The connector creates replicated tables from different databases in separate schemas in the destination Snowflake database. Reference replicated tables by combining the source database name, the source schema name, and the table name in the following format: `..` The name of the destination schema is determined by the `Destination Schema Pattern` parameter. For more information, see [](#label-of-sqlserver-destination-parameters). By default, the destination schema name is the source database name and source schema name joined by an underscore, so the fully qualified name of a destination table is: `._.` ### How tables are replicated The connector replicates tables in the following stages: 1. 
Schema introspection: The connector discovers the columns in the source table, including the column names and types, then validates them against Snowflake's and the connector's limitations. Validation failures cause this stage to fail, and the cycle completes. After successful completion of this stage, the connector creates an empty destination table. 2. Snapshot load: The connector copies all data available in the source table into the destination table. If this stage fails, the connector stops replicating data. After successful completion, the data from the source table is available in the destination table. 3. Incremental load: The connector tracks changes in the source table and applies those changes to the destination table. This process continues until the table is removed from replication. Failure at this stage permanently stops replication of the source table, until the issue is resolved. For information on bypassing snapshot load and using the incremental load process, see [Incremental replication](/user-guide/data-integration/openflow/connectors/sql-server/incremental-replication). ### Schema changes The connector picks up schema changes on source tables during replication, except for the cases listed under Limitations. When a column is added on the source, the connector adds it to the destination table and starts replicating it on the next poll. Existing rows in the destination table aren't backfilled with values for the new column. When a column is dropped on the source, the connector doesn't drop the corresponding column on the destination. Instead, it renames the column with a suffix (by default, `__SNOWFLAKE_DELETED`) to preserve existing data. If the same column is later re-added on the source, the connector adds it as a new column, so the destination table ends up with both the new column and the soft-deleted one (for example, `A` and `A__SNOWFLAKE_DELETED`). 
If that same column is then dropped a second time, replication for the affected table fails because the soft-deleted column name is already taken. To recover, restart replication for the affected table. See [Restart table replication](#label-of-sql-server-restart-table-replication).

The same soft-delete mechanism applies when you change a table's Column Filter JSON. For details, see [Replicate a subset of columns in a table](#label-sqlserver-connector-replication-subset-of-columns).

### Oversized values

The connector doesn't replicate individual values larger than 16 MB. By default, processing such a value marks the associated table as permanently failed. To change this behavior, modify the **Oversized Value Strategy** destination parameter.

### Source database locking behavior

During snapshot and incremental replication, the connector reads from the source database tables to retrieve row data and track changes. Under SQL Server's default READ COMMITTED isolation level, these read operations acquire shared locks on the source tables. If other database clients hold conflicting locks on the same tables at the same time, this can lead to deadlocks, where SQL Server terminates one of the conflicting sessions.

To avoid deadlocks between the connector and other database clients, enable [Read Committed Snapshot Isolation (RCSI)](https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/snapshot-isolation-in-sql-server) on the source database:

```sql
ALTER DATABASE <database_name> SET READ_COMMITTED_SNAPSHOT ON;
```

With RCSI enabled, read operations use row versioning instead of shared locks, which eliminates lock contention between the connector and concurrent write transactions on the source database.

Enabling RCSI changes the semantics of the default READ COMMITTED isolation level for all connections to the database, not just the connector.
Applications that rely on the default lock-based READ COMMITTED behavior (for example, expecting readers to block on concurrent uncommitted writes) can see different results after the change. Test the impact in a non-production environment before enabling RCSI in production. # Understanding data retention The connector follows a data retention philosophy where customer data is never automatically deleted. You maintain full ownership and control over your replicated data, and the connector preserves historical information rather than permanently removing it. This approach has the following implications: - Rows deleted from the source table are soft-deleted in the destination table rather than physically removed. - Columns dropped from the source table are renamed in the destination table rather than dropped. - Journal tables are retained indefinitely and are not automatically cleaned up. ## Destination table metadata columns Each destination table includes the following metadata columns that track replication information:
| Column name | Type | Description |
| --- | --- | --- |
| `_SNOWFLAKE_INSERTED_AT` | TIMESTAMP_NTZ | The timestamp when the row was originally inserted into the destination table. |
| `_SNOWFLAKE_UPDATED_AT` | TIMESTAMP_NTZ | The timestamp when the row was last updated in the destination table. |
| `_SNOWFLAKE_DELETED` | BOOLEAN | Indicates whether the row was deleted from the source table. When `true`, the row has been soft-deleted and no longer exists in the source. |
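As a rough illustration of how these metadata columns can be combined, the following sketch summarizes the state of one destination table: how many rows are active, how many are soft-deleted, and when the most recent insert and update were applied. The table name `my_table` is a placeholder, and the query assumes that `_SNOWFLAKE_DELETED` is always populated.

```sql
-- Sketch: per-table replication summary based on the metadata columns.
-- my_table is a placeholder for a destination table.
SELECT
  COUNT_IF(_SNOWFLAKE_DELETED = FALSE) AS active_rows,        -- rows still present in the source
  COUNT_IF(_SNOWFLAKE_DELETED = TRUE)  AS soft_deleted_rows,  -- rows deleted from the source
  MAX(_SNOWFLAKE_INSERTED_AT)          AS last_insert_at,     -- most recent insert applied
  MAX(_SNOWFLAKE_UPDATED_AT)           AS last_update_at      -- most recent update applied
FROM my_table;
```

A stale `last_update_at` on a table that changes often at the source can be a hint to check the connector's status, although it is not proof of a failure by itself.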
## Soft-deleted rows When a row is deleted from the source table, the connector does not physically remove it from the destination table. Instead, the row is marked as deleted by setting the `_SNOWFLAKE_DELETED` metadata column to `true`. This approach allows you to: - Retain historical data for auditing or compliance purposes. - Query deleted records when needed. - Decide when and how to permanently remove data based on your requirements. To query only active (non-deleted) rows, filter on the `_SNOWFLAKE_DELETED` column: ```sql SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = FALSE; ``` To query deleted rows: ```sql SELECT * FROM my_table WHERE _SNOWFLAKE_DELETED = TRUE; ``` ## Dropped columns When a column is dropped from the source table, the connector does not drop the corresponding column from the destination table. Instead, the column is renamed by appending the `__SNOWFLAKE_DELETED` suffix to preserve historical values. For example, if a column named `EMAIL` is dropped from the source table, it is renamed to `EMAIL__SNOWFLAKE_DELETED` in the destination table. Rows that existed before the column was dropped retain their original values, while rows added after the drop have `NULL` in this column. You can still query historical values from the renamed column: ```sql SELECT EMAIL__SNOWFLAKE_DELETED FROM my_table; ``` ## Renamed columns Due to limitations in CDC (Change Data Capture) mechanisms, the connector cannot distinguish between a column being renamed and a column being dropped followed by a new column being added. As a result, when you rename a column in the source table, the connector treats this as two separate operations: dropping the original column and adding a new column with the new name. For example, if you rename a column from `A` to `B` in the source table, the destination table will contain: - `A__SNOWFLAKE_DELETED`: Contains values from before the rename. Rows added after the rename have `NULL` in this column. 
- `B`: Contains values from after the rename. Rows that existed before the rename have `NULL` in this column.

### Querying renamed columns

To retrieve data from both the original and renamed columns as a single unified column, use a `COALESCE` or `CASE` expression:

```sql
SELECT COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B
FROM my_table;
```

Alternatively, use a `CASE` expression:

```sql
SELECT
  CASE
    WHEN B IS NOT NULL THEN B
    ELSE A__SNOWFLAKE_DELETED
  END AS A_RENAMED_TO_B
FROM my_table;
```

### Creating a view for renamed columns

Rather than manually modifying the destination table, you can create a view that presents the renamed column as a single unified column. This approach is recommended because it preserves the original data and avoids potential issues with ongoing replication.

```sql
CREATE VIEW my_table_unified AS
SELECT
  *,
  COALESCE(B, A__SNOWFLAKE_DELETED) AS A_RENAMED_TO_B
FROM my_table;
```

Manually modifying the destination table structure (such as dropping or renaming columns) is not recommended, as it may interfere with ongoing replication and cause data inconsistencies.

## Journal tables

During incremental replication, changes from the source database are first written to journal tables before being merged into the destination tables. The connector does not automatically remove data from journal tables, as this data may be useful for auditing, debugging, or reprocessing purposes.

Journal tables are created in the same schema as their corresponding destination tables and follow this naming convention:

`<TABLE_NAME>_JOURNAL_<timestamp>_<number>`

Where:

- `<TABLE_NAME>` is the name of the destination table.
- `<timestamp>` is the creation timestamp in Unix epoch format (seconds since January 1, 1970), ensuring uniqueness.
- `<number>` starts at 1 and increments whenever the destination table schema changes, either due to schema changes in the source table or modifications to column filters.

For example, if your destination table is `SALES.ORDERS`, the journal table might be named `SALES.ORDERS_JOURNAL_1705320000_1`. Do not drop journal tables while replication is in progress. Removing an active journal table may cause data loss or replication failures. Only drop journal tables after the corresponding source table has been fully removed from replication. ### Managing journal table storage If you need to manage storage costs by removing old journal data, you can create a Snowflake task that periodically cleans up journal tables for tables that are no longer being replicated. Before implementing journal cleanup, verify that: - The corresponding source tables have been fully removed from replication. - You no longer need the journal data for auditing or processing purposes. For information on creating and managing tasks for automated cleanup, see [Introduction to tasks](/user-guide/tasks-intro). ## Next steps Review [](/user-guide/data-integration/openflow/connectors/sql-server/data-mapping) to understand how the connector maps data types to Snowflake data types. Review [](/user-guide/data-integration/openflow/connectors/sql-server/setup) to set up the connector. --- title: About Openflow Connector for Workday source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/workday/about.md section: Loading & Unloading Data --- # About %workdayof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. 
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/workday/setup)

The Openflow Connector for Workday lets you ingest Workday reports into Snowflake. It is built as an Apache NiFi flow and uses the RaaS (Report-as-a-Service) API to fetch data from Workday. The connector persists data in a dedicated table in the database and schema provided in the configuration.

Use this connector if you're looking to do the following:

- Get Workday data into Snowflake using Report-as-a-Service (RaaS) streams for enterprise-level analytics and planning

## Limitations

- Only advanced Workday reports are supported.
- Only reports in the JSON format are supported.
- All limitations of the RaaS API apply.
- Schema discovery is not supported. The schema of the destination table is inferred from the data fetched from Workday.
- Incremental load is not supported. The connector uses a truncate-and-load ingestion strategy.

## Next steps

[](/user-guide/data-integration/openflow/connectors/workday/setup)

---
title: About Openflow: BYOC deployments
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/about-byoc.md
section: Loading & Unloading Data
---

# About Openflow: BYOC deployments

This feature is not available in the People's Republic of China.

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [](/user-guide/data-integration/openflow/cost-byoc) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/processors/index) - [](/user-guide/data-integration/openflow/controllers/index) Openflow BYOC *is* Openflow and contains all the benefits of Openflow, but within your existing cloud. ## Typical BYOC workflow
| User persona | Task |
| --- | --- |
| AWS cloud engineer/administrator | Creates a set of deployments in their AWS cloud account. The Openflow UI is used to manage deployments and runtime creation and maintenance. The Openflow UI allows users to create, upgrade, and delete runtimes in all deployments. Snowflake sign-ins are used to authenticate to Openflow, and roles and privileges are used to control access to Openflow deployments and runtimes. |
| Data engineer (pipeline author, responsible for data ingestion) | Uses the runtime canvas to build completely new flows or to configure deployed connectors. Creates a completely new flow or uses an existing connector as-is or as a starting point to customize. Populates data in the bronze layer within your Snowflake account (or other target system). Connectors are a simple way to solve for a specific integration use case, and less technical users can deploy them without necessarily needing a data engineer. |
| Data engineer (pipeline operator) | Configures the flow parameters and runs the flow. |
| Data engineer (responsible for transformation to silver and gold layers) | Responsible for transforming data from the bronze layer that was populated by the pipeline to silver and gold layers for analytics. |
| Business user | Makes use of gold layer objects for analytics. |
## Limitations

- As described in the [Snowflake Openflow BYOC terms](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-terms/), securing Openflow BYOC follows a shared responsibility model.
- Openflow authorization uses roles and their associated privileges that are directly granted to the user. Currently, Openflow does not support authorization when the role is attached to another role within the user's role hierarchy.

## Next steps

[](/user-guide/data-integration/openflow/setup-openflow-byoc)

---
title: About Snowflake and SAP® Zero-Copy Integration
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/about-sap-snowflake.md
section: Loading & Unloading Data
---

# About %snowsapzerocopyintegration%

- [](/user-guide/data-integration/zero-copy/sap-sql/setup-tasks)

Snowflake and SAP® have partnered to offer customers a seamless zero-copy integration between the two platforms. The integration leverages SAP® Business Data Cloud, which enables customers to harmonize SAP® and non-SAP® data at scale in Snowflake while optimizing total cost of ownership across workloads. With zero-copy data access, data and AI teams can work with semantically rich SAP® Data Products in real time without the added cost and complexity of ETL pipelines. They can also build AI and machine learning applications that are fueled by trusted SAP® Data Products and grounded in the context of all their mission-critical data, for accurate, reliable, and trustworthy AI outcomes.

## Two Ways to Integrate Snowflake and SAP®

The integration delivers two distinct offerings, giving customers a choice. Both leverage SAP® Business Data Cloud to enable zero-copy data sharing between SAP® Business Data Cloud and Snowflake.

### SAP® Snowflake

Designed for new Snowflake customers, SAP® Snowflake makes Snowflake available in SAP® Business Data Cloud as a certified SAP® Solution Extension.
From advanced analytics and ML to data engineering, applications, and marketplace, it puts the Snowflake platform directly in the hands of SAP® users. For more information, see [SAP Snowflake](https://help.sap.com/docs/business-data-cloud/introducing-sap-snowflake/introducing-sap-snowflake) in the SAP® documentation.

![SAP® Snowflake architecture](/static/images/openflow/sap-snowflake.png)

### SAP® Business Data Cloud Connect for Snowflake

Designed for existing Snowflake customers, SAP® Business Data Cloud (BDC) Connect for Snowflake enables customers to share Data Products from SAP® BDC with their existing Snowflake accounts. This gives Snowflake users real-time access to semantically rich SAP® Data Products without duplication of data.

![SAP® BDC Connect for Snowflake architecture](/static/images/openflow/sap-bdc.png)

For more information and setup instructions for either of these offerings, see [](/user-guide/data-integration/zero-copy/sap-sql/setup-tasks).

---
title: About the Openflow Connector for Google BigQuery
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-big-query/about.md
section: Loading & Unloading Data
---

# About the %bigqueryof%

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/google-big-query/setup) - [](/user-guide/data-integration/openflow/connectors/google-big-query/use) The %bigqueryof% connects a Google BigQuery project to Snowflake and replicates data from selected datasets, tables, and views on a schedule. The connector performs an initial full load for each table, followed by incremental updates using BigQuery's native change-tracking functionality. Views are replicated using a truncate and load strategy. ## Use cases The connector supports the following use cases: - **Replication to Snowflake:** Continuously mirror datasets from BigQuery into Snowflake for downstream analytics and modeling. Incremental changes arrive on a schedule with a 10 minute delay window. - **Selective replication:** Define which regions, datasets, tables, and views to include using names or regex filters for broad coverage with control. - **Migration and change capture:** Perform a one-time snapshot load for migrations, then run incremental syncs using BigQuery's change history to keep tables in sync. - **View replication:** Replicate standard and materialized BigQuery views to Snowflake using a truncate and load strategy on a configurable schedule. ## The table replication lifecycle A table's replication cycle begins with schema discovery and an initial snapshot load of the data. The cycle transitions to incremental synchronization after data has been ingested into Snowflake. 1. **Schema Introspection:** The connector discovers the source table's schema, validates its data types, and creates a corresponding destination schema and table in Snowflake. 2. **Snapshot Load:** After creating schema and table, the connector performs a full copy of all existing data from the BigQuery table to Snowflake. This process runs sequentially for each table in the configuration. 3. **Incremental Sync:** Once the initial load is complete, the table enters a scheduled incremental synchronization mode. 
On each run, the connector uses BigQuery's CHANGES function to read the journal of row-level changes (inserts, updates, deletes) that occurred since the last synchronization. These changes are then fetched and merged into the destination table in Snowflake. ## Openflow requirements The minimum runtime size must be `Medium`. Use a larger runtime and multi-node Openflow setup if you are replicating large data volumes. ## Limitations - BigQuery guarantees that data streams used to fetch source data remain valid for at least 6 hours. As a result, the process of reading the source table must be completed in less than 6 hours to prevent the data streams from expiring. You must use a larger, multi-node runtime when ingesting tables with data volumes that are larger than 100GB. - BigQuery’s BIGNUMERIC type supports a higher precision (up to 76 digits) than Snowflake's NUMBER type (38 digits). The connector cannot ingest values from BIGNUMERIC columns that exceed the Snowflake limit. - The connector does not support replication of external tables. - View replication uses a truncate and load strategy only. Incremental synchronization (CDC) is not supported for views. - Incremental syncs require a primary key to correctly handle updates and deletes. For tables without a primary key, the connector does not support deletes and treats updates as new inserts. You must ensure that the primary key constraints are met. If the field marked as the primary key is not unique, data inconsistency can occur during incremental mode. - The connector uses the [BigQuery's CHANGES](https://cloud.google.com/bigquery/docs/reference/standard-sql/time-series-functions#changes) function for incremental updates. Because this function cannot query the last ten minutes of table history, replicated data in incremental mode has a minimum 10-minute lag behind the source. - The incremental sync process is limited to a maximum 24-hour data window due to the BigQuery CHANGES function. 
If the replication lag for a table exceeds this period, the connector truncates the change window to 24 hours to proceed with the sync. This truncation can result in data loss. - The connector inherits all other limitations of the BigQuery CHANGES function. For more information, see the [BigQuery CHANGES function documentation](https://cloud.google.com/bigquery/docs/reference/standard-sql/time-series-functions#changes). ## View replication The connector supports replication of standard views and materialized views from BigQuery to Snowflake. Unlike table replication, views do not support incremental synchronization (CDC). Instead, the connector uses a **truncate and load** strategy: on each synchronization cycle, the connector fully replaces the data in the Snowflake destination table with the current contents of the source view. The view synchronization frequency is configured separately from table incremental sync frequency using the **View Sync Frequency** parameter. Runs do not overlap. If a cycle takes longer than the configured interval, the next run waits for the previous run to finish. You can filter which views to replicate using the **Included View Names** and **Included View Names Regex** parameters. These filters apply across all datasets selected for replication. The connector creates temporary tables in BigQuery during view ingestion. Use the **Temporary Table Dataset** parameter to specify a dedicated dataset for these temporary tables. Snowflake recommends using a separate dataset for temporary tables and not using the ingested dataset for this purpose. ## Data type mapping The connector maps BigQuery data types to the corresponding Snowflake data types.
| BigQuery Data Type | Snowflake Data Type |
| --- | --- |
| BIGNUMERIC | NUMBER |
| NUMERIC | NUMBER |
| GEOGRAPHY | VARCHAR |
| DATETIME | TIMESTAMP_NTZ |
| JSON | OBJECT |
| STRUCT | OBJECT |
| RANGE | OBJECT |
| INTERVAL | OBJECT |
| TIMESTAMP | TIMESTAMP_NTZ |
| DATE | DATE |
| TIME | TIME |
| INT64 / INTEGER | NUMBER |
| FLOAT64 | FLOAT |
| BOOL / BOOLEAN | BOOLEAN |
| STRING | VARCHAR |
| BYTES | BINARY |
| ARRAY | ARRAY |
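Because BigQuery JSON, STRUCT, RANGE, and INTERVAL columns arrive in Snowflake as OBJECT, their fields are read with Snowflake's path notation rather than as separate columns. The following is a minimal sketch, assuming a hypothetical destination table `orders` whose `shipping_address` column was replicated from a BigQuery STRUCT with `city` and `postal_code` fields. The table, column, and key names are illustrative, and the exact key names depend on how the source fields are written into the OBJECT.

```sql
-- Sketch: read individual fields out of an OBJECT column.
-- orders, order_id, shipping_address, city, and postal_code are hypothetical names.
SELECT
  order_id,
  shipping_address:city::VARCHAR        AS city,        -- key lookup, then cast to a SQL type
  shipping_address:postal_code::VARCHAR AS postal_code
FROM orders
WHERE shipping_address:city::VARCHAR = 'Berlin';
```

Keys in path expressions are case-sensitive, so `shipping_address:City` and `shipping_address:city` refer to different fields.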
## Track data changes in Google BigQuery The connector's incremental sync functionality is built on [BigQuery's native CHANGES function](https://cloud.google.com/bigquery/docs/reference/standard-sql/time-series-functions#changes). When you enable change history on a source table, BigQuery maintains an internal journal of all row-level modifications (inserts, updates, and deletes). The connector queries this journal on a configured incremental sync frequency schedule to retrieve a feed of changes. The connector materializes these changes into a journal table within the same BigQuery dataset. This journal table follows a consistent naming convention: `___journal` These journal tables are managed entirely by the connector during the replication process and are used to merge data into the final destination table in Snowflake. Do not modify the journal tables in any way. Modifying journal tables can disrupt the synchronization process and lead to data integrity issues. The merge operation handles changes differently for tables with a Primary Key (PK) and tables without one. ### Tables with a Primary Key For tables with a primary key, the connector handles data changes as follows:
Inserts and Updates:
Rows identified as `INSERT` or `UPDATE` are "upserted" into the corresponding Snowflake table.
Deletes:
To preserve data history, the connector uses a soft-delete strategy. Instead of physically removing a deleted row from Snowflake, the connector performs an `UPDATE` on the target row, setting the `_SNOWFLAKE_DELETED` column to `TRUE`.
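The combination of upserts and soft deletes described above can be sketched as a single `MERGE` statement. This is only an illustration of the semantics, not the statement the connector runs. The destination table `orders`, its key `order_id`, the `status` column, the staged change set `orders_changes`, and its `change_type` column are all hypothetical, and the sketch assumes at most one change row per key. Resetting `_SNOWFLAKE_DELETED` to `FALSE` on an update is also an assumption.

```sql
-- Illustrative only: not the connector's actual statement.
-- orders         : hypothetical destination table with primary key order_id
-- orders_changes : hypothetical staged change set, one row per key,
--                  with change_type in ('INSERT', 'UPDATE', 'DELETE')
MERGE INTO orders AS t
USING orders_changes AS s
  ON t.order_id = s.order_id
-- Soft delete: keep the row and flag it as deleted.
WHEN MATCHED AND s.change_type = 'DELETE' THEN
  UPDATE SET t._SNOWFLAKE_DELETED = TRUE
-- Upsert, part 1: the key already exists, so overwrite its values.
WHEN MATCHED THEN
  UPDATE SET t.status = s.status,
             t._SNOWFLAKE_DELETED = FALSE
-- Upsert, part 2: the key is new, so insert it.
WHEN NOT MATCHED AND s.change_type <> 'DELETE' THEN
  INSERT (order_id, status, _SNOWFLAKE_DELETED)
  VALUES (s.order_id, s.status, FALSE);
```

Matching on the primary key is what makes the update and delete branches possible, which is why tables without a primary key fall back to insert-only behavior.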
### Tables without a Primary Key For tables without a primary key, the connector handles data changes as follows:
Inserts and Updates:
Rows identified as `INSERT` or `UPDATE` are treated the same way and are inserted into the corresponding Snowflake table.
Deletes:
Not supported.
The connector automatically adds the `_SNOWFLAKE_DELETED` (BOOLEAN) column to the destination table schema when it is created. ### Configured synchronization frequency schedule vs actual synchronization frequency The Incremental Sync Frequency schedule determines the table synchronization frequency. If the schedule you specified is more frequent than the actual time required to synchronize the table, the system does not follow the schedule you specified. This occurs because incremental cycles must execute sequentially and cannot overlap. ## Schema Evolution The connector supports several common schema changes in the source BigQuery table. The following schema changes are detected and propagated to the Snowflake destination table:
Column Addition:
New columns added in BigQuery are automatically added to the corresponding Snowflake table.
Column Deletion (Soft Delete):
When a column is dropped in BigQuery, the connector performs a "soft delete" in Snowflake. The column is not dropped from the destination table. Instead, it is renamed by adding the `_SNOWFLAKE_DELETED` suffix to the end of the column name. For example, `my_column` becomes `my_column_SNOWFLAKE_DELETED`. This preserves historical data in Snowflake.
Column Rename:
A column rename operation is a two-step process:

1. The original column is "soft deleted" and renamed with the `_SNOWFLAKE_DELETED` suffix added.
2. A new column with the new name is added to the Snowflake table.
Primary Key Modification:
Adding, removing, and changing primary keys are supported.
Data Type Changes:
Only changes that widen the existing type are tolerated. Any change that narrows a column’s type or converts it to an incompatible type is not supported and will cause replication for that table to fail.
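To make the column addition, soft-delete, and rename rules above concrete, here is a small Python sketch that reconciles a list of destination column names against the current source columns. The function name `evolve_columns` is hypothetical, and the sketch ignores data types and identifier case handling, which the real connector must deal with.

```python
# Suffix taken from the documentation above.
DELETED_SUFFIX = "_SNOWFLAKE_DELETED"


def evolve_columns(destination: list[str], source: list[str]) -> list[str]:
    """Return the destination column names after reconciling with the source columns."""
    source_set = set(source)
    result = []
    for col in destination:
        if col in source_set or col.endswith(DELETED_SUFFIX):
            # Column still exists in the source, or it is already soft deleted
            # (this also keeps the _SNOWFLAKE_DELETED row flag column untouched).
            result.append(col)
        else:
            # Column was dropped in the source: soft delete it by renaming.
            result.append(col + DELETED_SUFFIX)
    for col in source:
        if col not in destination:
            # Column is new in the source: add it to the destination.
            result.append(col)
    return result


# A rename in the source (my_column -> new_name) appears as a drop plus an addition.
columns = evolve_columns(
    ["id", "my_column", "_SNOWFLAKE_DELETED"],
    ["id", "new_name"],
)
```

In this example, `columns` contains `my_column_SNOWFLAKE_DELETED` in place of `my_column` and gains `new_name`, which matches the two-step rename process described above.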
## Next steps For information on how to set up the connector, see the following topic: - [](/user-guide/data-integration/openflow/connectors/google-big-query/setup) --- title: About the Openflow Connector for Salesforce Bulk API source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about.md section: Loading & Unloading Data --- # About the %salesforcebulkapiof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/formula-fields) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/troubleshoot) This topic describes the basic concepts of the %salesforcebulkapiof%, its workflow, and limitations. ## Zero-copy integration with Salesforce Data Cloud Snowflake offers zero-copy bidirectional sharing and integration with Salesforce. This integration is recommended if you use Salesforce Data Cloud and require near real-time bidirectional integration. 
For more information about zero-copy integration with Salesforce Data Cloud, see the following blog posts: - [Share Your Data from Salesforce Data Cloud to Snowflake](https://developer.salesforce.com/blogs/2024/08/share-your-data-from-salesforce-data-cloud-to-snowflake) - [Zero Copy Data Federation with Snowflake and Salesforce Data Cloud](https://developer.salesforce.com/blogs/2024/08/zero-copy-data-federation-with-snowflake-and-salesforce-data-cloud) ## About the %salesforcebulkapiof% The %salesforcebulkapiof% provides replication-based data integration. This connector is designed for users who do not use Salesforce Data Cloud and prefer a fully managed Snowflake Openflow connector. The connector uses public Salesforce REST APIs to replicate data from Salesforce to Snowflake at a user-defined frequency. The connector supports Change Data Capture (CDC) and keeps data in Snowflake in sync with Salesforce. You can use one or both types of data integrations depending on your specific use cases. This topic describes how to set up and use the %salesforcebulkapiof% to replicate data from Salesforce to Snowflake. ## Use cases Use the %salesforcebulkapiof% to replicate standard or custom objects from Salesforce to Snowflake at a user-specified frequency and keep them up to date in Snowflake. ## Workflow The following workflow describes the steps to set up and use the %salesforcebulkapiof%. 1. A Salesforce administrator creates and configures an external client app in Salesforce and approves it for a specific user. 2. The Openflow administrator performs the following tasks: 1. Create a service user for the connector, a warehouse for the connector, and a destination database and schema to replicate into. 2. Install the connector. 3. Specify the required parameters for the flow template. 3. The data engineer runs the flow to replicate objects from Salesforce to Snowflake. 
## Limitations Consider the following limitations when using the connector: - Custom Salesforce domains are not supported. - Traversing object relationships and fetching related objects is not supported. - The connector does not support hard deletes in Snowflake. You can either run a query on the destination table to delete all rows where the `isDeleted` column is `true` or perform a full refresh of the destination table to reflect "hard deletes". - Fields of type `location`, `address`, or `base64` are not supported and are ignored. - You cannot consolidate data from multiple Salesforce instances into a single database in Snowflake. Data from a single Salesforce instance or org is ingested into a single database in Snowflake. A table is created in this database for each Salesforce object replicated. - Files attached to Salesforce records are ignored. - Formula fields are not replicated as data from Salesforce. Instead, the connector can translate supported Salesforce formulas into Snowflake SQL views. See [](#salesforce-formula-fields) for details on supported formulas and limitations. ## Authentication The connector uses the OAuth 2.0 JWT Bearer Flow via an external client app to connect to Salesforce and to retrieve data. This is the only supported OAuth flow for the connector. Using a different OAuth flow type (such as Authorization Code Flow) or misconfiguring the external client app can result in `invalid_grant` errors. See [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce) for documentation on how to configure the external client app in Salesforce, and [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/troubleshoot) for help with common authentication errors. ## Replication lifecycle The connector replicates data in two stages: initial replication and incremental replication. 
### Initial replication The connector calls the Salesforce Bulk API 2.0 to discover standard and custom objects specified in the connector configuration. The connector respects Bulk API 2.0 API limits. - The connector creates one table per custom or standard object with one column for each field. - The connector uses Snowpipe Streaming for the initial load to insert rows in the table based on the values of the fields from the Salesforce object. ### Incremental replication Incremental updates use a Snowflake warehouse that can be configured in the connector parameters. Depending on your latency and data freshness requirements, you can configure the refresh frequency for updates from 1 minute to 24 hours, which determines how often the tables in Snowflake are refreshed. Using the refresh frequency you specify, the connector calls the Salesforce Bulk API to detect changes in previously ingested objects. The connector identifies changed records by checking specific timestamp fields in the Salesforce objects. For most objects, the connector uses the `SystemModstamp` field. If `SystemModstamp` is not available, the connector attempts to use the following fields, in order of preference: 1. `LastModifiedDate` 2. `CreatedDate` 3. `LoginTime` For history tables (objects where History Tracking is enabled), the connector always uses the `CreatedDate` field to detect changes. The connector then uses Snowpipe Streaming to push the incremental data into a staging table and executes a merge query to load the data into the final destination table. ## Schema evolution The connector supports schema evolution when the source objects change in Salesforce.
When a new field is added to the source object:
The connector adds a new column to the destination table in Snowflake.
When an existing field is renamed in the source object:
The connector treats the rename as both a field deletion and a field addition. The field addition causes a new column to be added to the destination table. The field deletion is handled as described next.
When an existing field is deleted in the source object:
The connector supports three strategies:

- Delete: Deletes the corresponding column in the destination table in Snowflake. This is the default behavior.
- Ignore: Ignores the deleted field in the source and skips it in the future.
- Rename: Renames the deleted field in the destination table.
For example, if the deletion strategy is set to `Ignore` and a field is renamed for a Salesforce object, the existing column in Snowflake will be unchanged and a new column with the new field name will be added.

## How objects are deleted

When objects are deleted in Salesforce, the connector does not "hard delete" them from Snowflake. The connector performs "soft deletes" for objects deleted in Salesforce and indicates that the source objects were deleted by setting the `isDeleted` column to `true` in the corresponding Snowflake tables.

The connector does not support "hard deletes". You can either run a query on the destination table to delete all rows where the `isDeleted` column is `true` or perform a full refresh of the destination table to reflect "hard deletes".

The connector may miss delete operations when objects are deleted in Salesforce and purged from the Salesforce Recycle Bin while the connector was not running, for example if the connector was paused or stopped. You must perform a full refresh of the destination table to recover in these situations.

## Automatic retry handling

The connector automatically retries failed operations or API errors using an exponential backoff strategy. The connector waits one second before the first retry, then doubles the wait time for each subsequent retry (two seconds, four seconds, and so on). If the failures persist, the connector stops retrying until the next scheduled run. You can monitor this activity in the [event table](/user-guide/data-integration/openflow/monitor).

## Use multiple connector instances to handle different sync schedules

If you need to sync different objects at different frequencies, for example some every 30 minutes and others every 24 hours, Snowflake recommends deploying two separate connector instances within the same runtime. You can then configure the sync parameters independently for each instance.
Deploying multiple connector instances in the same runtime does not incur additional costs. Similarly, if you need to fully fetch some objects every time the connector runs, Snowflake recommends deploying two separate connector instances within the same runtime and configuring the parameters for each instance. ## Salesforce formula fields Salesforce formula fields are calculated fields whose values are derived from expressions defined in Salesforce. Because the Salesforce Bulk API does not support incremental retrieval of formula field values, the connector takes a different approach: it translates the Salesforce formula expressions into Snowflake SQL and creates a view for each object that contains formula fields. To enable this feature, set the **Enable Views Creation** parameter to `true` in the connector configuration. See [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector) for details. For more information, see [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/formula-fields). ## Next steps For information on how to set up the connector, see the following topic: - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce) --- title: About the Openflow Connector for Shopify source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/shopify/about.md section: Loading & Unloading Data --- # About the %shopifyof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. 
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/shopify/setup) - [](/user-guide/data-integration/openflow/connectors/shopify/object-definitions) - [](/user-guide/data-integration/openflow/connectors/shopify/maintain) This topic describes the basic concepts, workflow, and limitations of the %shopifyof%. The %shopifyof% replicates data from a Shopify store into Snowflake using the [Shopify Admin GraphQL API](https://shopify.dev/docs/api/admin-graphql). The connector leverages [Bulk Operations](https://shopify.dev/docs/api/usage/bulk-operations/queries) to efficiently extract large volumes of data and uses Snowpipe Streaming to load it into Snowflake. It supports initial bulk loads, incremental updates, and delete detection. ## Use cases Use the %shopifyof% to replicate data from your Shopify store into Snowflake for the following: - **E-commerce analytics:** Centralize order, product, and customer data in Snowflake for cross-channel reporting and business intelligence. - **Inventory management:** Bring inventory items and locations into Snowflake for demand forecasting and supply chain optimization. - **Customer insights:** Replicate customer and segment data for personalization, cohort analysis, and lifetime value modeling. - **Operational reporting:** Track fulfillment orders, draft orders, and transactions in Snowflake for real-time operational dashboards. ## Supported objects The connector ships with a built-in catalog of commonly replicated Shopify object types, including: - **Orders:** order data including line items, shipping and billing addresses, financial status, and fulfillment details. - **Products** and **Product Variants:** product catalog data including pricing, inventory, and variant information. - **Customers:** customer profiles, contact details, and marketing preferences. 
- **Collections:** manual and automated product collections. - **Inventory Items:** stock quantities and inventory tracking data. - **Fulfillment Orders:** fulfillment assignments and shipping details. The connector isn't limited to these objects. You can replicate any object type supported by the Shopify Admin GraphQL API by providing a custom definition in the **Object Definitions Override** parameter. Custom definitions let you choose which fields to extract, use GraphQL aliases to label or rename fields, and promote values from nested objects into dedicated top-level Snowflake columns. For more information, see [](/user-guide/data-integration/openflow/connectors/shopify/object-definitions). For objects not in the built-in catalog, the connector can also auto-discover the schema using GraphQL introspection. For more information, see [Auto-discovery](#label-auto-discovery). ## The replication lifecycle The connector replicates data in two stages: initial bulk load and incremental synchronization. ### Initial bulk load When the connector runs for the first time (or after a state reset), it performs a bulk query for each configured object type using the Shopify Bulk Operations API. The connector: 1. Submits a bulk query to Shopify for each object type. 2. Polls Shopify until the bulk operation completes and a JSONL result file is available. 3. Downloads the JSONL result, flattens child connections into separate tables (with a `__PARENT_ID` column linking them to the parent record), and derives the Snowflake table schema from the GraphQL response. 4. Loads data into Snowflake using Snowpipe Streaming and merges it into the destination tables. ### Incremental synchronization After the initial load, the connector transitions to incremental mode. It uses timestamp-based watermarks to retrieve only records that have changed since the last sync. 
The connector selects the incremental field by checking the object's available fields against a priority list (`updatedAt`, `createdAt`, `processedAt`) and using the first match. The incremental frequency is user-configurable. Each incremental run retrieves changed records and merges them into the destination tables. ## Authentication The connector authenticates with Shopify using an Admin API access token obtained from a custom app in your Shopify store. The token is passed in the `X-Shopify-Access-Token` HTTP header for all API requests. For instructions on creating a custom app and generating an access token, see [](/user-guide/data-integration/openflow/connectors/shopify/setup). ## Auto-discovery The connector ships with a built-in object catalog that defines the GraphQL query structure for a set of commonly used Shopify object types. For objects not included in the catalog, the connector can optionally query the Shopify Admin GraphQL introspection endpoint to discover the schema dynamically. Snowflake plans to add more objects to the catalog in a future release. Auto-discovered definitions are cached in NiFi distributed state for 24 hours to avoid repeated introspection calls. For more information, see the [Enable Introspection](/user-guide/data-integration/openflow/connectors/shopify/setup#label-shopify-parameters) parameter. ## How deletes are handled For objects that support delete detection, the connector periodically queries the Shopify [Events API](https://shopify.dev/docs/api/admin-graphql/latest/queries/events) using `action: "destroy"` and applies soft deletes in Snowflake. Only object types that emit destroy events in the Shopify Events API support delete detection. The connector sets a `__SNOWFLAKE_IS_DELETED` column to TRUE and a `__SNOWFLAKE_DELETED_AT` column to the timestamp of the deletion event. Rows are never physically removed from the destination table. 
When a parent record is soft-deleted, the connector cascades the soft delete to all registered child tables (for example, variants associated with a deleted product). ## Automatic retry and rate limiting The connector respects Shopify's rate limiting model, which uses a leaky bucket algorithm with a 1,000-point capacity that refills at 50 points per second. The connector tracks available points and automatically waits when the bucket is low to avoid throttling errors. For throttling responses, the connector retries automatically. When Shopify returns an HTTP 429, the connector waits for the duration specified in the `Retry-After` header before retrying. When the API returns a `THROTTLED` GraphQL error, the connector retries with exponential backoff. The default configuration allows up to 3 retries with an initial backoff of 1 second. ## Child record flattening For objects with nested connections (such as order line items or returns), the connector automatically extracts child records into separate Snowflake tables. Each child table includes a `__PARENT_ID` column that references the parent record's Shopify GID, enabling joins between parent and child tables. ## Limitations Consider the following limitations when using the connector: - The connector requires a Shopify custom app with Admin API access. OAuth authentication flows aren't supported. - The [Shopify Bulk Operations API](https://shopify.dev/docs/api/usage/bulk-operations/queries) supports a maximum of 5 nested connections per query. - The connector currently supports data extraction (ingestion) only. Writing data back to Shopify isn't supported. - Schema evolution isn't supported in the current release. If source objects gain or lose fields in Shopify, you must [reset the connector state for the affected object](/user-guide/data-integration/openflow/connectors/shopify/maintain#label-reload-a-specific-object) to re-ingest it with the updated schema. 
Snowflake plans to add schema evolution support in a future release. - Rate limits depend on your Shopify plan. The connector respects Shopify's leaky bucket throttling, but very high-volume stores with many objects might require careful scheduling to avoid sustained throttling. - Delete detection is only available for the object types listed in the **Objects to Track for Deletes** parameter. If an object type doesn't emit destroy events in the Shopify Events API, delete polls for that type return zero results. - For objects with nested child connections (such as order line items), the connector fetches up to the configured `pageSize` child records per parent (default: 250). Child records beyond this limit aren't ingested. ## Next steps To set up the connector, see [](/user-guide/data-integration/openflow/connectors/shopify/setup). --- title: About the Openflow Connector for Veeva Vault source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/veeva-vault/about.md section: Loading & Unloading Data --- # About the %veevavaultof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/veeva-vault/setup) - [](/user-guide/data-integration/openflow/connectors/veeva-vault/use) The %veevavaultof% replicates data from a Veeva Vault instance into Snowflake using [Direct Data API](https://general.veevavault.dev/direct-data-api). 
The connector downloads Direct Data files, extracts the CSV data they contain, and loads that data into Snowflake tables using Snowpipe Streaming. It supports full snapshots, incremental updates, and optional audit log ingestion. ## Use cases The connector supports the following use cases: - **Full data replication:** Perform a one-time full snapshot of all Veeva Vault data into Snowflake for reporting, analytics, and compliance. - **Incremental synchronization:** After the initial snapshot, the connector polls for incremental Direct Data files on a configurable schedule (default: every 15 minutes) to keep Snowflake tables up to date with changes in Veeva Vault. - **Audit log ingestion:** Optionally ingest Veeva Vault audit log archives alongside data archives, providing a complete audit trail in Snowflake. - **Migration and analytics:** Centralize Veeva Vault data in Snowflake for cross-system analytics, data science, and regulatory reporting. ## The replication lifecycle A table's replication cycle begins with a full data snapshot and then transitions to incremental synchronization. 1. **Snapshot phase:** The connector downloads the latest full Direct Data file from Veeva Vault. This archive is a tar.gz file containing one CSV per Vault data object. The connector unpacks the archive, creates a destination table in Snowflake for each object (if it doesn't already exist), loads the data through Snowpipe Streaming into a staging table, and merges the staging data into the final destination table. 2. **Incremental phase:** After the snapshot completes, the connector polls Veeva Vault for incremental Direct Data files. Each incremental file contains only the records that changed since the previous incremental file. The connector applies updates through the same staging-and-merge pipeline and processes deletes separately based on the configured delete strategy. 
Data freshness in Snowflake depends on how frequently Veeva Vault publishes Direct Data files and the configured sync frequency of the connector. 3. **Audit log phase (optional):** When audit log ingestion is enabled, the connector also downloads `log_directdata` files and loads them into Snowflake following the same pipeline. The connector groups Direct Data files by their reported time window and processes one window at a time to ensure each batch is handled atomically before moving to the next. The connector tracks its progress using a persisted state that records the last processed timestamp. If the connector is stopped and restarted, it resumes from where it left off. ## Ingestion modes The connector supports three ingestion modes that control how Direct Data files are consumed:
SNAPSHOT_AND_INCREMENTAL (default):
The connector first processes the latest full Direct Data file (snapshot). Once the snapshot is complete, it transitions to polling for incremental archives. This is the recommended mode for most deployments.
SNAPSHOT:
The connector continuously polls for the latest full Direct Data file. Each time a new full file becomes available, it is processed. Use this mode when you want to periodically replace all data in Snowflake with a fresh full export.
INCREMENTAL:
The connector polls only for incremental Direct Data files. No full snapshot is performed. Use this mode when a snapshot has already been loaded by other means or when only recent changes are needed. You can optionally specify a start time to control how far back incremental polling begins.
## Authentication

The connector authenticates with Veeva Vault using session-based authentication. You provide a service account username and password, and the connector obtains a session identifier from the Vault API auth endpoint. This session is reused across requests and is automatically refreshed when it expires.

For Snowflake authentication, the connector supports two strategies:
SNOWFLAKE_MANAGED (default):
Uses the Snowflake-managed token associated with the Openflow runtime role. This is the recommended strategy for both %ofsfspcs-plural% and %ofbyoc-plural%.
KEY_PAIR:
Uses a user-provided RSA key pair for authentication. This strategy is available only on %ofbyoc-plural% and is intended for cross-account scenarios where the connector needs to write to a Snowflake account different from the one hosting the Openflow runtime.
## How deletes are handled

When the connector receives a delete extract from Veeva Vault, it applies the deletes in Snowflake according to the configured delete strategy:
Hard Delete (default):
Rows are permanently removed from the destination table using a `DELETE` statement.
Soft Delete:
Rows are not removed. Instead, the connector sets a `__SNOWFLAKE_DELETED` column to `TRUE` and a `__SNOWFLAKE_DELETED_AT` column to the current timestamp. If these columns don't exist in the destination table, the connector adds them automatically.
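The difference between the two delete strategies can be sketched in Python against an in-memory stand-in for the destination table. The function name `apply_deletes` and the strategy strings `"HARD_DELETE"` and `"SOFT_DELETE"` are hypothetical labels chosen for illustration; only the column names come from the documentation above.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def apply_deletes(
    rows: dict[Any, dict],
    deleted_ids: list,
    strategy: str = "HARD_DELETE",
    now: Optional[datetime] = None,
) -> dict[Any, dict]:
    """Apply a delete extract to `rows`, a mapping of record ID to row."""
    now = now or datetime.now(timezone.utc)
    for record_id in deleted_ids:
        if record_id not in rows:
            continue  # Nothing to delete for this ID.
        if strategy == "HARD_DELETE":
            # Permanently remove the row, as a DELETE statement would.
            del rows[record_id]
        elif strategy == "SOFT_DELETE":
            # Keep the row and record that, and when, it was deleted.
            rows[record_id]["__SNOWFLAKE_DELETED"] = True
            rows[record_id]["__SNOWFLAKE_DELETED_AT"] = now
        else:
            raise ValueError(f"Unknown delete strategy: {strategy}")
    return rows


hard = apply_deletes({"V1": {"name": "a"}, "V2": {"name": "b"}}, ["V1"])
soft = apply_deletes({"V1": {"name": "a"}, "V2": {"name": "b"}}, ["V1"], strategy="SOFT_DELETE")
```

With the hard strategy only `V2` remains. With the soft strategy both rows remain and `V1` carries the two marker columns, so queries need to filter on `__SNOWFLAKE_DELETED` to exclude deleted records.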
## Schema evolution

The connector supports schema evolution when the structure of Veeva Vault data changes between files.

When the connector detects new columns in an incoming file, it automatically adds those columns to the destination and staging tables in Snowflake.

When a column is no longer present in the incoming file, the connector applies the configured column removal strategy:
Drop Column (default):
Drops the column from the Snowflake table.
Rename Column:
Renames the column by appending a configurable suffix (default: `__deleted`). This preserves historical data in the table.
Ignore Column:
Leaves the column as-is in the Snowflake table and stops populating it.
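As a rough illustration of the three column removal strategies, the following Python sketch shows their effect on a list of destination column names. The function name `remove_column` and the strategy strings `"DROP"`, `"RENAME"`, and `"IGNORE"` are hypothetical; only the default suffix `__deleted` comes from the documentation above.

```python
def remove_column(
    columns: list[str],
    missing: str,
    strategy: str = "DROP",
    suffix: str = "__deleted",
) -> list[str]:
    """Return the destination column names after `missing` disappears from the source."""
    if strategy == "DROP":
        # Drop the column from the table.
        return [c for c in columns if c != missing]
    if strategy == "RENAME":
        # Keep the historical data under a suffixed name.
        return [c + suffix if c == missing else c for c in columns]
    if strategy == "IGNORE":
        # Leave the column in place; it is simply no longer populated.
        return list(columns)
    raise ValueError(f"Unknown column removal strategy: {strategy}")


cols = ["id", "name__v", "status__v"]
dropped = remove_column(cols, "status__v")
renamed = remove_column(cols, "status__v", strategy="RENAME")
ignored = remove_column(cols, "status__v", strategy="IGNORE")
```

Only the rename and ignore strategies keep previously loaded values, so the default drop strategy is the one to avoid when historical data in a removed column still matters.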
## Automatic retry handling

The connector automatically retries failed API calls using an exponential backoff strategy. Retryable conditions include HTTP status codes 500, 502, 503, and 504, as well as transient network errors. If a session expires or becomes invalid, the connector automatically re-authenticates and retries the request.

## Limitations

Consider the following limitations when using the connector:

- Direct Data must be enabled on your Vault instance before using the connector. Contact your Veeva Vault administrator to enable this feature.
- The connector authenticates using session-based username and password credentials. Other authentication methods (such as OAuth) aren't yet supported.
- The connector replicates structured data from Direct Data files only. Document and attachment content (such as files stored in Veeva Vault) isn't replicated.
- The connector currently only performs an initial load of objects of type [legacy_workflow](https://platform.veevavault.help/en/gr/5205/) and doesn't replicate ongoing changes.

## Next steps

For information on how to set up the connector, see [](/user-guide/data-integration/openflow/connectors/veeva-vault/setup).

---
title: ADLSCredentialsControllerService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/adlscredentialscontrollerservice.md
section: Loading & Unloading Data
---

# ADLSCredentialsControllerService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description

Defines credentials for ADLS processors.

## Tags

adls, azure, cloud, credentials, microsoft, storage

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Account Key * | Account Key | | | The storage account key. This is an admin-like password providing access to every container in this account. It is recommended one uses Shared Access Signature (SAS) token, Managed Identity or Service Principal instead for fine-grained control with policies. There are certain risks in allowing the account key to be stored as a FlowFile attribute. While it does provide for a more flexible flow by allowing the account key to be fetched dynamically from a FlowFile attribute, care must be taken to restrict access to the event provenance data (e.g., by strictly controlling the policies governing provenance for this processor). In addition, the provenance repositories may be put on encrypted disk partitions. |
| Credentials Type * | Credentials Type | SAS_TOKEN | Account Key, SAS Token, Managed Identity, Service Principal | Credentials type to be used for authenticating to Azure |
| Endpoint Suffix * | Endpoint Suffix | dfs.core.windows.net | | Storage accounts in public Azure always use a common FQDN suffix. Override this endpoint suffix with a different suffix in certain circumstances (like Azure Stack or non-public Azure regions). |
| Managed Identity Client ID | Managed Identity Client ID | | | Client ID of the managed identity. The property is required when User Assigned Managed Identity is used for authentication. It must be empty in case of System Assigned Managed Identity. |
| SAS Token * | SAS Token | | | Shared Access Signature token (the leading '?' may be included). There are certain risks in allowing the SAS token to be stored as a FlowFile attribute. While it does provide for a more flexible flow by allowing the SAS token to be fetched dynamically from a FlowFile attribute, care must be taken to restrict access to the event provenance data (e.g., by strictly controlling the policies governing provenance for this processor). In addition, the provenance repositories may be put on encrypted disk partitions. |
| Service Principal Client ID * | Service Principal Client ID | | | Client ID (or Application ID) of the Client/Application having the Service Principal. |
| Service Principal Client Secret * | Service Principal Client Secret | | | Password of the Client/Application. |
| Service Principal Tenant ID * | Service Principal Tenant ID | | | Tenant ID of the Azure Active Directory hosting the Service Principal. |
| Storage Account Name * | Storage Account Name | | | The storage account name. There are certain risks in allowing the account name to be stored as a FlowFile attribute. While it does provide for a more flexible flow by allowing the account name to be fetched dynamically from a FlowFile attribute, care must be taken to restrict access to the event provenance data (e.g., by strictly controlling the policies governing provenance for this processor). In addition, the provenance repositories may be put on encrypted disk partitions. |
| Proxy Configuration Service | proxy-configuration-service | | | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
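
To make two of the property descriptions concrete, the following minimal sketch shows how a client might treat the optional leading '?' on a SAS token and how the Storage Account Name and Endpoint Suffix could combine into an endpoint. The function names are hypothetical, and the `https://<account>.<suffix>` URL form is an assumption based on the usual Azure storage endpoint layout; this is not the controller service's actual implementation.

```python
# Hypothetical helpers for illustration only; not part of Openflow or NiFi.

def normalize_sas_token(token: str) -> str:
    """Return the SAS token without a leading '?'.

    The SAS Token property accepts the token with or without the
    leading '?', so both forms normalize to the same value here.
    """
    return token[1:] if token.startswith("?") else token


def adls_endpoint(account_name: str,
                  endpoint_suffix: str = "dfs.core.windows.net") -> str:
    """Build an endpoint URL from the account name and endpoint suffix.

    Assumption: the endpoint is https://<account>.<suffix>. The default
    suffix matches the Endpoint Suffix default listed in the table.
    """
    return f"https://{account_name}.{endpoint_suffix}"


if __name__ == "__main__":
    print(normalize_sas_token("?sv=2022-11-02&sig=abc"))
    print(adls_endpoint("mystorageaccount"))
```

Overriding `endpoint_suffix` corresponds to the Azure Stack or non-public region case mentioned in the Endpoint Suffix description.
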
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

--- title: ADLSCredentialsControllerServiceLookup source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/adlscredentialscontrollerservicelookup.md section: Loading & Unloading Data ---

# ADLSCredentialsControllerServiceLookup

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides an ADLSCredentialsService that can be used to dynamically select another ADLSCredentialsService. This service requires an attribute named 'adls.credentials.name' to be passed in, and throws an exception if the attribute is missing. The value of 'adls.credentials.name' is used to select the ADLSCredentialsService that has been registered with that name. This allows multiple ADLSCredentialsServices to be defined and registered, and then selected dynamically at runtime by tagging FlowFiles with the appropriate 'adls.credentials.name' attribute.

## Tags

adls, azure, cloud, credentials, microsoft, storage

## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

--- title: All controller services (alphabetical) source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/index.md section: Loading & Unloading Data ---

# All controller services (alphabetical)

- [](/user-guide/data-integration/openflow/about)

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

This topic provides a list of all Openflow controller services in alphabetical order. The list includes:
- Type of controller service (Snowflake or not) - The name of each controller service - A summary of each controller service
## A
| Controller | Description |
| --- | --- |
| [ADLSCredentialsControllerService](/user-guide/data-integration/openflow/controllers/adlscredentialscontrollerservice) | Defines credentials for ADLS processors. |
| [ADLSCredentialsControllerServiceLookup](/user-guide/data-integration/openflow/controllers/adlscredentialscontrollerservicelookup) | Provides an ADLSCredentialsService that can be used to dynamically select another ADLSCredentialsService. |
| [AmazonGlueEncodedSchemaReferenceReader](/user-guide/data-integration/openflow/controllers/amazonglueencodedschemareferencereader) | Reads Schema Identifier according to AWS Glue Schema encoding as a header consisting of two byte markers and a 16-byte UUID |
| [AmazonGlueSchemaRegistry](/user-guide/data-integration/openflow/controllers/amazonglueschemaregistry) | Provides a Schema Registry that interacts with the AWS Glue Schema Registry so that those Schemas that are stored in the Glue Schema Registry can be used in NiFi. |
| [AmazonMSKConnectionService](/user-guide/data-integration/openflow/controllers/amazonmskconnectionservice) | Provides and manages connections to AWS MSK Kafka Brokers for producer or consumer operations. |
| %logo-snowflake-blue% [AmazonMSKConnectionService](/user-guide/data-integration/openflow/controllers/amazonmskconnectionservice) | Provides and manages connections to AWS MSK Kafka Brokers for producer or consumer operations. |
| [ApicurioSchemaRegistry](/user-guide/data-integration/openflow/controllers/apicurioschemaregistry) | Provides a Schema Registry that interacts with the Apicurio Schema Registry so that those Schemas that are stored in the Apicurio Schema Registry can be used in NiFi. |
| [AvroReader](/user-guide/data-integration/openflow/controllers/avroreader) | Parses Avro data and returns each Avro record as a separate Record object. |
| [AvroRecordSetWriter](/user-guide/data-integration/openflow/controllers/avrorecordsetwriter) | Writes the contents of a RecordSet in Binary Avro format. |
| [AvroSchemaRegistry](/user-guide/data-integration/openflow/controllers/avroschemaregistry) | Provides a service for registering and accessing schemas. |
| [AWSCredentialsProviderControllerService](/user-guide/data-integration/openflow/controllers/awscredentialsprovidercontrollerservice) | Defines credentials for Amazon Web Services processors. |
| [AzureBlobStorageFileResourceService](/user-guide/data-integration/openflow/controllers/azureblobstoragefileresourceservice) | Provides an Azure Blob Storage file resource for other components. |
| [AzureCosmosDBClientService](/user-guide/data-integration/openflow/controllers/azurecosmosdbclientservice) | Provides a controller service that configures a connection to Cosmos DB (Core SQL API) and provides access to that connection to other Cosmos DB-related components. |
| [AzureDataLakeStorageFileResourceService](/user-guide/data-integration/openflow/controllers/azuredatalakestoragefileresourceservice) | Provides an Azure Data Lake Storage (ADLS) file resource for other components. |
| [AzureEventHubRecordSink](/user-guide/data-integration/openflow/controllers/azureeventhubrecordsink) | Format and send Records to Azure Event Hubs |
| [AzureStorageCredentialsControllerService_v12](/user-guide/data-integration/openflow/controllers/azurestoragecredentialscontrollerservice_v12) | Provides credentials for Azure Storage processors using Azure Storage client library v12. |
| [AzureStorageCredentialsControllerServiceLookup_v12](/user-guide/data-integration/openflow/controllers/azurestoragecredentialscontrollerservicelookup_v12) | Provides an AzureStorageCredentialsService_v12 that can be used to dynamically select another AzureStorageCredentialsService_v12. |
## C
| Controller | Description |
| --- | --- |
| [CEFReader](/user-guide/data-integration/openflow/controllers/cefreader) | Parses CEF (Common Event Format) events, returning each row as a record. |
| [ConfluentEncodedSchemaReferenceReader](/user-guide/data-integration/openflow/controllers/confluentencodedschemareferencereader) | Reads Schema Identifier according to Confluent encoding as a header consisting of a byte marker and an integer represented as four bytes |
| [ConfluentEncodedSchemaReferenceWriter](/user-guide/data-integration/openflow/controllers/confluentencodedschemareferencewriter) | Writes Schema Identifier according to Confluent encoding as a header consisting of a byte marker and an integer represented as four bytes |
| [ConfluentProtobufMessageNameResolver](/user-guide/data-integration/openflow/controllers/confluentprotobufmessagenameresolver) | Resolves Protobuf message names from Confluent Schema Registry wire format by decoding message indexes and looking up the fully qualified name in the schema definition. For Confluent wire format reference see: [https://docs](https://docs). |
| [ConfluentSchemaRegistry](/user-guide/data-integration/openflow/controllers/confluentschemaregistry) | Provides a Schema Registry that interacts with the Confluent Schema Registry so that those Schemas that are stored in the Confluent Schema Registry can be used in NiFi. |
| [CSVReader](/user-guide/data-integration/openflow/controllers/csvreader) | Parses CSV-formatted data, returning each row in the CSV file as a separate record. |
| [CSVRecordLookupService](/user-guide/data-integration/openflow/controllers/csvrecordlookupservice) | A reloadable CSV file-based lookup service. |
| [CSVRecordSetWriter](/user-guide/data-integration/openflow/controllers/csvrecordsetwriter) | Writes the contents of a RecordSet as CSV data. |
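
The Confluent-encoded header mentioned above (a byte marker followed by an integer represented as four bytes) can be illustrated with a short sketch. It assumes the marker byte is `0` and the integer is big-endian, as in the commonly documented Confluent wire format; the function name is hypothetical and this is not the controller service's own code.

```python
import struct
from typing import Tuple


def read_confluent_schema_id(payload: bytes) -> Tuple[int, bytes]:
    """Split a Confluent-encoded payload into (schema_id, remaining bytes).

    Assumed layout: 1 marker byte (expected to be 0) followed by a
    4-byte big-endian unsigned integer holding the schema identifier.
    """
    if len(payload) < 5:
        raise ValueError("payload is shorter than the 5-byte header")
    # ">" selects big-endian with no padding, so "BI" is exactly 5 bytes.
    marker, schema_id = struct.unpack(">BI", payload[:5])
    if marker != 0:
        raise ValueError(f"unexpected marker byte: {marker}")
    return schema_id, payload[5:]


if __name__ == "__main__":
    message = b"\x00" + (42).to_bytes(4, "big") + b"record-bytes"
    print(read_confluent_schema_id(message))
```

A writer would do the reverse: prepend the marker byte and the four-byte schema identifier to the serialized record.
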
## D
| Controller | Description |
| --- | --- |
| %logo-snowflake-blue% [DatabaseLookup](/user-guide/data-integration/openflow/controllers/databaselookup) | A Lookup Service that allows for enrichment with a database using a user-specified SQL statement. |
| [DatabaseRecordLookupService](/user-guide/data-integration/openflow/controllers/databaserecordlookupservice) | A relational-database-based lookup service. |
| [DatabaseRecordSink](/user-guide/data-integration/openflow/controllers/databaserecordsink) | Provides a service to write records using a configured database connection. |
| [DBCPConnectionPool](/user-guide/data-integration/openflow/controllers/dbcpconnectionpool) | Provides Database Connection Pooling Service. |
| [DBCPConnectionPoolLookup](/user-guide/data-integration/openflow/controllers/dbcpconnectionpoollookup) | Provides a DBCPService that can be used to dynamically select another DBCPService. |
| [DeveloperBoxClientService](/user-guide/data-integration/openflow/controllers/developerboxclientservice) | Provides Box client objects through which Box API calls can be used. |
| [DistributedMapCacheLookupService](/user-guide/data-integration/openflow/controllers/distributedmapcachelookupservice) | Allows choosing a distributed map cache client to retrieve the value associated with a key. |
## E
| Controller | Description |
| --- | --- |
| [ElasticSearchClientServiceImpl](/user-guide/data-integration/openflow/controllers/elasticsearchclientserviceimpl) | A controller service for accessing an Elasticsearch client, using the Elasticsearch (low-level) REST Client. |
| [ElasticSearchLookupService](/user-guide/data-integration/openflow/controllers/elasticsearchlookupservice) | Looks up a record from Elasticsearch Server associated with the specified document ID. |
| [ElasticSearchStringLookupService](/user-guide/data-integration/openflow/controllers/elasticsearchstringlookupservice) | Looks up a string value from Elasticsearch Server associated with the specified document ID. |
| [EmailRecordSink](/user-guide/data-integration/openflow/controllers/emailrecordsink) | Provides a RecordSinkService that can be used to send records in email using the specified writer for formatting. |
| [EmbeddedHazelcastCacheManager](/user-guide/data-integration/openflow/controllers/embeddedhazelcastcachemanager) | A service that runs embedded Hazelcast and provides cache instances backed by that. |
| [ExcelReader](/user-guide/data-integration/openflow/controllers/excelreader) | Parses a Microsoft Excel document returning each row in each sheet as a separate record. |
| [ExternalHazelcastCacheManager](/user-guide/data-integration/openflow/controllers/externalhazelcastcachemanager) | A service that provides cache instances backed by Hazelcast running outside of NiFi. |
## F
| Controller | Description |
| --- | --- |
| [FreeFormTextRecordSetWriter](/user-guide/data-integration/openflow/controllers/freeformtextrecordsetwriter) | Writes the contents of a RecordSet as free-form text. |
## G
| Controller | Description |
| --- | --- |
| [GCPCredentialsControllerService](/user-guide/data-integration/openflow/controllers/gcpcredentialscontrollerservice) | Defines credentials for Google Cloud Platform processors. |
| [GCSFileResourceService](/user-guide/data-integration/openflow/controllers/gcsfileresourceservice) | Provides a Google Cloud Storage (GCS) file resource for other components. |
| [GrokReader](/user-guide/data-integration/openflow/controllers/grokreader) | Provides a mechanism for reading unstructured text data, such as log files, and structuring the data so that it can be processed. |
## H
| Controller | Description |
| --- | --- |
| [HazelcastMapCacheClient](/user-guide/data-integration/openflow/controllers/hazelcastmapcacheclient) | An implementation of DistributedMapCacheClient that uses Hazelcast as the backing cache. |
| [HikariCPConnectionPool](/user-guide/data-integration/openflow/controllers/hikaricpconnectionpool) | Provides Database Connection Pooling Service based on HikariCP. |
| [HttpRecordSink](/user-guide/data-integration/openflow/controllers/httprecordsink) | Format and send Records to a configured URI using HTTP POST. |
## I
| Controller | Description |
| --- | --- |
| [IPLookupService](/user-guide/data-integration/openflow/controllers/iplookupservice) | A lookup service that provides several types of enrichment information for IP addresses. |
## J
| Controller | Description |
| --- | --- |
| [JettyWebSocketClient](/user-guide/data-integration/openflow/controllers/jettywebsocketclient) | Implementation of WebSocketClientService. |
| [JettyWebSocketServer](/user-guide/data-integration/openflow/controllers/jettywebsocketserver) | Implementation of WebSocketServerService. |
| [JMSConnectionFactoryProvider](/user-guide/data-integration/openflow/controllers/jmsconnectionfactoryprovider) | Provides a generic service to create vendor specific javax. |
| [JndiJmsConnectionFactoryProvider](/user-guide/data-integration/openflow/controllers/jndijmsconnectionfactoryprovider) | Provides a service to look up an existing JMS ConnectionFactory using the Java Naming and Directory Interface (JNDI). |
| [JsonConfigBasedBoxClientService](/user-guide/data-integration/openflow/controllers/jsonconfigbasedboxclientservice) | Provides Box client objects through which Box API calls can be used. |
| [JsonPathReader](/user-guide/data-integration/openflow/controllers/jsonpathreader) | Parses JSON records and evaluates user-defined JSON Paths against each JSON object. |
| [JsonRecordSetWriter](/user-guide/data-integration/openflow/controllers/jsonrecordsetwriter) | Writes the results of a RecordSet as either a JSON Array or one JSON object per line. |
| %logo-snowflake-blue% [JsonTableColumnFilter](/user-guide/data-integration/openflow/controllers/jsontablecolumnfilter) | Provides a table column filter based on a JSON configuration. |
| [JsonTreeReader](/user-guide/data-integration/openflow/controllers/jsontreereader) | Parses JSON into individual Record objects. |
| [JWTBearerOAuth2AccessTokenProvider](/user-guide/data-integration/openflow/controllers/jwtbeareroauth2accesstokenprovider) | Provides OAuth 2. |
## K
| Controller | Description |
| --- | --- |
| [Kafka3ConnectionService](/user-guide/data-integration/openflow/controllers/kafka3connectionservice) | Provides and manages connections to Kafka Brokers for producer or consumer operations. |
| %logo-snowflake-blue% [Kafka3ConnectionService](/user-guide/data-integration/openflow/controllers/kafka3connectionservice) | Provides and manages connections to Kafka Brokers for producer or consumer operations. |
## L
| Controller | Description |
| --- | --- |
| [LoggingRecordSink](/user-guide/data-integration/openflow/controllers/loggingrecordsink) | Provides a RecordSinkService that can be used to log records to the application log (nifi-app. |
## M
| Controller | Description |
| --- | --- |
| [MapCacheClientService](/user-guide/data-integration/openflow/controllers/mapcacheclientservice) | Provides the ability to communicate with a MapCacheServer. |
| [MapCacheServer](/user-guide/data-integration/openflow/controllers/mapcacheserver) | Provides a map (key/value) cache that can be accessed over a socket. |
| %logo-snowflake-blue% [MicrosoftClientCertificateOAuth2TokenProvider](/user-guide/data-integration/openflow/controllers/microsoftclientcertificateoauth2tokenprovider) | Provides OAuth2 access tokens for the Microsoft Graph API using client_credentials with a client certificate. |
| %logo-snowflake-blue% [MicrosoftGraphAuthenticationProvider](/user-guide/data-integration/openflow/controllers/microsoftgraphauthenticationprovider) | Provides authentication for the Microsoft Graph API, which can be used for interacting with Microsoft 365 services. |
| [MongoDBControllerService](/user-guide/data-integration/openflow/controllers/mongodbcontrollerservice) | Provides a controller service that configures a connection to MongoDB and provides access to that connection to other Mongo-related components. |
| [MongoDBLookupService](/user-guide/data-integration/openflow/controllers/mongodblookupservice) | Provides a lookup service based around MongoDB. |
## P
| Controller | Description |
| --- | --- |
| %logo-snowflake-blue% [ParquetIcebergWriter](/user-guide/data-integration/openflow/controllers/parqueticebergwriter) | Provides record serialization for Apache Iceberg using Apache Parquet formatting |
| [PEMEncodedSSLContextProvider](/user-guide/data-integration/openflow/controllers/pemencodedsslcontextprovider) | SSLContext Provider configurable using PEM Private Key and Certificate files. |
| %logo-snowflake-blue% [PolarisIcebergCatalog](/user-guide/data-integration/openflow/controllers/polarisicebergcatalog) | Provides Apache Iceberg integration with Apache Polaris Catalog access over REST HTTP |
| [PropertiesFileLookupService](/user-guide/data-integration/openflow/controllers/propertiesfilelookupservice) | A reloadable properties file-based lookup service |
| [ProtobufReader](/user-guide/data-integration/openflow/controllers/protobufreader) | Parses a Protocol Buffers message from binary format. |
## R
| Controller | Description |
| --- | --- |
| [ReaderLookup](/user-guide/data-integration/openflow/controllers/readerlookup) | Provides a RecordReaderFactory that can be used to dynamically select another RecordReaderFactory. |
| [RecordSetWriterLookup](/user-guide/data-integration/openflow/controllers/recordsetwriterlookup) | Provides a RecordSetWriterFactory that can be used to dynamically select another RecordSetWriterFactory. |
| [RecordSinkServiceLookup](/user-guide/data-integration/openflow/controllers/recordsinkservicelookup) | Provides a RecordSinkService that can be used to dynamically select another RecordSinkService. |
| [RedisConnectionPoolService](/user-guide/data-integration/openflow/controllers/redisconnectionpoolservice) | A service that provides connections to Redis. |
| [RedisDistributedMapCacheClientService](/user-guide/data-integration/openflow/controllers/redisdistributedmapcacheclientservice) | An implementation of DistributedMapCacheClient that uses Redis as the backing cache. |
| %logo-snowflake-blue% [RemoveFieldRecordReader](/user-guide/data-integration/openflow/controllers/removefieldrecordreader) | A wrapper for a RecordReaderFactory that supports filtering out specified fields from NiFi Records. |
| [RestLookupService](/user-guide/data-integration/openflow/controllers/restlookupservice) | Use a REST service to look up values. |
## S
| Controller | Description |
| --- | --- |
| [S3FileResourceService](/user-guide/data-integration/openflow/controllers/s3fileresourceservice) | Provides an Amazon Web Services (AWS) S3 file resource for other components. |
| %logo-snowflake-blue% [SalesforceDataCloudOAuthTokenProvider](/user-guide/data-integration/openflow/controllers/salesforcedatacloudoauthtokenprovider) | Retrieves an OAuth2 access token from Salesforce using the configured OAuth2 Access Token Provider and exchanges the token for a Data Cloud API token. |
| [ScriptedLookupService](/user-guide/data-integration/openflow/controllers/scriptedlookupservice) | Allows the user to provide a scripted LookupService instance in order to enrich records from an incoming flow file. |
| [ScriptedReader](/user-guide/data-integration/openflow/controllers/scriptedreader) | Allows the user to provide a scripted RecordReaderFactory instance in order to read/parse/generate records from an incoming flow file. |
| [ScriptedRecordSetWriter](/user-guide/data-integration/openflow/controllers/scriptedrecordsetwriter) | Allows the user to provide a scripted RecordSetWriterFactory instance in order to write records to an outgoing flow file. |
| [ScriptedRecordSink](/user-guide/data-integration/openflow/controllers/scriptedrecordsink) | Allows the user to provide a scripted RecordSinkService instance in order to transmit records to the desired target. |
| [SetCacheClientService](/user-guide/data-integration/openflow/controllers/setcacheclientservice) | Provides the ability to communicate with a SetCacheServer. |
| [SetCacheServer](/user-guide/data-integration/openflow/controllers/setcacheserver) | Provides a set (collection of unique values) cache that can be accessed over a socket. |
| [SimpleCsvFileLookupService](/user-guide/data-integration/openflow/controllers/simplecsvfilelookupservice) | A reloadable CSV file-based lookup service. |
| [SimpleDatabaseLookupService](/user-guide/data-integration/openflow/controllers/simpledatabaselookupservice) | A relational-database-based lookup service. |
| [SimpleKeyValueLookupService](/user-guide/data-integration/openflow/controllers/simplekeyvaluelookupservice) | Allows users to add key/value pairs as User-defined Properties. |
| [SimpleRedisDistributedMapCacheClientService](/user-guide/data-integration/openflow/controllers/simpleredisdistributedmapcacheclientservice) | An implementation of DistributedMapCacheClient that uses Redis as the backing cache. |
| [SimpleScriptedLookupService](/user-guide/data-integration/openflow/controllers/simplescriptedlookupservice) | Allows the user to provide a scripted LookupService instance in order to enrich records from an incoming flow file. |
| [SlackRecordSink](/user-guide/data-integration/openflow/controllers/slackrecordsink) | Format and send Records to a configured Channel using the Slack Post Message API. |
| [SmbjClientProviderService](/user-guide/data-integration/openflow/controllers/smbjclientproviderservice) | Provides access to SMB Sessions with shared authentication credentials. |
| %logo-snowflake-blue% [SnowflakeConnectionService](/user-guide/data-integration/openflow/controllers/snowflakeconnectionservice) | Provides pooled database connections to Snowflake services |
| %logo-snowflake-blue% [SnowflakeDatabaseDialectService](/user-guide/data-integration/openflow/controllers/snowflakedatabasedialectservice) | Database Dialect Service supporting Snowflake. |
| %logo-snowflake-blue% [SnowflakeSignJWTService](/user-guide/data-integration/openflow/controllers/snowflakesignjwtservice) | Provides OAuth2 access token using a JWT signed with a secret stored in Snowflake. |
| %logo-snowflake-blue% [SnowflakeTableSchemaRegistry](/user-guide/data-integration/openflow/controllers/snowflaketableschemaregistry) | Uses Snowflake tables as the source of schema; utilises the Snowpipe Streaming REST API. |
| %logo-snowflake-blue% [StandardAnthropicLLMService](/user-guide/data-integration/openflow/controllers/standardanthropicllmservice) | A Controller Service that provides integration with Anthropic's Claude AI models through their Messages API. |
| %logo-snowflake-blue% [StandardAtlassianRequestRateManager](/user-guide/data-integration/openflow/controllers/standardatlassianrequestratemanager) | Provides rate limiting coordination for Atlassian API calls across processors to prevent cascading rate limit issues. |
| [StandardAzureCredentialsControllerService](/user-guide/data-integration/openflow/controllers/standardazurecredentialscontrollerservice) | Provide credentials to use with an Azure client. |
| %logo-snowflake-blue% [StandardConfluenceClientService](/user-guide/data-integration/openflow/controllers/standardconfluenceclientservice) | Provides connection service to Confluence APIs |
| %logo-snowflake-blue% [StandardDatabricksWorkspaceClientService](/user-guide/data-integration/openflow/controllers/standarddatabricksworkspaceclientservice) | Databricks client. |
| [StandardDropboxCredentialService](/user-guide/data-integration/openflow/controllers/standarddropboxcredentialservice) | Defines credentials for Dropbox processors. |
| [StandardFileResourceService](/user-guide/data-integration/openflow/controllers/standardfileresourceservice) | Provides a file resource for other components. |
| [StandardHashiCorpVaultClientService](/user-guide/data-integration/openflow/controllers/standardhashicorpvaultclientservice) | A controller service for interacting with HashiCorp Vault. |
| [StandardHttpContextMap](/user-guide/data-integration/openflow/controllers/standardhttpcontextmap) | Provides the ability to store and retrieve HTTP requests and responses external to a Processor, so that multiple Processors can interact with the same HTTP request. |
| %logo-snowflake-blue% [StandardHubSpotClientService](/user-guide/data-integration/openflow/controllers/standardhubspotclientservice) | HubSpot Controller Service to integrate with HubSpot HTTP API. |
| [StandardJsonSchemaRegistry](/user-guide/data-integration/openflow/controllers/standardjsonschemaregistry) | Provides a service for registering and accessing JSON schemas. |
| [StandardKustoIngestService](/user-guide/data-integration/openflow/controllers/standardkustoingestservice) | Sends batches of flowfile content or stream flowfile content to an Azure ADX cluster. |
| [StandardKustoQueryService](/user-guide/data-integration/openflow/controllers/standardkustoqueryservice) | Standard implementation of Kusto Query Service for Azure Data Explorer |
| %logo-snowflake-blue% [StandardMilvusConnectionService](/user-guide/data-integration/openflow/controllers/standardmilvusconnectionservice) | Provides connection service to a Milvus instance |
| [StandardOauth2AccessTokenProvider](/user-guide/data-integration/openflow/controllers/standardoauth2accesstokenprovider) | Provides OAuth 2. |
| %logo-snowflake-blue% [StandardOCRService](/user-guide/data-integration/openflow/controllers/standardocrservice) | Provides integration to Openflow OCR Service |
| %logo-snowflake-blue% [StandardOpenAILLMService](/user-guide/data-integration/openflow/controllers/standardopenaillmservice) | A Controller Service that provides integration with OpenAI's Chat Completion API. |
| [StandardPGPPrivateKeyService](/user-guide/data-integration/openflow/controllers/standardpgpprivatekeyservice) | PGP Private Key Service provides Private Keys loaded from files or properties |
| [StandardPGPPublicKeyService](/user-guide/data-integration/openflow/controllers/standardpgppublickeyservice) | PGP Public Key Service providing Public Keys loaded from files |
| [StandardPrivateKeyService](/user-guide/data-integration/openflow/controllers/standardprivatekeyservice) | Private Key Service provides access to a Private Key loaded from configured sources |
| [StandardProtobufReader](/user-guide/data-integration/openflow/controllers/standardprotobufreader) | Parses Protocol Buffers messages from binary format into NiFi Records. |
| [StandardProxyConfigurationService](/user-guide/data-integration/openflow/controllers/standardproxyconfigurationservice) | Provides a set of configurations for different NiFi components to use a proxy server. |
| [StandardRestrictedSSLContextService](/user-guide/data-integration/openflow/controllers/standardrestrictedsslcontextservice) | Restricted implementation of the SSLContextService. |
| [StandardS3EncryptionService](/user-guide/data-integration/openflow/controllers/standards3encryptionservice) | Adds configurable encryption to S3 Put and S3 Fetch operations. |
| %logo-snowflake-blue% [StandardSalesforceBulkJobsStateService](/user-guide/data-integration/openflow/controllers/standardsalesforcebulkjobsstateservice) | Stores Salesforce Bulk Jobs state per object type at cluster scope |
| %logo-snowflake-blue% [StandardSalesforceClientService](/user-guide/data-integration/openflow/controllers/standardsalesforceclientservice) | Provides connection service to Salesforce APIs |
| %logo-snowflake-blue% [StandardSalesforceDataCloudClientService](/user-guide/data-integration/openflow/controllers/standardsalesforcedatacloudclientservice) | Provides connection service to Salesforce Data Cloud APIs |
| %logo-snowflake-blue% [StandardSlackRateLimiterService](/user-guide/data-integration/openflow/controllers/standardslackratelimiterservice) | Provides rate limiting coordination for Slack API calls across processors to prevent cascading rate limit issues |
| [StandardSSLContextService](/user-guide/data-integration/openflow/controllers/standardsslcontextservice) | Standard implementation of the SSLContextService. |
| %logo-snowflake-blue% [StandardTableStateService](/user-guide/data-integration/openflow/controllers/standardtablestateservice) | A Controller Service that provides and manages table state. |
| %logo-snowflake-blue% [StandardVectaraClientService](/user-guide/data-integration/openflow/controllers/standardvectaraclientservice) | Vectara Controller Service to integrate with Vectara HTTP API. |
| [StandardWebClientServiceProvider](/user-guide/data-integration/openflow/controllers/standardwebclientserviceprovider) | Web Client Service Provider with support for configuring standard HTTP connection properties |
| %logo-snowflake-blue% [StateManagedCdcSchemaRegistry](/user-guide/data-integration/openflow/controllers/statemanagedcdcschemaregistry) | Uses the in-built NiFi State Management to store the hashes of table schemas. |
| [Syslog5424Reader](/user-guide/data-integration/openflow/controllers/syslog5424reader) | Provides a mechanism for reading RFC 5424 compliant Syslog data, such as log files, and structuring the data so that it can be processed. |
| [SyslogReader](/user-guide/data-integration/openflow/controllers/syslogreader) | Attempts to parse the contents of a Syslog message in accordance with RFC 5424 and RFC 3164. |
## U
| Controller | Description |
| --- | --- |
| [UDPEventRecordSink](/user-guide/data-integration/openflow/controllers/udpeventrecordsink) | Format and send Records as UDP Datagram Packets to a configurable destination |
## V
| Controller | Description |
| --- | --- |
| [VolatileSchemaCache](/user-guide/data-integration/openflow/controllers/volatileschemacache) | Provides a Schema Cache that evicts elements based on a Least-Recently-Used algorithm. |
## W
| Controller | Description |
| --- | --- |
| [WindowsEventLogReader](/user-guide/data-integration/openflow/controllers/windowseventlogreader) | Reads Windows Event Log data as XML content having been generated by ConsumeWindowsEventLog, ParseEvtx, etc. |
## X
| Controller | Description |
| --- | --- |
| [XMLFileLookupService](/user-guide/data-integration/openflow/controllers/xmlfilelookupservice) | A reloadable XML file-based lookup service. |
| [XMLReader](/user-guide/data-integration/openflow/controllers/xmlreader) | Reads XML content and creates Record objects. |
| [XMLRecordSetWriter](/user-guide/data-integration/openflow/controllers/xmlrecordsetwriter) | Writes a RecordSet to XML. |
## Y
| Controller | Description |
| --- | --- |
| [YamlTreeReader](/user-guide/data-integration/openflow/controllers/yamltreereader) | Parses YAML into individual Record objects. |
--- title: All processors (alphabetical) source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/index.md section: Loading & Unloading Data ---

# All processors (alphabetical)

- [](/user-guide/data-integration/openflow/about)

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

This topic provides a list of all Snowflake Openflow processors in alphabetical order. The list includes:
- The name of each processor - A summary of each processor
## A
| Processor | Description |
| --- | --- |
| %logo-snowflake-blue% [AbortQueryJob](/user-guide/data-integration/openflow/processors/abortqueryjob) | Aborts a Query Job in Salesforce using the Bulk API 2. |
| [AttributesToCSV](/user-guide/data-integration/openflow/processors/attributestocsv) | Generates a CSV representation of the input FlowFile Attributes. |
| [AttributesToJSON](/user-guide/data-integration/openflow/processors/attributestojson) | Generates a JSON representation of the input FlowFile Attributes. |
## C
| Processor | Description |
| --- | --- |
| [CalculateRecordStats](/user-guide/data-integration/openflow/processors/calculaterecordstats) | Counts the number of Records in a record set, optionally counting the number of elements per category, where the categories are defined by user-defined properties. |
| %logo-snowflake-blue% [CaptureChangeMySQL](/user-guide/data-integration/openflow/processors/capturechangemysql) | Reads CDC events from a MySQL database. |
| %logo-snowflake-blue% [CaptureChangePostgreSQL](/user-guide/data-integration/openflow/processors/capturechangepostgresql) | Reads CDC events from a PostgreSQL database. |
| %logo-snowflake-blue% [CaptureChangeSqlServer](/user-guide/data-integration/openflow/processors/capturechangesqlserver) | Reads CDC events from a SQL Server database. |
| %logo-snowflake-blue% [CaptureGoogleDriveChanges](/user-guide/data-integration/openflow/processors/capturegoogledrivechanges) | Captures changes to a Shared Google Drive and emits a FlowFile for each change that occurs. |
| %logo-snowflake-blue% [CaptureMicrosoft365GroupsChanges](/user-guide/data-integration/openflow/processors/capturemicrosoft365groupschanges) | Captures Microsoft365 groups changes and emits a FlowFile for each change that occurs. |
| %logo-snowflake-blue% [CaptureSharepointChanges](/user-guide/data-integration/openflow/processors/capturesharepointchanges) | Captures changes from a Sharepoint Document Library and emits a FlowFile for each change that occurs. |
| %logo-snowflake-blue% [CheckMetaAdsReportReadiness](/user-guide/data-integration/openflow/processors/checkmetaadsreportreadiness) | Processor checking if the Meta Ads report is ready for download. |
| %logo-snowflake-blue% [ChunkRecordText](/user-guide/data-integration/openflow/processors/chunkrecordtext) | Chunks text with options for recursively splitting by delimiters and max character length. |
| %logo-snowflake-blue% [ChunkText](/user-guide/data-integration/openflow/processors/chunktext) | Chunks text with options for recursively splitting by delimiters and max character length. |
| [CompressContent](/user-guide/data-integration/openflow/processors/compresscontent) | Compresses or decompresses the contents of FlowFiles using a user-specified compression algorithm and updates the mime. |
| [ConnectWebSocket](/user-guide/data-integration/openflow/processors/connectwebsocket) | Acts as a WebSocket client endpoint to interact with a remote WebSocket server. |
| [ConsumeAMQP](/user-guide/data-integration/openflow/processors/consumeamqp) | Consumes AMQP Messages from an AMQP Broker using the AMQP 0. |
| [ConsumeAzureEventHub](/user-guide/data-integration/openflow/processors/consumeazureeventhub) | Receives messages from Microsoft Azure Event Hubs with checkpointing to ensure consistent event processing. |
| [ConsumeBoxEnterpriseEvents](/user-guide/data-integration/openflow/processors/consumeboxenterpriseevents) | Consumes Enterprise Events from Box admin_logs_streaming Stream Type. |
| [ConsumeBoxEvents](/user-guide/data-integration/openflow/processors/consumeboxevents) | Consumes all events from Box. |
| [ConsumeElasticsearch](/user-guide/data-integration/openflow/processors/consumeelasticsearch) | A processor that repeatedly runs a paginated query against a field using a Range query to consume new Documents from an Elasticsearch index/query. |
| [ConsumeGCPubSub](/user-guide/data-integration/openflow/processors/consumegcpubsub) | Consumes messages from the configured Google Cloud PubSub subscription. |
| [ConsumeIMAP](/user-guide/data-integration/openflow/processors/consumeimap) | Consumes messages from Email Server using IMAP protocol. |
| [ConsumeJMS](/user-guide/data-integration/openflow/processors/consumejms) | Consumes JMS Message of type BytesMessage, TextMessage, ObjectMessage, MapMessage or StreamMessage transforming its content to a FlowFile and transitioning it to 'success' relationship. |
| [ConsumeKafka](/user-guide/data-integration/openflow/processors/consumekafka) | Consumes messages from Apache Kafka Consumer API. |
| %logo-snowflake-blue% [ConsumeKafka](/user-guide/data-integration/openflow/processors/consumekafka) | Consumes messages from Apache Kafka Consumer API. |
| [ConsumeKinesisStream](/user-guide/data-integration/openflow/processors/consumekinesisstream) | Reads data from the specified AWS Kinesis stream and outputs a FlowFile for every processed Record (raw) or a FlowFile for a batch of processed records if a Record Reader and Record Writer are configured. |
| [ConsumeMQTT](/user-guide/data-integration/openflow/processors/consumemqtt) | Subscribes to a topic and receives messages from an MQTT broker. |
| [ConsumePOP3](/user-guide/data-integration/openflow/processors/consumepop3) | Consumes messages from Email Server using POP3 protocol. |
| [ConsumeSlack](/user-guide/data-integration/openflow/processors/consumeslack) | Retrieves messages from one or more configured Slack channels. |
| %logo-snowflake-blue% [ConsumeSlackConversation](/user-guide/data-integration/openflow/processors/consumeslackconversation) | Retrieves messages from Slack conversations available to the App. |
| %logo-snowflake-blue% [ConsumeSlackHistory](/user-guide/data-integration/openflow/processors/consumeslackhistory) | Fetches historical messages from all Slack channels available to the App. |
| %logo-snowflake-blue% [ConsumeSnowflakeStream](/user-guide/data-integration/openflow/processors/consumesnowflakestream) | Fetches data from a Snowflake stream and writes it to a FlowFile. |
| [ConsumeTwitter](/user-guide/data-integration/openflow/processors/consumetwitter) | Streams tweets from Twitter's streaming API v2. |
| [ControlRate](/user-guide/data-integration/openflow/processors/controlrate) | Controls the rate at which data is transferred to follow-on processors. |
| [ConvertCharacterSet](/user-guide/data-integration/openflow/processors/convertcharacterset) | Converts a FlowFile's content from one character set to another. |
| [ConvertRecord](/user-guide/data-integration/openflow/processors/convertrecord) | Converts records from one data format to another using configured Record Reader and Record Writer Controller Services. |
| %logo-snowflake-blue% [ConvertToJournalSchema](/user-guide/data-integration/openflow/processors/converttojournalschema) | Converts the incoming database schema into the appropriate schema for a Snowflake CDC Journal table. |
| [CopyAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/copyazureblobstorage_v12) | Copies a blob in Azure Blob Storage from one account/container to another. |
| [CopyS3Object](/user-guide/data-integration/openflow/processors/copys3object) | Copies a file from one bucket and key to another in AWS S3. |
| [CountText](/user-guide/data-integration/openflow/processors/counttext) | Counts various metrics on incoming text. |
| %logo-snowflake-blue% [CreateAmazonAdsReport](/user-guide/data-integration/openflow/processors/createamazonadsreport) | Processor which creates report configuration for Amazon Ads connector. |
| %logo-snowflake-blue% [CreateAzureOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createazureopenaiembeddings) | Uses Azure OpenAI to create embeddings for text. |
| [CreateBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/createboxfilemetadatainstance) | Creates a metadata instance for a Box file using a specified template with values from the flowFile content. |
| [CreateBoxMetadataTemplate](/user-guide/data-integration/openflow/processors/createboxmetadatatemplate) | Creates a Box metadata template using field specifications from the flowFile content. |
| %logo-snowflake-blue% [CreateCohereEmbeddings](/user-guide/data-integration/openflow/processors/createcohereembeddings) | Uses Cohere to create embeddings for text. |
| %logo-snowflake-blue% [CreateMetaAdsReport](/user-guide/data-integration/openflow/processors/createmetaadsreport) | Processor which creates report configuration for Meta Ads connector. |
| %logo-snowflake-blue% [CreateOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createopenaiembeddings) | Uses OpenAI to create embeddings for text. |
| %logo-snowflake-blue% [CreateSnowflakeEmbeddings](/user-guide/data-integration/openflow/processors/createsnowflakeembeddings) | Creates vector embeddings using Snowflake Cortex Large Language Model functions. |
| %logo-snowflake-blue% [CreateVertexAIEmbeddings](/user-guide/data-integration/openflow/processors/createvertexaiembeddings) | Uses VertexAI to create embeddings for text. |
| [CryptographicHashContent](/user-guide/data-integration/openflow/processors/cryptographichashcontent) | Calculates a cryptographic hash value for the flowfile content using the given algorithm and writes it to an output attribute. |
## D
| Processor | Description |
| --- | --- |
| [DebugFlow](/user-guide/data-integration/openflow/processors/debugflow) | The DebugFlow processor aids testing and debugging the FlowFile framework by allowing various responses to be explicitly triggered in response to the receipt of a FlowFile or a timer event without a FlowFile if using timer or cron based scheduling. |
| [DecryptContentAge](/user-guide/data-integration/openflow/processors/decryptcontentage) | Decrypt content using the age-encryption. |
| [DecryptContentPGP](/user-guide/data-integration/openflow/processors/decryptcontentpgp) | Decrypt contents of OpenPGP messages. |
| [DeduplicateRecord](/user-guide/data-integration/openflow/processors/deduplicaterecord) | This processor de-duplicates individual records within a record set. |
| [DeleteAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/deleteazureblobstorage_v12) | Deletes the specified blob from Azure Blob Storage. |
| [DeleteAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/deleteazuredatalakestorage) | Deletes the provided file from Azure Data Lake Storage. |
| [DeleteBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/deleteboxfilemetadatainstance) | Deletes a metadata instance from a Box file using the specified template key. |
| [DeleteByQueryElasticsearch](/user-guide/data-integration/openflow/processors/deletebyqueryelasticsearch) | Delete from an Elasticsearch index using a query. |
| %logo-snowflake-blue% [DeleteDBFSResource](/user-guide/data-integration/openflow/processors/deletedbfsresource) | Deletes DBFS files and directories. |
| [DeleteDynamoDB](/user-guide/data-integration/openflow/processors/deletedynamodb) | Deletes a document from DynamoDB based on hash and range key. |
| [DeleteFile](/user-guide/data-integration/openflow/processors/deletefile) | Deletes a file from the filesystem. |
| [DeleteGCSObject](/user-guide/data-integration/openflow/processors/deletegcsobject) | Deletes objects from a Google Cloud Bucket. |
| [DeleteGridFS](/user-guide/data-integration/openflow/processors/deletegridfs) | Deletes a file from GridFS using a file name or a query. |
| %logo-snowflake-blue% [DeleteMilvus](/user-guide/data-integration/openflow/processors/deletemilvus) | Deletes vectors from Milvus database from a collection by ID. |
| [DeleteMongo](/user-guide/data-integration/openflow/processors/deletemongo) | Executes a delete query against a MongoDB collection. |
| %logo-snowflake-blue% [DeletePinecone](/user-guide/data-integration/openflow/processors/deletepinecone) | Deletes vectors from a Pinecone index. |
| %logo-snowflake-blue% [DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob) | Deletes a Query Job in Salesforce using the Bulk API 2. |
| [DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object) | Deletes a file from an Amazon S3 Bucket. |
| [DeleteSFTP](/user-guide/data-integration/openflow/processors/deletesftp) | Deletes a file residing on an SFTP server. |
| [DeleteSQS](/user-guide/data-integration/openflow/processors/deletesqs) | Deletes a message from an Amazon Simple Queuing Service Queue. |
| %logo-snowflake-blue% [DeleteUnityCatalogResource](/user-guide/data-integration/openflow/processors/deleteunitycatalogresource) | Delete a Unity Catalog file or directory. |
| %logo-snowflake-blue% [DescribeDataShare](/user-guide/data-integration/openflow/processors/describedatashare) | Describe the specified data share metadata in Salesforce Data Cloud. |
| %logo-snowflake-blue% [DescribeSFDCObject](/user-guide/data-integration/openflow/processors/describesfdcobject) | Describe the specified object metadata in Salesforce. |
| [DetectDuplicate](/user-guide/data-integration/openflow/processors/detectduplicate) | Caches a value, computed from FlowFile attributes, for each incoming FlowFile and determines if the cached value has already been seen. |
| [DistributeLoad](/user-guide/data-integration/openflow/processors/distributeload) | Distributes FlowFiles to downstream processors based on a Distribution Strategy. |
| [DuplicateFlowFile](/user-guide/data-integration/openflow/processors/duplicateflowfile) | Intended for load testing, this processor will create the configured number of copies of each incoming FlowFile. |
## E
| Processor | Description |
| --- | --- |
| [EncodeContent](/user-guide/data-integration/openflow/processors/encodecontent) | Encode or decode the contents of a FlowFile using Base64, Base32, or hex encoding schemes. |
| [EncryptContentAge](/user-guide/data-integration/openflow/processors/encryptcontentage) | Encrypt content using the age-encryption. |
| [EncryptContentPGP](/user-guide/data-integration/openflow/processors/encryptcontentpgp) | Encrypt contents using OpenPGP. |
| [EnforceOrder](/user-guide/data-integration/openflow/processors/enforceorder) | Enforces expected ordering of FlowFiles that belong to the same data group within a single node. |
| %logo-snowflake-blue% [EnrichAttributes](/user-guide/data-integration/openflow/processors/enrichattributes) | Looks up a value using the configured Lookup Service and adds the results to the FlowFile as one or more attributes. |
| %logo-snowflake-blue% [EnrichCdcStream](/user-guide/data-integration/openflow/processors/enrichcdcstream) | Enriches incoming FlowFiles that come from CaptureChangePostgreSQL, etc. |
| [EvaluateJsonPath](/user-guide/data-integration/openflow/processors/evaluatejsonpath) | Evaluates one or more JsonPath expressions against the content of a FlowFile. |
| %logo-snowflake-blue% [EvaluateRagAnswerCorrectness](/user-guide/data-integration/openflow/processors/evaluateraganswercorrectness) | Evaluates the correctness of generated answers in a Retrieval-Augmented Generation (RAG) context by computing metrics such as F1 score, cosine similarity, and answer correctness. |
| %logo-snowflake-blue% [EvaluateRagFaithfulness](/user-guide/data-integration/openflow/processors/evaluateragfaithfulness) | Evaluates the faithfulness of generated answers in a Retrieval-Augmented Generation (RAG) system by analyzing responses using an LLM (e. |
| %logo-snowflake-blue% [EvaluateRagRetrieval](/user-guide/data-integration/openflow/processors/evaluateragretrieval) | Calculates retrieval metrics (Precision@N, Recall@N, FScore@N, MAP@N, MRR) for a RAG system using an LLM as a judge. |
| [EvaluateXPath](/user-guide/data-integration/openflow/processors/evaluatexpath) | Evaluates one or more XPaths against the content of a FlowFile. |
| [EvaluateXQuery](/user-guide/data-integration/openflow/processors/evaluatexquery) | Evaluates one or more XQueries against the content of a FlowFile. |
| [ExecuteGroovyScript](/user-guide/data-integration/openflow/processors/executegroovyscript) | Experimental Extended Groovy script processor. |
| [ExecuteProcess](/user-guide/data-integration/openflow/processors/executeprocess) | Runs an operating system command specified by the user and writes the output of that command to a FlowFile. |
| [ExecuteScript](/user-guide/data-integration/openflow/processors/executescript) | Experimental - Executes a script given the flow file and a process session. |
| [ExecuteSQL](/user-guide/data-integration/openflow/processors/executesql) | Executes provided SQL select query. |
| [ExecuteSQLRecord](/user-guide/data-integration/openflow/processors/executesqlrecord) | Executes provided SQL select query. |
| %logo-snowflake-blue% [ExecuteSQLStatement](/user-guide/data-integration/openflow/processors/executesqlstatement) | Executes a SQL DDL or DML Statement against a database. |
| [ExecuteStreamCommand](/user-guide/data-integration/openflow/processors/executestreamcommand) | The ExecuteStreamCommand processor provides a flexible way to integrate external commands and scripts into NiFi data flows. |
| [ExtractAvroMetadata](/user-guide/data-integration/openflow/processors/extractavrometadata) | Extracts metadata from the header of an Avro datafile. |
| [ExtractEmailAttachments](/user-guide/data-integration/openflow/processors/extractemailattachments) | Extract attachments from a mime formatted email file, splitting them into individual flowfiles. |
| [ExtractEmailHeaders](/user-guide/data-integration/openflow/processors/extractemailheaders) | Using the flowfile content as source of data, extract header from an RFC compliant email file adding the relevant attributes to the flowfile. |
| [ExtractGrok](/user-guide/data-integration/openflow/processors/extractgrok) | Evaluates one or more Grok Expressions against the content of a FlowFile, adding the results as attributes or replacing the content of the FlowFile with a JSON notation of the matched content. |
| [ExtractRecordSchema](/user-guide/data-integration/openflow/processors/extractrecordschema) | Extracts the record schema from the FlowFile using the supplied Record Reader and writes it to the 'avro. |
| %logo-snowflake-blue% [ExtractSchemaColumns](/user-guide/data-integration/openflow/processors/extractschemacolumns) | Extracts the record schema columns from the FlowFile using the supplied Record Reader and writes it to the 'schema. |
| [ExtractStructuredBoxFileMetadata](/user-guide/data-integration/openflow/processors/extractstructuredboxfilemetadata) | Extracts metadata from a Box file using Box AI. |
| [ExtractText](/user-guide/data-integration/openflow/processors/extracttext) | Evaluates one or more Regular Expressions against the content of a FlowFile. |
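
The retrieval metrics named in the EvaluateRagRetrieval summary can be illustrated with a short Python sketch. This is not the processor's implementation: the function names are hypothetical, and relevance is supplied here as a plain set of document IDs, whereas the processor is described as using an LLM as a judge. The sketch also assumes the common convention of dividing Precision@N by N.

```python
from typing import Sequence, Set


def precision_at_n(retrieved: Sequence[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top-n retrieved documents that are relevant (divides by n)."""
    if n <= 0:
        return 0.0
    hits = sum(1 for doc in retrieved[:n] if doc in relevant)
    return hits / n


def recall_at_n(retrieved: Sequence[str], relevant: Set[str], n: int) -> float:
    """Fraction of all relevant documents that appear in the top-n results."""
    if not relevant or n <= 0:
        return 0.0
    hits = sum(1 for doc in retrieved[:n] if doc in relevant)
    return hits / len(relevant)


def f_score_at_n(retrieved: Sequence[str], relevant: Set[str], n: int) -> float:
    """Harmonic mean of Precision@N and Recall@N."""
    p = precision_at_n(retrieved, relevant, n)
    r = recall_at_n(retrieved, relevant, n)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document; MRR is the mean of this over queries."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical example data: ranked results for one query and its relevant set.
ranked = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d2", "d4", "d9"}
print(precision_at_n(ranked, relevant_docs, 2))
print(recall_at_n(ranked, relevant_docs, 2))
print(f_score_at_n(ranked, relevant_docs, 2))
print(reciprocal_rank(ranked, relevant_docs))
```
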
## F
| Processor | Description |
| --- | --- |
| [FetchAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/fetchazureblobstorage_v12) | Retrieves the specified blob from Azure Blob Storage and writes its content to the content of the FlowFile. |
| [FetchAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/fetchazuredatalakestorage) | Fetch the specified file from Azure Data Lake Storage. |
| [FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile) | Fetches files from a Box Folder. |
| [FetchBoxFileInfo](/user-guide/data-integration/openflow/processors/fetchboxfileinfo) | Fetches metadata for files from Box and adds it to the FlowFile's attributes. |
| [FetchBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/fetchboxfilemetadatainstance) | Retrieves specific metadata instance associated with a Box file using template key and scope. |
| [FetchBoxFileRepresentation](/user-guide/data-integration/openflow/processors/fetchboxfilerepresentation) | Fetches a Box file representation using a representation hint and writes it to the FlowFile content. |
| [FetchDistributedMapCache](/user-guide/data-integration/openflow/processors/fetchdistributedmapcache) | Computes cache key(s) from FlowFile attributes, for each incoming FlowFile, and fetches the value(s) from the Distributed Map Cache associated with each key. |
| [FetchDropbox](/user-guide/data-integration/openflow/processors/fetchdropbox) | Fetches files from Dropbox. |
| [FetchFile](/user-guide/data-integration/openflow/processors/fetchfile) | Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile. |
| [FetchFTP](/user-guide/data-integration/openflow/processors/fetchftp) | Fetches the content of a file from a remote FTP server and overwrites the contents of an incoming FlowFile with the content of the remote file. |
| [FetchGCSObject](/user-guide/data-integration/openflow/processors/fetchgcsobject) | Fetches a file from a Google Cloud Bucket. |
| [FetchGoogleDrive](/user-guide/data-integration/openflow/processors/fetchgoogledrive) | Fetches files from a Google Drive Folder. |
| %logo-snowflake-blue% [FetchGoogleDriveFileComments](/user-guide/data-integration/openflow/processors/fetchgoogledrivefilecomments) | Fetches comments and their replies for a Google Drive file. |
| %logo-snowflake-blue% [FetchGoogleDriveMetadata](/user-guide/data-integration/openflow/processors/fetchgoogledrivemetadata) | Fetches Google Drive file metadata. |
| [FetchGridFS](/user-guide/data-integration/openflow/processors/fetchgridfs) | Retrieves one or more files from a GridFS bucket by file name or by a user-defined query. |
| %logo-snowflake-blue% [FetchJiraFields](/user-guide/data-integration/openflow/processors/fetchjirafields) | Retrieves comprehensive metadata for all fields available in the Jira Cloud instance using the REST API v3 /field endpoint. |
| %logo-snowflake-blue% [FetchJiraIssues](/user-guide/data-integration/openflow/processors/fetchjiraissues) | Fetches issues from Jira Cloud using REST API v3 with configurable search options. |
| %logo-snowflake-blue% [FetchMicrosoftDataverseTable](/user-guide/data-integration/openflow/processors/fetchmicrosoftdataversetable) | Fetch records from Microsoft Dataverse Tables. |
| [FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object) | Retrieves the contents of an S3 Object and writes it to the content of a FlowFile. |
| [FetchSFTP](/user-guide/data-integration/openflow/processors/fetchsftp) | Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file. |
| %logo-snowflake-blue% [FetchSharepointFile](/user-guide/data-integration/openflow/processors/fetchsharepointfile) | Fetches the contents of a file from a Sharepoint Drive, optionally downloading a PDF or HTML version of the file when applicable. |
| %logo-snowflake-blue% [FetchSharepointMetadata](/user-guide/data-integration/openflow/processors/fetchsharepointmetadata) | For each drive item retrieves its metadata and permissions and writes them as FlowFile attributes. |
| %logo-snowflake-blue% [FetchSlackConversationInfo](/user-guide/data-integration/openflow/processors/fetchslackconversationinfo) | Fetches Slack conversation info and member emails. |
| %logo-snowflake-blue% [FetchSlackFile](/user-guide/data-integration/openflow/processors/fetchslackfile) | Downloads a file shared on Slack. |
| %logo-snowflake-blue% [FetchSlackMessage](/user-guide/data-integration/openflow/processors/fetchslackmessage) | Fetches data about a single Slack message. |
| [FetchSmb](/user-guide/data-integration/openflow/processors/fetchsmb) | Fetches files from an SMB Share. |
| %logo-snowflake-blue% [FetchSnowflakeTableProperties](/user-guide/data-integration/openflow/processors/fetchsnowflaketableproperties) | Reads properties from a table and stores them as flow file attributes. |
| %logo-snowflake-blue% [FetchSourceTableSchema](/user-guide/data-integration/openflow/processors/fetchsourcetableschema) | Fetches the table schema (i. |
| %logo-snowflake-blue% [FetchTableSnapshot](/user-guide/data-integration/openflow/processors/fetchtablesnapshot) | Fetches a snapshot of a table from a database. |
| [FilterAttribute](/user-guide/data-integration/openflow/processors/filterattribute) | Filters the attributes of a FlowFile by retaining specified attributes and removing the rest or by removing specified attributes and retaining the rest. |
| %logo-snowflake-blue% [FindConfluencePages](/user-guide/data-integration/openflow/processors/findconfluencepages) | Processor for finding Confluence pages using space name and page name. |
| %logo-snowflake-blue% [FindSharepointDriveItem](/user-guide/data-integration/openflow/processors/findsharepointdriveitem) | Finds a Sharepoint Drive Item by its Drive ID and Item path. |
| [FlattenJson](/user-guide/data-integration/openflow/processors/flattenjson) | Provides the user with the ability to take a nested JSON document and flatten it into a simple key/value pair document. |
| [ForkEnrichment](/user-guide/data-integration/openflow/processors/forkenrichment) | Used in conjunction with the JoinEnrichment processor, this processor is responsible for adding the attributes that are necessary for the JoinEnrichment processor to perform its function. |
| [ForkRecord](/user-guide/data-integration/openflow/processors/forkrecord) | This processor allows the user to fork a record into multiple records. |
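
To make the FlattenJson summary concrete, the following Python sketch shows one way a nested JSON document can be flattened into key/value pairs. It is only an illustration, not the processor's code: the `flatten_json` helper, the `.` separator, and the use of list indexes as key segments are assumptions made for this example, and the processor's own options and output may differ.

```python
import json
from typing import Any, Dict


def flatten_json(value: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict of path -> leaf value."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict) and value:
        for key, child in value.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten_json(child, path, sep))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            path = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten_json(child, path, sep))
    else:
        # Scalars (and empty containers) are kept as leaf values.
        flat[prefix] = value
    return flat


# Hypothetical input document used only for illustration.
nested = json.loads('{"user": {"name": "Ada", "tags": ["x", "y"]}, "active": true}')
print(json.dumps(flatten_json(nested), indent=2))
```
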
## G
| Processor | Description |
| --- | --- |
| %logo-snowflake-blue% [GenerateAnswersFromContext](/user-guide/data-integration/openflow/processors/generateanswersfromcontext) | Generates synthetic answers for each question present in the incoming records using a Large Language Model (LLM). |
| %logo-snowflake-blue% [GenerateAnswersFromGroundTruth](/user-guide/data-integration/openflow/processors/generateanswersfromgroundtruth) | Generates synthetic answers for each question in the incoming records using an LLM. |
| [GenerateFlowFile](/user-guide/data-integration/openflow/processors/generateflowfile) | This processor creates FlowFiles with random data or custom content. |
| %logo-snowflake-blue% [GenerateJSON](/user-guide/data-integration/openflow/processors/generatejson) | Produces a batch of JSON Objects with random field values based on a configurable JSON Schema. |
| [GenerateRecord](/user-guide/data-integration/openflow/processors/generaterecord) | This processor creates FlowFiles with records having random value for the specified fields. |
| [GenerateTableFetch](/user-guide/data-integration/openflow/processors/generatetablefetch) | Generates SQL select queries that fetch "pages" of rows from a table. |
| [GeoEnrichIP](/user-guide/data-integration/openflow/processors/geoenrichip) | Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes. |
| [GeoEnrichIPRecord](/user-guide/data-integration/openflow/processors/geoenrichiprecord) | Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes. |
| %logo-snowflake-blue% [GetAmazonAdsReport](/user-guide/data-integration/openflow/processors/getamazonadsreport) | Processor downloading report from Amazon Ads if ready. |
| [GetAwsPollyJobStatus](/user-guide/data-integration/openflow/processors/getawspollyjobstatus) | Retrieves the current status of an AWS Polly job. |
| [GetAwsTextractJobStatus](/user-guide/data-integration/openflow/processors/getawstextractjobstatus) | Retrieves the current status of an AWS Textract job. |
| [GetAwsTranscribeJobStatus](/user-guide/data-integration/openflow/processors/getawstranscribejobstatus) | Retrieves the current status of an AWS Transcribe job. |
| [GetAwsTranslateJobStatus](/user-guide/data-integration/openflow/processors/getawstranslatejobstatus) | Retrieves the current status of an AWS Translate job. |
| [GetAzureEventHub](/user-guide/data-integration/openflow/processors/getazureeventhub) | Receives messages from Microsoft Azure Event Hubs without reliable checkpoint tracking. |
| [GetAzureQueueStorage_v12](/user-guide/data-integration/openflow/processors/getazurequeuestorage_v12) | Retrieves the messages from an Azure Queue Storage. |
| [GetBoxFileCollaborators](/user-guide/data-integration/openflow/processors/getboxfilecollaborators) | Retrieves all collaborators on a Box file and adds the collaboration information to the FlowFile's attributes. |
| [GetBoxGroupMembers](/user-guide/data-integration/openflow/processors/getboxgroupmembers) | Retrieves members for a Box Group and writes their details in FlowFile attributes. |
| %logo-snowflake-blue% [GetConfluenceAuditRecords](/user-guide/data-integration/openflow/processors/getconfluenceauditrecords) | Processor listing Confluence audit records. |
| %logo-snowflake-blue% [GetConfluenceGroupUsers](/user-guide/data-integration/openflow/processors/getconfluencegroupusers) | Processor that downloads information about users belonging to a given Confluence group. |
| %logo-snowflake-blue% [GetConfluencePageContent](/user-guide/data-integration/openflow/processors/getconfluencepagecontent) | Processor downloading Confluence pages. |
| %logo-snowflake-blue% [GetConfluencePageIds](/user-guide/data-integration/openflow/processors/getconfluencepageids) | Downloads changed Confluence pages since the last sync and emits each as a FlowFile with metadata. |
| %logo-snowflake-blue% [GetConfluencePagePermissions](/user-guide/data-integration/openflow/processors/getconfluencepagepermissions) | Processor downloading Confluence page permissions. |
| %logo-snowflake-blue% [GetConfluenceSpaceIds](/user-guide/data-integration/openflow/processors/getconfluencespaceids) | Processor for retrieving Confluence space ids. |
| %logo-snowflake-blue% [GetConfluenceSpacePermissions](/user-guide/data-integration/openflow/processors/getconfluencespacepermissions) | Processor downloading Confluence space permissions. |
| %logo-snowflake-blue% [GetDataShareCredentials](/user-guide/data-integration/openflow/processors/getdatasharecredentials) | Describe the specified data share metadata in Salesforce Data Cloud. |
| %logo-snowflake-blue% [GetDataShareTables](/user-guide/data-integration/openflow/processors/getdatasharetables) | Describe the specified data share metadata in Salesforce Data Cloud. |
| %logo-snowflake-blue% [GetDBFSFile](/user-guide/data-integration/openflow/processors/getdbfsfile) | Read a DBFS file. |
| [GetDynamoDB](/user-guide/data-integration/openflow/processors/getdynamodb) | Retrieves a document from DynamoDB based on hash and range key. |
| [GetElasticsearch](/user-guide/data-integration/openflow/processors/getelasticsearch) | Elasticsearch get processor that uses the official Elastic REST client libraries to fetch a single document from Elasticsearch by _id. |
| [GetFile](/user-guide/data-integration/openflow/processors/getfile) | Creates FlowFiles from files in a directory. |
| [GetFileResource](/user-guide/data-integration/openflow/processors/getfileresource) | This processor creates FlowFiles with the content of the configured File Resource. |
| [GetFTP](/user-guide/data-integration/openflow/processors/getftp) | Fetches files from an FTP Server and creates FlowFiles from them. |
| [GetGcpVisionAnnotateFilesOperationStatus](/user-guide/data-integration/openflow/processors/getgcpvisionannotatefilesoperationstatus) | Retrieves the current status of a Google Vision operation. |
| [GetGcpVisionAnnotateImagesOperationStatus](/user-guide/data-integration/openflow/processors/getgcpvisionannotateimagesoperationstatus) | Retrieves the current status of a Google Vision operation. |
| %logo-snowflake-blue% [GetGoogleAdsReport](/user-guide/data-integration/openflow/processors/getgoogleadsreport) | A processor which can interact with Google Ads Reporting API. |
| %logo-snowflake-blue% [GetGoogleGroupMembers](/user-guide/data-integration/openflow/processors/getgooglegroupmembers) | Retrieves the members of one or more Google Groups, specified as a comma-separated list of group IDs that is given as a FlowFile attribute. |
| %logo-snowflake-blue% [GetGoogleSheets](/user-guide/data-integration/openflow/processors/getgooglesheets) | Processor responsible for fetching data from Google Sheets. |
| [GetHubSpot](/user-guide/data-integration/openflow/processors/gethubspot) | Retrieves JSON data from a private HubSpot application. |
| %logo-snowflake-blue% [GetHubSpotObject](/user-guide/data-integration/openflow/processors/gethubspotobject) | Get a HubSpot object and its associations by ID or unique value. |
| %logo-snowflake-blue% [GetHubSpotSchema](/user-guide/data-integration/openflow/processors/gethubspotschema) | Retrieves schema information for HubSpot object types including field names, types, and labels. |
| %logo-snowflake-blue% [GetLinkedInAdsReport](/user-guide/data-integration/openflow/processors/getlinkedinadsreport) | Processor downloading metrics from the LinkedIn Reporting APIs. |
| %logo-snowflake-blue% [GetMicrosoft365GroupMembers](/user-guide/data-integration/openflow/processors/getmicrosoft365groupmembers) | Retrieves Microsoft365 group members and emits a FlowFile for each change that occurs. |
| [GetMongo](/user-guide/data-integration/openflow/processors/getmongo) | Creates FlowFiles from documents in MongoDB loaded by a user-specified query. |
| [GetMongoRecord](/user-guide/data-integration/openflow/processors/getmongorecord) | A record-based version of GetMongo that uses the Record writers to write the MongoDB result set. |
| %logo-snowflake-blue% [GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult) | Gets the results of a Query Job in Salesforce using the Bulk API 2. |
| %logo-snowflake-blue% [GetQueryJobStatus](/user-guide/data-integration/openflow/processors/getqueryjobstatus) | Gets the status of a Query Job in Salesforce using the Bulk API 2. |
| [GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata) | Check for the existence of an Object in S3 and fetch its Metadata without attempting to download it. |
| [GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags) | Check for the existence of an Object in S3 and fetch its Tags without attempting to download it. |
| [GetSFTP](/user-guide/data-integration/openflow/processors/getsftp) | Fetches files from an SFTP Server and creates FlowFiles from them. |
| %logo-snowflake-blue% [GetSharepointSiteGroupMembers](/user-guide/data-integration/openflow/processors/getsharepointsitegroupmembers) | Retrieves all members of a SharePoint site group. |
| [GetShopify](/user-guide/data-integration/openflow/processors/getshopify) | Retrieves objects from a custom Shopify store. |
| [GetSmbFile](/user-guide/data-integration/openflow/processors/getsmbfile) | Reads file from a samba network location to FlowFiles. |
| [GetSplunk](/user-guide/data-integration/openflow/processors/getsplunk) | Retrieves data from Splunk Enterprise. |
| [GetSQS](/user-guide/data-integration/openflow/processors/getsqs) | Fetches messages from an Amazon Simple Queuing Service Queue. |
| %logo-snowflake-blue% [GetUnityCatalogFile](/user-guide/data-integration/openflow/processors/getunitycatalogfile) | Read a Unity Catalog file up to 5 GiB. |
| %logo-snowflake-blue% [GetUnityCatalogFileMetadata](/user-guide/data-integration/openflow/processors/getunitycatalogfilemetadata) | Checks for Unity Catalog file metadata. |
| [GetWorkdayReport](/user-guide/data-integration/openflow/processors/getworkdayreport) | A processor which can interact with a configurable Workday Report. |
| [GetZendesk](/user-guide/data-integration/openflow/processors/getzendesk) | Incrementally fetches data from Zendesk API. |
## H
Processor Description [HandleHttpRequest](/user-guide/data-integration/openflow/processors/handlehttprequest) Starts an HTTP Server and listens for HTTP Requests. [HandleHttpResponse](/user-guide/data-integration/openflow/processors/handlehttpresponse) Sends an HTTP Response to the Requestor that generated a FlowFile.
## I
Processor Description [IdentifyMimeType](/user-guide/data-integration/openflow/processors/identifymimetype) Attempts to identify the MIME Type used for a FlowFile. [InvokeHTTP](/user-guide/data-integration/openflow/processors/invokehttp) An HTTP client processor which can interact with a configurable HTTP Endpoint. [InvokeScriptedProcessor](/user-guide/data-integration/openflow/processors/invokescriptedprocessor) Experimental - Invokes a script engine for a Processor defined in the given script. [ISPEnrichIP](/user-guide/data-integration/openflow/processors/ispenrichip) Looks up ISP information for an IP address and adds the information to FlowFile attributes.
## J
Processor Description [JoinEnrichment](/user-guide/data-integration/openflow/processors/joinenrichment) Joins together Records from two different FlowFiles, where one FlowFile, the 'original', contains arbitrary records and the second FlowFile, the 'enrichment', contains additional data that should be used to enrich the first. [JoltTransformJSON](/user-guide/data-integration/openflow/processors/jolttransformjson) Applies a list of Jolt specifications to either the FlowFile JSON content or a specified FlowFile JSON attribute. [JoltTransformRecord](/user-guide/data-integration/openflow/processors/jolttransformrecord) Applies a JOLT specification to each record in the FlowFile payload. [JSLTTransformJSON](/user-guide/data-integration/openflow/processors/jslttransformjson) Applies a JSLT transformation to the FlowFile JSON payload. [JsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/jsonqueryelasticsearch) A processor that allows the user to run a query (with aggregations) written with the Elasticsearch JSON DSL.
## L
Processor Description %logo-snowflake-blue% [ListArchivedHubSpotData](/user-guide/data-integration/openflow/processors/listarchivedhubspotdata) Lists archived data from HubSpot for the chosen object type and generates one FlowFile per listed object with the corresponding metadata as FlowFile attributes. [ListAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/listazureblobstorage_v12) Lists blobs in an Azure Blob Storage container. [ListAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/listazuredatalakestorage) Lists directory in an Azure Data Lake Storage Gen 2 filesystem [ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile) Lists files in a Box folder. [ListBoxFileInfo](/user-guide/data-integration/openflow/processors/listboxfileinfo) Fetches file metadata for each file in a Box Folder. [ListBoxFileMetadataInstances](/user-guide/data-integration/openflow/processors/listboxfilemetadatainstances) Retrieves all metadata instances associated with a Box file. [ListBoxFileMetadataTemplates](/user-guide/data-integration/openflow/processors/listboxfilemetadatatemplates) Retrieves all metadata templates associated with a Box file. %logo-snowflake-blue% [ListConfluenceGroups](/user-guide/data-integration/openflow/processors/listconfluencegroups) Processor listing Confluence groups. [ListDatabaseTables](/user-guide/data-integration/openflow/processors/listdatabasetables) Generates a set of flow files, each containing attributes corresponding to metadata about a table from a database connection. %logo-snowflake-blue% [ListDBFSDirectory](/user-guide/data-integration/openflow/processors/listdbfsdirectory) List file names in a DBFS directory and output a new FlowFile with the filename. [ListDropbox](/user-guide/data-integration/openflow/processors/listdropbox) Retrieves a listing of files from Dropbox (shortcuts are ignored). 
[ListenFTP](/user-guide/data-integration/openflow/processors/listenftp) Starts an FTP server that listens on the specified port and transforms incoming files into FlowFiles. [ListenHTTP](/user-guide/data-integration/openflow/processors/listenhttp) Starts an HTTP Server and listens on a given base path to transform incoming requests into FlowFiles. [ListenOTLP](/user-guide/data-integration/openflow/processors/listenotlp) Collect OpenTelemetry messages over HTTP or gRPC. [ListenSlack](/user-guide/data-integration/openflow/processors/listenslack) Retrieves real-time messages or Slack commands from one or more Slack conversations. [ListenSyslog](/user-guide/data-integration/openflow/processors/listensyslog) Listens for Syslog messages being sent to a given port over TCP or UDP. [ListenTCP](/user-guide/data-integration/openflow/processors/listentcp) Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. [ListenUDP](/user-guide/data-integration/openflow/processors/listenudp) Listens for Datagram Packets on a given port. [ListenUDPRecord](/user-guide/data-integration/openflow/processors/listenudprecord) Listens for Datagram Packets on a given port and reads the content of each datagram using the configured Record Reader. [ListenWebSocket](/user-guide/data-integration/openflow/processors/listenwebsocket) Acts as a WebSocket server endpoint to accept client connections. [ListFile](/user-guide/data-integration/openflow/processors/listfile) Retrieves a listing of files from the input directory. [ListFTP](/user-guide/data-integration/openflow/processors/listftp) Performs a listing of the files residing on an FTP server. [ListGCSBucket](/user-guide/data-integration/openflow/processors/listgcsbucket) Retrieves a listing of objects from a GCS bucket. 
[ListGoogleDrive](/user-guide/data-integration/openflow/processors/listgoogledrive) Performs a listing of concrete files (shortcuts are ignored) in a Google Drive folder. %logo-snowflake-blue% [ListGoogleDriveFileInfo](/user-guide/data-integration/openflow/processors/listgoogledrivefileinfo) Lists all files and folders in a specified Google Drive. %logo-snowflake-blue% [ListGoogleGroups](/user-guide/data-integration/openflow/processors/listgooglegroups) Lists all of the groups for a given domain in Google Workspace. %logo-snowflake-blue% [ListHubSpotObjects](/user-guide/data-integration/openflow/processors/listhubspotobjects) Fetches data from HubSpot for specified object types, and generates one FlowFile per listed object with the corresponding metadata as FlowFile attributes. %logo-snowflake-blue% [ListMicrosoftDataverseTables](/user-guide/data-integration/openflow/processors/listmicrosoftdataversetables) List Tables from Microsoft Dataverse environments [ListS3](/user-guide/data-integration/openflow/processors/lists3) Retrieves a listing of objects from an S3 bucket. %logo-snowflake-blue% [ListSFDCDataShares](/user-guide/data-integration/openflow/processors/listsfdcdatashares) List the available data shares in the organization that are available to the identified user. %logo-snowflake-blue% [ListSFDCObjects](/user-guide/data-integration/openflow/processors/listsfdcobjects) List the available objects in the organization that are available to the identified user. [ListSFTP](/user-guide/data-integration/openflow/processors/listsftp) Performs a listing of the files residing on an SFTP server. %logo-snowflake-blue% [ListSharepointDrives](/user-guide/data-integration/openflow/processors/listsharepointdrives) Emits a FlowFile for each Drive present in the specified Sharepoint Site. 
%logo-snowflake-blue% [ListSharepointSiteGroups](/user-guide/data-integration/openflow/processors/listsharepointsitegroups) Lists all SharePoint site groups available on a specified SharePoint site. [ListSmb](/user-guide/data-integration/openflow/processors/listsmb) Lists concrete files shared via SMB protocol. %logo-snowflake-blue% [ListTableNames](/user-guide/data-integration/openflow/processors/listtablenames) Fetches all source table names and matches them with one of the possible configurations: - regexp expression e. %logo-snowflake-blue% [ListUnityCatalogDirectory](/user-guide/data-integration/openflow/processors/listunitycatalogdirectory) List file names in a Unity Catalog directory and output a new FlowFile with the filename. [LogAttribute](/user-guide/data-integration/openflow/processors/logattribute) Emits attributes of the FlowFile at the specified log level [LogMessage](/user-guide/data-integration/openflow/processors/logmessage) Emits a log message at the specified log level [LookupAttribute](/user-guide/data-integration/openflow/processors/lookupattribute) Lookup attributes from a lookup service [LookupRecord](/user-guide/data-integration/openflow/processors/lookuprecord) Extracts one or more fields from a Record and looks up a value for those fields in a LookupService.
## M
Processor Description [MergeContent](/user-guide/data-integration/openflow/processors/mergecontent) Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile. [MergeRecord](/user-guide/data-integration/openflow/processors/mergerecord) This Processor merges together multiple record-oriented FlowFiles into a single FlowFile that contains all of the Records of the input FlowFiles. %logo-snowflake-blue% [MergeSnowflakeJournalTable](/user-guide/data-integration/openflow/processors/mergesnowflakejournaltable) Triggers a merge operation on changes from journal table to a destination table in Snowflake. [ModifyBytes](/user-guide/data-integration/openflow/processors/modifybytes) Discard byte range at the start and end or all content of a binary file. [ModifyCompression](/user-guide/data-integration/openflow/processors/modifycompression) Changes the compression algorithm used to compress the contents of a FlowFile by decompressing the contents of FlowFiles using a user-specified compression algorithm and recompressing the contents using the specified compression format properties. [MonitorActivity](/user-guide/data-integration/openflow/processors/monitoractivity) Monitors the flow for activity and sends out an indicator when the flow has not had any data for some specified amount of time and again when the flow's activity is restored [MoveAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/moveazuredatalakestorage) Moves content within an Azure Data Lake Storage Gen 2.
## N
Processor Description [Notify](/user-guide/data-integration/openflow/processors/notify) Caches a release signal identifier in the distributed cache, optionally along with the FlowFile's attributes.
## O
Processor Description %logo-snowflake-blue% [OpenAiTranscribeAudio](/user-guide/data-integration/openflow/processors/openaitranscribeaudio) Transcribes audio into English text.
## P
Processor Description [PackageFlowFile](/user-guide/data-integration/openflow/processors/packageflowfile) This processor will package FlowFile attributes and content into an output FlowFile that can be exported from NiFi and imported back into NiFi, preserving the original attributes and content. [PaginatedJsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/paginatedjsonqueryelasticsearch) A processor that allows the user to run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. [ParseEvtx](/user-guide/data-integration/openflow/processors/parseevtx) Parses the contents of a Windows Event Log file (evtx) and writes the resulting XML to the FlowFile. %logo-snowflake-blue% [ParseExcelCellReference](/user-guide/data-integration/openflow/processors/parseexcelcellreference) Processor responsible for parsing Excel cell reference formula. [ParseSyslog](/user-guide/data-integration/openflow/processors/parsesyslog) Attempts to parse the contents of a Syslog message in accordance with RFC5424 and RFC3164 formats and adds attributes to the FlowFile for each of the parts of the Syslog message. [ParseSyslog5424](/user-guide/data-integration/openflow/processors/parsesyslog5424) Attempts to parse the contents of a well-formed Syslog message in accordance with RFC5424 format and adds attributes to the FlowFile for each of the parts of the Syslog message, including Structured Data. [PartitionRecord](/user-guide/data-integration/openflow/processors/partitionrecord) Splits, or partitions, record-oriented data based on the configured fields in the data. %logo-snowflake-blue% [PerformSnowflakeCortexOCR](/user-guide/data-integration/openflow/processors/performsnowflakecortexocr) Performs Optical Character Recognition (OCR) on PDF documents using Snowflake Cortex ML functions.
%logo-snowflake-blue% [PickTablesForReplication](/user-guide/data-integration/openflow/processors/picktablesforreplication) Accepts a list of fully qualified table names and determines whether each table is new (not replicated, but added in the source), existing (replicated and present in the source), or stale (replicated but no longer present in the source). Configuration is passed as a FlowFile attribute. %logo-snowflake-blue% [PromptAnthropicAI](/user-guide/data-integration/openflow/processors/promptanthropicai) Sends a prompt to Anthropic, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. %logo-snowflake-blue% [PromptAzureOpenAI](/user-guide/data-integration/openflow/processors/promptazureopenai) Sends a prompt to Azure's OpenAI service, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. %logo-snowflake-blue% [PromptLLM](/user-guide/data-integration/openflow/processors/promptllm) This processor sends a user-defined prompt to a Large Language Model (LLM) to respond. %logo-snowflake-blue% [PromptOpenAI](/user-guide/data-integration/openflow/processors/promptopenai) Sends a prompt to OpenAI, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. %logo-snowflake-blue% [PromptSnowflakeCortex](/user-guide/data-integration/openflow/processors/promptsnowflakecortex) Sends a prompt to Snowflake Cortex, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. %logo-snowflake-blue% [PromptVertexAI](/user-guide/data-integration/openflow/processors/promptvertexai) Sends a prompt to VertexAI, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. [PublishAMQP](/user-guide/data-integration/openflow/processors/publishamqp) Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange.
%logo-snowflake-blue% [PublishChangeDataSnowpipeStreaming](/user-guide/data-integration/openflow/processors/publishchangedatasnowpipestreaming) Publishes change data as Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming High Availability with concurrency group serialization. [PublishGCPubSub](/user-guide/data-integration/openflow/processors/publishgcpubsub) Publishes the content of the incoming flowfile to the configured Google Cloud PubSub topic. [PublishJMS](/user-guide/data-integration/openflow/processors/publishjms) Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as JMS BytesMessage or TextMessage. [PublishKafka](/user-guide/data-integration/openflow/processors/publishkafka) Sends the contents of a FlowFile as either a message or as individual records to Apache Kafka using the Kafka Producer API. %logo-snowflake-blue% [PublishKafka](/user-guide/data-integration/openflow/processors/publishkafka) Sends the contents of a FlowFile as either a message or as individual records to Apache Kafka using the Kafka Producer API. [PublishMQTT](/user-guide/data-integration/openflow/processors/publishmqtt) Publishes a message to an MQTT topic [PublishSlack](/user-guide/data-integration/openflow/processors/publishslack) Posts a message to the specified Slack channel. %logo-snowflake-blue% [PublishSnowpipeStreaming](/user-guide/data-integration/openflow/processors/publishsnowpipestreaming) Publishes Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming High Availability. [PutAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/putazureblobstorage_v12) Puts content into a blob on Azure Blob Storage. [PutAzureCosmosDBRecord](/user-guide/data-integration/openflow/processors/putazurecosmosdbrecord) This processor is a record-aware processor for inserting data into Cosmos DB with Core SQL API. 
[PutAzureDataExplorer](/user-guide/data-integration/openflow/processors/putazuredataexplorer) Acts as an Azure Data Explorer sink which sends FlowFiles to the provided endpoint. [PutAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/putazuredatalakestorage) Writes the contents of a FlowFile as a file on Azure Data Lake Storage Gen 2 [PutAzureEventHub](/user-guide/data-integration/openflow/processors/putazureeventhub) Send FlowFile contents to Azure Event Hubs [PutAzureQueueStorage_v12](/user-guide/data-integration/openflow/processors/putazurequeuestorage_v12) Writes the content of the incoming FlowFiles to the configured Azure Queue Storage. [PutBigQuery](/user-guide/data-integration/openflow/processors/putbigquery) Writes the contents of a FlowFile to a Google BigQuery table. [PutBoxFile](/user-guide/data-integration/openflow/processors/putboxfile) Puts content to a Box folder. [PutCloudWatchMetric](/user-guide/data-integration/openflow/processors/putcloudwatchmetric) Publishes metrics to Amazon CloudWatch. [PutDatabaseRecord](/user-guide/data-integration/openflow/processors/putdatabaserecord) The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. %logo-snowflake-blue% [PutDatabricksSQL](/user-guide/data-integration/openflow/processors/putdatabrickssql) Submit a SQL Execution using Databricks REST API then write the JSON response to FlowFile Content. %logo-snowflake-blue% [PutDBFSFile](/user-guide/data-integration/openflow/processors/putdbfsfile) Write FlowFile content to DBFS. [PutDistributedMapCache](/user-guide/data-integration/openflow/processors/putdistributedmapcache) Gets the content of a FlowFile and puts it to a distributed map cache, using a cache key computed from FlowFile attributes. [PutDropbox](/user-guide/data-integration/openflow/processors/putdropbox) Puts content to a Dropbox folder. 
[PutDynamoDB](/user-guide/data-integration/openflow/processors/putdynamodb) Puts a document into DynamoDB based on hash and range key. [PutDynamoDBRecord](/user-guide/data-integration/openflow/processors/putdynamodbrecord) Inserts items into DynamoDB based on record-oriented data. [PutElasticsearchJson](/user-guide/data-integration/openflow/processors/putelasticsearchjson) An Elasticsearch put processor that uses the official Elastic REST client libraries. [PutElasticsearchRecord](/user-guide/data-integration/openflow/processors/putelasticsearchrecord) A record-aware Elasticsearch put processor that uses the official Elastic REST client libraries. [PutEmail](/user-guide/data-integration/openflow/processors/putemail) Sends an e-mail to configured recipients for each incoming FlowFile. [PutFile](/user-guide/data-integration/openflow/processors/putfile) Writes the contents of a FlowFile to the local file system. [PutFTP](/user-guide/data-integration/openflow/processors/putftp) Sends FlowFiles to an FTP Server. [PutGCSObject](/user-guide/data-integration/openflow/processors/putgcsobject) Writes the contents of a FlowFile as an object in Google Cloud Storage. [PutGoogleDrive](/user-guide/data-integration/openflow/processors/putgoogledrive) Writes the contents of a FlowFile as a file in Google Drive. [PutGridFS](/user-guide/data-integration/openflow/processors/putgridfs) Writes a file to a GridFS bucket. %logo-snowflake-blue% [PutHubSpot](/user-guide/data-integration/openflow/processors/puthubspot) Upsert a HubSpot object. %logo-snowflake-blue% [PutIcebergTable](/user-guide/data-integration/openflow/processors/puticebergtable) Store records in Iceberg using configurable Catalog for managing namespaces and tables. [PutKinesisFirehose](/user-guide/data-integration/openflow/processors/putkinesisfirehose) Sends the contents to a specified Amazon Kinesis Firehose.
[PutKinesisStream](/user-guide/data-integration/openflow/processors/putkinesisstream) Sends the contents to a specified Amazon Kinesis. [PutLambda](/user-guide/data-integration/openflow/processors/putlambda) Sends the contents to a specified Amazon Lambda Function. [PutMongo](/user-guide/data-integration/openflow/processors/putmongo) Writes the contents of a FlowFile to MongoDB [PutMongoBulkOperations](/user-guide/data-integration/openflow/processors/putmongobulkoperations) Writes the contents of a FlowFile to MongoDB as bulk-update [PutMongoRecord](/user-guide/data-integration/openflow/processors/putmongorecord) This processor is a record-aware processor for inserting/upserting data into MongoDB. [PutRecord](/user-guide/data-integration/openflow/processors/putrecord) The PutRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file, and sends them to a destination specified by a Record Destination Service (i. [PutRedisHashRecord](/user-guide/data-integration/openflow/processors/putredishashrecord) Puts record field data into Redis using a specified hash value, which is determined by a RecordPath to a field in each record containing the hash value. [PutS3Object](/user-guide/data-integration/openflow/processors/puts3object) Writes the contents of a FlowFile as an S3 Object to an Amazon S3 Bucket. [PutSalesforceObject](/user-guide/data-integration/openflow/processors/putsalesforceobject) Creates new records for the specified Salesforce sObject. [PutSFTP](/user-guide/data-integration/openflow/processors/putsftp) Sends FlowFiles to an SFTP Server [PutSmbFile](/user-guide/data-integration/openflow/processors/putsmbfile) Writes the contents of a FlowFile to a samba network location. %logo-snowflake-blue% [PutSnowflakeInternalStageFile](/user-guide/data-integration/openflow/processors/putsnowflakeinternalstagefile) Puts files into a Snowflake internal stage. 
%logo-snowflake-blue% [PutSnowpipeStreaming](/user-guide/data-integration/openflow/processors/putsnowpipestreaming) Streams records into a Snowflake table. %logo-snowflake-blue% [PutSnowpipeStreaming2](/user-guide/data-integration/openflow/processors/putsnowpipestreaming2) Send Records formatted as Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming Version 2. [PutSNS](/user-guide/data-integration/openflow/processors/putsns) Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service [PutSplunk](/user-guide/data-integration/openflow/processors/putsplunk) Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. [PutSplunkHTTP](/user-guide/data-integration/openflow/processors/putsplunkhttp) Sends flow file content to the specified Splunk server over HTTP or HTTPS. [PutSQL](/user-guide/data-integration/openflow/processors/putsql) Executes a SQL UPDATE or INSERT command. [PutSQS](/user-guide/data-integration/openflow/processors/putsqs) Publishes a message to an Amazon Simple Queuing Service Queue [PutSyslog](/user-guide/data-integration/openflow/processors/putsyslog) Sends Syslog messages to a given host and port over TCP or UDP. [PutTCP](/user-guide/data-integration/openflow/processors/puttcp) Sends serialized FlowFiles or Records over TCP to a configurable destination with optional support for TLS [PutUDP](/user-guide/data-integration/openflow/processors/putudp) The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server. %logo-snowflake-blue% [PutUnityCatalogFile](/user-guide/data-integration/openflow/processors/putunitycatalogfile) Write FlowFile content with max size of 5 GiB to Unity Catalog. %logo-snowflake-blue% [PutVectaraDocument](/user-guide/data-integration/openflow/processors/putvectaradocument) Generate and upload a JSON document to Vectara's upload endpoint. 
%logo-snowflake-blue% [PutVectaraFile](/user-guide/data-integration/openflow/processors/putvectarafile) Uploads FlowFile content to Vectara's index endpoint. [PutWebSocket](/user-guide/data-integration/openflow/processors/putwebsocket) Sends messages to a WebSocket remote endpoint using a WebSocket session that is established by either ListenWebSocket or ConnectWebSocket. [PutZendeskTicket](/user-guide/data-integration/openflow/processors/putzendeskticket) Create Zendesk tickets using the Zendesk API.
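The new, existing, and stale categories that the PickTablesForReplication description lists can be read as plain set operations. The following Python sketch illustrates only that idea; it is not the processor's implementation, and the function name `classify_tables` and its two inputs are hypothetical.

```python
def classify_tables(source_tables, replicated_tables):
    """Classify fully qualified table names as new, existing, or stale.

    Illustrative only: both arguments are hypothetical inputs, not
    properties of the PickTablesForReplication processor.
    """
    source = set(source_tables)
    replicated = set(replicated_tables)
    return {
        # In the source but not yet replicated.
        "new": sorted(source - replicated),
        # Replicated and still present in the source.
        "existing": sorted(source & replicated),
        # Replicated but no longer present in the source.
        "stale": sorted(replicated - source),
    }


result = classify_tables(
    ["db.sales.orders", "db.sales.customers"],
    ["db.sales.customers", "db.sales.legacy"],
)
print(result)
```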
## Q
Processor Description [QueryAzureDataExplorer](/user-guide/data-integration/openflow/processors/queryazuredataexplorer) Query Azure Data Explorer and stream JSON results to output FlowFiles [QueryDatabaseTable](/user-guide/data-integration/openflow/processors/querydatabasetable) Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. [QueryDatabaseTableRecord](/user-guide/data-integration/openflow/processors/querydatabasetablerecord) Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. %logo-snowflake-blue% [QueryMilvus](/user-guide/data-integration/openflow/processors/querymilvus) Queries a given collection in a Milvus database using vectors. %logo-snowflake-blue% [QueryPinecone](/user-guide/data-integration/openflow/processors/querypinecone) Queries Pinecone for vectors that are similar to the input vector, or retrieves a vector by ID. [QueryRecord](/user-guide/data-integration/openflow/processors/queryrecord) Evaluates one or more SQL queries against the contents of a FlowFile. [QuerySalesforceObject](/user-guide/data-integration/openflow/processors/querysalesforceobject) Retrieves records from a Salesforce sObject. [QuerySplunkIndexingStatus](/user-guide/data-integration/openflow/processors/querysplunkindexingstatus) Queries Splunk server in order to acquire the status of indexing acknowledgement.
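The "Maximum Value column" behavior described for QueryDatabaseTable and QueryDatabaseTableRecord is a form of incremental fetching: remember the largest value seen so far and, on the next run, select only rows above it. The sketch below illustrates that pattern with Python's built-in `sqlite3` module. It is an assumption-laden illustration, not the processor's code; the function `fetch_new_rows` and the `orders` table are hypothetical.

```python
import sqlite3


def fetch_new_rows(conn, table, max_value_column, last_seen_max):
    """Return rows whose max_value_column exceeds last_seen_max.

    Hypothetical helper for illustration. Table and column names are
    interpolated directly, so they must come from trusted configuration.
    """
    if last_seen_max is None:
        # First run: no state yet, so fetch everything.
        sql = f"SELECT * FROM {table} ORDER BY {max_value_column}"
        params = ()
    else:
        sql = (
            f"SELECT * FROM {table} "
            f"WHERE {max_value_column} > ? ORDER BY {max_value_column}"
        )
        params = (last_seen_max,)
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    if rows:
        # Rows are sorted ascending, so the last row holds the new maximum.
        last_seen_max = rows[-1][columns.index(max_value_column)]
    return rows, last_seen_max


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "a"), (2, "b")])

first_rows, state = fetch_new_rows(conn, "orders", "id", None)
conn.execute("INSERT INTO orders VALUES (3, 'c')")
new_rows, state = fetch_new_rows(conn, "orders", "id", state)
print(first_rows, new_rows, state)
```

The second call returns only the row inserted after the first call, because the stored maximum filters out everything already seen.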
## R
Processor Description [RemoveRecordField](/user-guide/data-integration/openflow/processors/removerecordfield) Modifies the contents of a FlowFile that contains Record-oriented data (i. [RenameRecordField](/user-guide/data-integration/openflow/processors/renamerecordfield) Renames one or more fields in each Record of a FlowFile. [ReplaceText](/user-guide/data-integration/openflow/processors/replacetext) Updates the content of a FlowFile by searching for some textual value in the FlowFile content (via Regular Expression/regex, or literal value) and replacing the section of the content that matches with some alternate value. [ReplaceTextWithMapping](/user-guide/data-integration/openflow/processors/replacetextwithmapping) Updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file. [RetryFlowFile](/user-guide/data-integration/openflow/processors/retryflowfile) FlowFiles passed to this Processor have a 'Retry Attribute' value checked against a configured 'Maximum Retries' value. [RouteOnAttribute](/user-guide/data-integration/openflow/processors/routeonattribute) Routes FlowFiles based on their Attributes using the Attribute Expression Language [RouteOnContent](/user-guide/data-integration/openflow/processors/routeoncontent) Applies Regular Expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose Regular Expression matches. [RouteText](/user-guide/data-integration/openflow/processors/routetext) Routes textual data based on a set of user-defined rules. %logo-snowflake-blue% [RunDatabricksJob](/user-guide/data-integration/openflow/processors/rundatabricksjob) Triggers a pre-defined Databricks job to run with custom parameters. 
[RunMongoAggregation](/user-guide/data-integration/openflow/processors/runmongoaggregation) A processor that runs an aggregation query whenever a flowfile is received.
## S
Processor Description [SampleRecord](/user-guide/data-integration/openflow/processors/samplerecord) Samples the records of a FlowFile based on a specified sampling strategy (such as Reservoir Sampling). [ScanAttribute](/user-guide/data-integration/openflow/processors/scanattribute) Scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms [ScanContent](/user-guide/data-integration/openflow/processors/scancontent) Scans the content of FlowFiles for terms that are found in a user-supplied dictionary. [ScriptedFilterRecord](/user-guide/data-integration/openflow/processors/scriptedfilterrecord) This processor provides the ability to filter records out from FlowFiles using the user-provided script. [ScriptedPartitionRecord](/user-guide/data-integration/openflow/processors/scriptedpartitionrecord) Receives Record-oriented data (i. [ScriptedTransformRecord](/user-guide/data-integration/openflow/processors/scriptedtransformrecord) Provides the ability to evaluate a simple script against each record in an incoming FlowFile. [ScriptedValidateRecord](/user-guide/data-integration/openflow/processors/scriptedvalidaterecord) This processor provides the ability to validate records in FlowFiles using the user-provided script. [SearchElasticsearch](/user-guide/data-integration/openflow/processors/searchelasticsearch) A processor that allows the user to repeatedly run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. [SegmentContent](/user-guide/data-integration/openflow/processors/segmentcontent) Segments a FlowFile into multiple smaller segments on byte boundaries. 
[SignContentPGP](/user-guide/data-integration/openflow/processors/signcontentpgp) Sign content using OpenPGP Private Keys. %logo-snowflake-blue% [SnowflakeDetectDuplicate](/user-guide/data-integration/openflow/processors/snowflakedetectduplicate) Checks if a FlowFile's hash (provided as a FlowFile attribute) is already in a Snowflake table, and routes the FlowFile to 'duplicate' if found, 'distinct' if not found, or 'failure' on errors. [SplitAvro](/user-guide/data-integration/openflow/processors/splitavro) Splits a binary encoded Avro datafile into smaller files based on the configured Output Size. [SplitContent](/user-guide/data-integration/openflow/processors/splitcontent) Splits incoming FlowFiles by a specified byte sequence. [SplitExcel](/user-guide/data-integration/openflow/processors/splitexcel) This processor splits a multi-sheet Microsoft Excel spreadsheet into multiple Microsoft Excel spreadsheets where each sheet from the original file is converted to an individual spreadsheet in its own flow file. [SplitJson](/user-guide/data-integration/openflow/processors/splitjson) Splits a JSON File into multiple, separate FlowFiles for an array element specified by a JsonPath expression. [SplitRecord](/user-guide/data-integration/openflow/processors/splitrecord) Splits up an input FlowFile that is in a record-oriented data format into multiple smaller FlowFiles. [SplitText](/user-guide/data-integration/openflow/processors/splittext) Splits a text file into multiple smaller text files on line boundaries limited by maximum number of lines or total size of fragment. [SplitXml](/user-guide/data-integration/openflow/processors/splitxml) Splits an XML File into multiple separate FlowFiles, each comprising a child or descendant of the original root element. [StartAwsPollyJob](/user-guide/data-integration/openflow/processors/startawspollyjob) Trigger an AWS Polly job.
[StartAwsTextractJob](/user-guide/data-integration/openflow/processors/startawstextractjob) Trigger a AWS Textract job. [StartAwsTranscribeJob](/user-guide/data-integration/openflow/processors/startawstranscribejob) Trigger a AWS Transcribe job. [StartAwsTranslateJob](/user-guide/data-integration/openflow/processors/startawstranslatejob) Trigger a AWS Translate job. [StartGcpVisionAnnotateFilesOperation](/user-guide/data-integration/openflow/processors/startgcpvisionannotatefilesoperation) Trigger a Vision operation on file input. [StartGcpVisionAnnotateImagesOperation](/user-guide/data-integration/openflow/processors/startgcpvisionannotateimagesoperation) Trigger a Vision operation on image input. %logo-snowflake-blue% [SubmitQueryJob](/user-guide/data-integration/openflow/processors/submitqueryjob) Submits a Query Job to Salesforce using the Bulk API 2. %logo-snowflake-blue% [SummarizeText](/user-guide/data-integration/openflow/processors/summarizetext) This processor uses a Large Language Model (LLM) to summarize the content of a FlowFile.
## T
| Processor | Description |
| --- | --- |
| [TagS3Object](/user-guide/data-integration/openflow/processors/tags3object) | Adds or updates a tag on an Amazon S3 Object. |
| [TailFile](/user-guide/data-integration/openflow/processors/tailfile) | "Tails" a file, or a list of files, ingesting data from the file as it is written to the file. |
| [TransformXml](/user-guide/data-integration/openflow/processors/transformxml) | Applies the provided XSLT file to the FlowFile XML payload. |
## U
| Processor | Description |
| --- | --- |
| [UnpackContent](/user-guide/data-integration/openflow/processors/unpackcontent) | Unpacks the content of FlowFiles that have been packaged with one of several different Packaging Formats, emitting one to many FlowFiles for each input FlowFile. |
| [UpdateAttribute](/user-guide/data-integration/openflow/processors/updateattribute) | Updates the Attributes for a FlowFile by using the Attribute Expression Language and/or deletes the attributes based on a regular expression. |
| [UpdateBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/updateboxfilemetadatainstance) | Updates metadata template values for a Box file using the record in the given FlowFile. |
| [UpdateBulkJobState](/user-guide/data-integration/openflow/processors/updatebulkjobstate) | Updates the status of a Salesforce Bulk Job in the shared state service for a specific object type. |
| [UpdateByQueryElasticsearch](/user-guide/data-integration/openflow/processors/updatebyqueryelasticsearch) | Updates documents in an Elasticsearch index using a query. |
| [UpdateCounter](/user-guide/data-integration/openflow/processors/updatecounter) | This processor allows users to set specific counters and key points in their flow. |
| [UpdateDatabaseTable](/user-guide/data-integration/openflow/processors/updatedatabasetable) | This processor uses a JDBC connection and incoming records to generate any database table changes needed to support the incoming records. |
| [UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord) | Updates the contents of a FlowFile that contains Record-oriented data (i. |
| [UpdateSnowflakeDatabase](/user-guide/data-integration/openflow/processors/updatesnowflakedatabase) | Updates the definition of a Snowflake table based on the schema provided in the incoming FlowFile. |
| [UpdateSnowflakeIcebergDatabase](/user-guide/data-integration/openflow/processors/updatesnowflakeicebergdatabase) | Updates the definition of a Snowflake Iceberg table. |
| [UpdateSnowflakeSchema](/user-guide/data-integration/openflow/processors/updatesnowflakeschema) | Creates a Snowflake database schema if it does not exist. |
| [UpdateSnowflakeStream](/user-guide/data-integration/openflow/processors/updatesnowflakestream) | Manages Snowflake streams by creating, dropping, or replacing them based on the configured operation. |
| [UpdateSnowflakeTable](/user-guide/data-integration/openflow/processors/updatesnowflaketable) | Updates the definition of a Snowflake table based on the schema provided in the incoming FlowFile. |
| [UpdateSnowflakeView](/user-guide/data-integration/openflow/processors/updatesnowflakeview) | Creates or replaces Snowflake views based on column mappings provided in the incoming FlowFile. |
| [UpdateTableState](/user-guide/data-integration/openflow/processors/updatetablestate) | Updates the state of a table in the Table State Service. |
| [UpsertMilvus](/user-guide/data-integration/openflow/processors/upsertmilvus) | Upserts vectors into a Milvus database for a given collection. |
| [UpsertPinecone](/user-guide/data-integration/openflow/processors/upsertpinecone) | Publishes vectors, including metadata, and optionally text, to a Pinecone index. |
| [UpsertSFDCObjects](/user-guide/data-integration/openflow/processors/upsertsfdcobjects) | Upserts the records from the incoming FlowFile into Salesforce. |
## V
| Processor | Description |
| --- | --- |
| [ValidateCsv](/user-guide/data-integration/openflow/processors/validatecsv) | Validates the contents of FlowFiles or a FlowFile attribute value against a user-specified CSV schema. |
| [ValidateJson](/user-guide/data-integration/openflow/processors/validatejson) | Validates the contents of FlowFiles against a configurable JSON Schema. |
| [ValidateRecord](/user-guide/data-integration/openflow/processors/validaterecord) | Validates the Records of an incoming FlowFile against a given schema. |
| [ValidateXml](/user-guide/data-integration/openflow/processors/validatexml) | Validates XML contained in a FlowFile. |
| [VerifyContentMAC](/user-guide/data-integration/openflow/processors/verifycontentmac) | Calculates a Message Authentication Code using the provided Secret Key and compares it with the provided MAC property. |
| [VerifyContentPGP](/user-guide/data-integration/openflow/processors/verifycontentpgp) | Verifies signatures using OpenPGP Public Keys. |
## W
| Processor | Description |
| --- | --- |
| [Wait](/user-guide/data-integration/openflow/processors/wait) | Routes incoming FlowFiles to the 'wait' relationship until a matching release signal is stored in the distributed cache from a corresponding Notify processor. |
| [WaitForTableState](/user-guide/data-integration/openflow/processors/waitfortablestate) | Blocks incoming FlowFiles until the corresponding table state is not equal to accepted state. |
---
title: AmazonGlueEncodedSchemaReferenceReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/amazonglueencodedschemareferencereader.md
section: Loading & Unloading Data
---

# AmazonGlueEncodedSchemaReferenceReader

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Reads the Schema Identifier according to the AWS Glue Schema encoding, in which the header consists of two marker bytes followed by a 16-byte UUID.

## Tags

avro, aws, glue, registry, schema

## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AmazonGlueSchemaRegistry
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/amazonglueschemaregistry.md
section: Loading & Unloading Data
---

# AmazonGlueSchemaRegistry

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides a Schema Registry that interacts with the AWS Glue Schema Registry so that Schemas stored in the Glue Schema Registry can be used in NiFi. When a Schema is looked up by name through this registry, it finds the Schema with that name in the Glue Schema Registry.

## Tags

avro, aws, glue, registry, schema

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| AWS Credentials Provider Service | AWS Credentials Provider Service | | | The Controller Service that is used to obtain AWS credentials provider |
| Cache Expiration * | Cache Expiration | 1 hour | | Specifies how long a Schema that is cached should remain in the cache. Once this time period elapses, a cached version of a schema will no longer be used, and the service will have to communicate with the Schema Registry again in order to obtain the schema. |
| Cache Size * | Cache Size | 1000 | | Specifies how many Schemas should be cached from the Schema Registry |
| Communications Timeout * | Communications Timeout | 30 secs | | Specifies how long to wait to receive data from the Schema Registry before considering the communications a failure |
| Region * | Region | us-west-2 | AWS GovCloud (US-East), AWS GovCloud (US-West), Africa (Cape Town), Asia Pacific (Hong Kong), Asia Pacific (Hyderabad), Asia Pacific (Jakarta), Asia Pacific (Malaysia), Asia Pacific (Melbourne), Asia Pacific (Mumbai), Asia Pacific (New Zealand), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Taipei), Asia Pacific (Thailand), Asia Pacific (Tokyo), Canada (Central), Canada West (Calgary), China (Beijing), China (Ningxia), EU (Germany), EU ISOE West, Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Milan), Europe (Paris), Europe (Spain), Europe (Stockholm), Europe (Zurich), Israel (Tel Aviv), Mexico (Central), Middle East (Bahrain), Middle East (UAE), South America (Sao Paulo), US East (N. Virginia), US East (Ohio), US ISO East, US ISO WEST, US ISOB East (Ohio), US ISOF EAST, US ISOF SOUTH, US West (N. California), US West (Oregon), aws global region, aws-cn global region, aws-iso global region, aws-iso-b global region, aws-iso-e global region, aws-iso-f global region, aws-us-gov global region | The region of the cloud resources |
| SSL Context Service | SSL Context Service | | | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Schema Registry Name * | Schema Registry Name | | | The name of the Schema Registry |
| Proxy Configuration Service | proxy-configuration-service | | | Specifies the Proxy Configuration Controller Service to proxy network requests. |
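The interaction between Cache Expiration and Cache Size can be illustrated with a small sketch. This is not the controller service's implementation: the `SchemaCache` class, its `fetch` callback, and the "evict the oldest entry when full" policy are assumptions made for illustration only. The defaults mirror the table (1000 entries, 1 hour).

```python
import time
from typing import Callable, Dict, Tuple


class SchemaCache:
    """Hypothetical cache of schema text by name, bounded by size and entry age."""

    def __init__(
        self,
        fetch: Callable[[str], str],          # assumed callback that asks the registry
        max_size: int = 1000,                 # mirrors the Cache Size default
        ttl_seconds: float = 3600.0,          # mirrors the Cache Expiration default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        # A cached schema is only used while it is younger than the expiration.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Missing or expired: contact the registry again.
        schema = self._fetch(name)
        self._entries.pop(name, None)
        # Assumed policy: evict the oldest inserted entry when the cache is full.
        if self._entries and len(self._entries) >= self._max_size:
            del self._entries[next(iter(self._entries))]
        if self._max_size > 0:
            self._entries[name] = (now, schema)
        return schema
```

With this sketch, two lookups of the same name within the expiration window reach the registry once; a lookup after the window elapses reaches it again.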
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AmazonMSKConnectionService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/amazonmskconnectionservice.md
section: Loading & Unloading Data
---

# AmazonMSKConnectionService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides and manages connections to AWS MSK Kafka Brokers for producer or consumer operations. ## Tags aws, kafka, managed, msk, openflow, streaming ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| SSL Context Service | SSL Context Service | | | Service supporting SSL communication with Kafka brokers |
| Acknowledgment Wait Time * | ack.wait.time | 5 sec | | After sending a message to Kafka, this indicates the amount of time that the service will wait for a response from Kafka. If Kafka does not acknowledge the message within this time period, the service will throw an exception. |
| AWS Profile Name | aws.profile.name | | | The Amazon Web Services Profile to select when multiple profiles are available. |
| Bootstrap Servers * | bootstrap.servers | | | Comma-separated list of Kafka Bootstrap Servers in the format host:port. Corresponds to Kafka bootstrap.servers property |
| Client Timeout * | default.api.timeout.ms | 60 sec | | Default timeout for Kafka client operations. Mapped to Kafka default.api.timeout.ms. The Kafka request.timeout.ms property is derived from half of the configured timeout |
| Transaction Isolation Level * | isolation.level | read_committed | Read Committed, Read Uncommitted | Specifies how the service should handle transaction isolation levels when communicating with Kafka. The uncommitted option means that messages will be received as soon as they are written to Kafka but will be pulled, even if the producer cancels the transactions. The committed option configures the service to not receive any messages for which the producer's transaction was canceled, but this can result in some latency since the consumer must wait for the producer to finish its entire transaction instead of pulling as the messages become available. Corresponds to Kafka isolation.level property. |
| Max Metadata Wait Time * | max.block.ms | 5 sec | | The amount of time publisher will wait to obtain metadata or wait for the buffer to flush during the 'send' call before failing the entire 'send' call. Corresponds to Kafka max.block.ms property |
| Max Poll Records * | max.poll.records | 10000 | | Maximum number of records Kafka should return in a single poll. |
| SASL Mechanism * | sasl.mechanism | AWS_MSK_IAM | AWS_MSK_IAM, SCRAM-SHA-512 | SASL mechanism used for authentication. Corresponds to Kafka Client sasl.mechanism property |
| SASL Password * | sasl.password | | | Password provided with configured username when using PLAIN or SCRAM SASL Mechanisms |
| SASL Username * | sasl.username | | | Username provided with configured password when using PLAIN or SCRAM SASL Mechanisms |
| Security Protocol * | security.protocol | PLAINTEXT | PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL | Security protocol used to communicate with brokers. Corresponds to Kafka Client security.protocol property |
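As a rough illustration of how these settings relate to Kafka client properties, the sketch below converts the service's duration values to milliseconds and derives `request.timeout.ms` as half of the Client Timeout, as the table states. The helper names `parse_duration_ms` and `build_kafka_properties`, the accepted duration units, and the broker hostname are hypothetical; this is not the service's own code.

```python
def parse_duration_ms(text: str) -> int:
    """Parses simple durations such as '5 sec' or '60 sec' into milliseconds.

    Only a few units are handled here; this is an illustrative assumption.
    """
    value, unit = text.split()
    factors = {"ms": 1, "sec": 1000, "secs": 1000, "min": 60_000, "mins": 60_000}
    return int(float(value) * factors[unit])


def build_kafka_properties(settings: dict) -> dict:
    """Maps service settings (keyed by API name) to Kafka client properties.

    Defaults are taken from the table above.
    """
    timeout_ms = parse_duration_ms(settings.get("default.api.timeout.ms", "60 sec"))
    return {
        "bootstrap.servers": settings["bootstrap.servers"],
        "security.protocol": settings.get("security.protocol", "PLAINTEXT"),
        "sasl.mechanism": settings.get("sasl.mechanism", "AWS_MSK_IAM"),
        "isolation.level": settings.get("isolation.level", "read_committed"),
        "max.poll.records": int(settings.get("max.poll.records", 10000)),
        "max.block.ms": parse_duration_ms(settings.get("max.block.ms", "5 sec")),
        "default.api.timeout.ms": timeout_ms,
        # Derived from half of the configured Client Timeout.
        "request.timeout.ms": timeout_ms // 2,
    }


# Hypothetical broker address, for illustration only.
props = build_kafka_properties({"bootstrap.servers": "broker-1.example.com:9098"})
```

With the default Client Timeout of 60 sec, the derived request timeout in this sketch is 30 seconds.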
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: Apache Kafka for JSON/AVRO data format source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kafka/kafka-json-avro.md section: Loading & Unloading Data --- # Apache Kafka for JSON/AVRO data format This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [About Openflow Connector for Kafka](about) - [Set up the Openflow Connector for Kafka](setup) - [Configure other authentication methods for Openflow Connector for Kafka](authentication) - [Apache Kafka with DLQ and metadata](kafka-dlq-metadata) This topic describes the Apache Kafka connectors for JSON and AVRO data formats. These are simplified connectors optimized for basic message ingestion with schema evolution and topic-to-table mapping capabilities. ## Connector variants ### JSON data format connector The Apache Kafka for JSON data format connector is designed for straightforward JSON message ingestion from Kafka topics to Snowflake tables. 
Key features: - JSON message format support - Schema evolution - Topic-to-table mapping - SASL authentication ### AVRO data format connector The Apache Kafka for AVRO data format connector is designed for AVRO message ingestion from Kafka topics to Snowflake tables with schema registry support. Key features: - AVRO message format support - Schema registry integration - Schema evolution - Topic-to-table mapping - SASL authentication ## Specific parameters In addition to the common parameters described in [Set up the Openflow Connector for Kafka](setup), these connectors have specific parameter contexts. ### Schema registry parameters (AVRO connector only) The AVRO connector includes additional parameters for schema registry integration:
| Parameter | Description | Required |
| --- | --- | --- |
| Schema Registry Authentication Type | The method of authenticating to schema registry if used. Otherwise, use *NONE*. One of: *NONE* / *BASIC*. Default: *NONE* | Yes |
| Schema Registry URL | The URL of Schema Registry. Required for *AVRO* message format. | No |
| Schema Registry Username | The username for Schema Registry. Required for *AVRO* message format. | No |
| Schema Registry Password | The password for Schema Registry. Required for *AVRO* message format. | No |
| AVRO Schema Access Strategy | The method of accessing the AVRO schema of a message. Required for *AVRO*. One of: *embedded-avro-schema* / *schema-reference-reader* / *schema-text-property*. Default: *embedded-avro-schema* | No |
| AVRO Schema | Avro schema in case *schema-text-property* is used in AVRO Schema Access Strategy with the AVRO message format. Note: this should only be used in case all messages consumed from the configured Kafka Topic(s) share the same schema. | No |
## Limitations These simplified connectors have the following limitations compared to the full-featured DLQ and metadata connector: - **No RECORD_METADATA column** - Kafka metadata is not stored in the target tables - **No dead letter queue (DLQ)** - Failed messages are not routed to a DLQ topic - **No Iceberg table support** - Only regular Snowflake tables are supported - **Fixed schematization** - Schema detection is always enabled and cannot be disabled Schema detection is enabled by default in these connectors and cannot be disabled. This means message fields are automatically flattened into individual table columns with automatic schema evolution. ## Use cases These connectors are ideal for:
- **Simple data ingestion** - When you only need the message content without Kafka metadata.
- **High-throughput scenarios** - Where the simplified data structure improves performance.
- **Schema evolution use cases** - Where automatic table schema updates are required.
- **JSON or AVRO message formats** - With consistent schemas.
If you need Kafka metadata, DLQ support, or Iceberg table ingestion, use the [Apache Kafka with DLQ and metadata](kafka-dlq-metadata) connector instead. ## Schema detection and evolution These connectors support automatic schema detection and evolution. The structure of tables in Snowflake is defined and evolved automatically to support the structure of new data loaded by the connector. With schema detection enabled (which is always the case for these connectors), Snowflake can detect the schema of the streaming data and load data into tables that automatically match any user-defined schema. Snowflake also allows adding new columns or dropping the `NOT NULL` constraint from columns missing in new data files. Schema detection with the connector is supported with or without a provided schema registry. If using schema registry (Avro), the column will be created with the data types defined in the provided schema registry. If there is no schema registry (JSON), the data type will be inferred based on the data provided. JSON ARRAY is not supported for further schematization. ### Schema evolution behavior If the connector creates the target table, schema evolution is enabled by default. If you want to enable or disable schema evolution on an existing table, use the [ALTER TABLE](/sql-reference/sql/alter-table) command to set the `ENABLE_SCHEMA_EVOLUTION` parameter. You must also use a role that has the `OWNERSHIP` privilege on the table. For more information, see [Enable automatic table schema evolution](/user-guide/data-load-schema-evolution). --- title: Apache Kafka with DLQ and metadata source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kafka/kafka-dlq-metadata.md section: Loading & Unloading Data --- # Apache Kafka with DLQ and metadata This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. 
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [About Openflow Connector for Kafka](about) - [Set up the Openflow Connector for Kafka](setup) - [Configure other authentication methods for Openflow Connector for Kafka](authentication) - [Apache Kafka for JSON/AVRO data format](kafka-json-avro) This topic describes the Apache Kafka with DLQ and metadata connector. This is the full-featured connector that provides feature parity with the legacy Snowflake connector for Kafka and includes advanced capabilities for production use cases. ## Key features The Apache Kafka with DLQ and metadata connector provides comprehensive functionality: - **Dead Letter Queue (DLQ)** support for failed message handling - **RECORD_METADATA** column with Kafka message metadata - **Configurable schematization** - enable or disable schema detection - **Iceberg table support** with schema evolution - **Multiple message formats** - JSON and AVRO support - **Schema registry integration** for AVRO messages - **Topic-to-table mapping** with advanced patterns - **SASL authentication** support ## Specific parameters In addition to the common parameters described in [Set up the Openflow Connector for Kafka](setup), this connector includes additional parameter contexts for advanced features. ### Message format and schema parameters
| Parameter | Description | Required |
| --- | --- | --- |
| Message Format | The format of messages in Kafka. One of: *JSON* / *AVRO*. Default: *JSON* | Yes |
| AVRO Schema | Avro schema in case *schema-text-property* is used in AVRO Schema Access Strategy with the AVRO message format. Note: this should only be used in case all messages consumed from the configured Kafka Topic(s) share the same schema. | No |
| AVRO Schema Access Strategy | The method of accessing the AVRO schema of a message. Required for *AVRO*. One of: *embedded-avro-schema* / *schema-reference-reader* / *schema-text-property*. Default: *embedded-avro-schema* | No |
### Schema registry parameters
| Parameter | Description | Required |
| --- | --- | --- |
| Schema Registry Authentication Type | The method of authenticating to schema registry if used. Otherwise, use *NONE*. One of: *NONE* / *BASIC*. Default: *NONE* | Yes |
| Schema Registry URL | The URL of Schema Registry. Required for *AVRO* message format. | No |
| Schema Registry Username | The username for Schema Registry. Required for *AVRO* message format. | No |
| Schema Registry Password | The password for Schema Registry. Required for *AVRO* message format. | No |
### DLQ and advanced features parameters
| Parameter | Description | Required |
| --- | --- | --- |
| Kafka DLQ Topic | DLQ topic to send messages with parsing errors to | Yes |
| Schematization Enabled | Determines whether data is inserted into individual columns or a single RECORD_CONTENT field. One of: *true* / *false*. Default: *true* | Yes |
| Iceberg Enabled | Specifies whether the processor ingests data into an Iceberg table. The processor fails if this property doesn't match the actual table type. Default: *false* | Yes |
## Schematization behavior The connector's behavior changes based on the **Schematization Enabled** parameter: ### Schematization enabled When schematization is enabled, the connector: - Creates individual columns for each field in the message - Includes a **RECORD_METADATA** column with Kafka metadata - Automatically evolves the table schema when new fields are detected - Flattens nested JSON/AVRO structures into separate columns **Example table structure:**
| Row | RECORD_METADATA | ACCOUNT | SYMBOL | SIDE | QUANTITY |
| --- | --- | --- | --- | --- | --- |
| 1 | `{"timestamp":1669074170090, "headers": {"current.iter...` | ABC123 | ZTEST | BUY | 3572 |
| 2 | `{"timestamp":1669074170400, "headers": {"current.iter...` | XYZ789 | ZABX | SELL | 3024 |
### Schematization disabled When schematization is disabled, the connector: - Creates only two columns: **RECORD_CONTENT** and **RECORD_METADATA** - Stores the entire message content as an OBJECT in **RECORD_CONTENT** - Does not perform automatic schema evolution - Provides maximum flexibility for downstream processing **Example table structure:**
| Row | RECORD_METADATA | RECORD_CONTENT |
| --- | --- | --- |
| 1 | `{"timestamp":1669074170090, "headers": {"current.iter...` | `{"account": "ABC123", "symbol": "ZTEST", "side":...` |
| 2 | `{"timestamp":1669074170400, "headers": {"current.iter...` | `{"account": "XYZ789", "symbol": "ZABX", "side":...` |
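The difference between the two table layouts can be sketched in a few lines. This is only an illustration of the row shapes shown in the example tables, not the connector's code: the `to_row` function is hypothetical, and upper-casing field names into column names is an assumption based on the example tables (`account` becomes `ACCOUNT`).

```python
def to_row(message: dict, metadata: dict, schematization_enabled: bool) -> dict:
    """Returns the row shape for one Kafka message (illustrative sketch)."""
    if schematization_enabled:
        # One column per top-level message field, plus RECORD_METADATA.
        row = {"RECORD_METADATA": metadata}
        for field, value in message.items():
            row[field.upper()] = value
        return row
    # Only two columns: the whole message is stored in RECORD_CONTENT.
    return {"RECORD_METADATA": metadata, "RECORD_CONTENT": message}


message = {"account": "ABC123", "symbol": "ZTEST", "side": "BUY", "quantity": 3572}
metadata = {"timestamp": 1669074170090}

wide_row = to_row(message, metadata, schematization_enabled=True)
narrow_row = to_row(message, metadata, schematization_enabled=False)
```

Here `wide_row` has the columns `RECORD_METADATA`, `ACCOUNT`, `SYMBOL`, `SIDE`, and `QUANTITY`, while `narrow_row` has only `RECORD_METADATA` and `RECORD_CONTENT`.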
Use the `Schematization Enabled` property in the connector configuration properties to enable or disable schema detection. ## Schema detection and evolution The connector supports schema detection and evolution. The structure of tables in Snowflake can be defined and evolved automatically to support the structure of new data loaded by the connector. Without schema detection and evolution, the Snowflake table loaded by the connector only consists of two `OBJECT` columns: `RECORD_CONTENT` and `RECORD_METADATA`. With schema detection and evolution enabled, Snowflake can detect the schema of the streaming data and load data into tables that automatically match any user-defined schema. Snowflake also allows adding new columns or dropping the `NOT NULL` constraint from columns missing in new data files. Schema detection with the connector is supported with or without a provided schema registry. If using schema registry (Avro), the column will be created with the data types defined in the provided schema registry. If there is no schema registry (JSON), the data type will be inferred based on the data provided. JSON ARRAY is not supported for further schematization. ### Enabling schema evolution If the connector creates the target table, schema evolution is enabled by default. If you want to enable or disable schema evolution on the existing table, use the [ALTER TABLE](/sql-reference/sql/alter-table) command to set the `ENABLE_SCHEMA_EVOLUTION` parameter. You must also use a role that has the `OWNERSHIP` privilege on the table. For more information, see [Enable automatic table schema evolution](/user-guide/data-load-schema-evolution). However, if schema evolution is disabled for an existing table, then the connector will try to send the rows with mismatched schemas to the configured dead-letter queues (DLQ). ### RECORD_METADATA structure The **RECORD_METADATA** column contains important Kafka message metadata:
| Field | Description |
| --- | --- |
| offset | The message offset within the Kafka partition |
| topic | The Kafka topic name |
| partition | The Kafka partition number |
| key | The message key (if present) |
| timestamp | The message timestamp |
| SnowflakeConnectorPushTime | Timestamp when the connector fetched the message from Kafka |
| headers | Map of message headers (if present) |
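To make the structure concrete, the following sketch assembles a value with the fields listed above. The `build_record_metadata` function is hypothetical, and representing `SnowflakeConnectorPushTime` as epoch milliseconds is an assumption; the source only describes it as a timestamp. Optional fields (`key`, `headers`) are omitted when absent, matching the "if present" notes.

```python
import time
from typing import Mapping, Optional


def build_record_metadata(
    topic: str,
    partition: int,
    offset: int,
    timestamp: int,
    key: Optional[str] = None,
    headers: Optional[Mapping[str, str]] = None,
    push_time: Optional[int] = None,
) -> dict:
    """Builds a RECORD_METADATA-shaped dict for one Kafka message (sketch)."""
    metadata = {
        "offset": offset,
        "topic": topic,
        "partition": partition,
        "timestamp": timestamp,
        # Assumption: epoch milliseconds at the time the message was fetched.
        "SnowflakeConnectorPushTime": (
            push_time if push_time is not None else int(time.time() * 1000)
        ),
    }
    if key is not None:
        metadata["key"] = key
    if headers:
        metadata["headers"] = dict(headers)
    return metadata


# Hypothetical topic name and values, for illustration only.
example = build_record_metadata(
    topic="trades", partition=0, offset=42,
    timestamp=1669074170090, push_time=1669074170500,
)
```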
## Dead Letter Queue (DLQ)

The DLQ functionality handles messages that cannot be processed successfully:

### DLQ behavior

- **Parse failures** - Messages with invalid JSON/AVRO format are sent to the DLQ
- **Schema mismatches** - Messages that don't match the expected schema when schema evolution is disabled
- **Processing errors** - Other processing failures during ingestion

## Iceberg table support

Openflow Connector for Kafka can ingest data into a Snowflake-managed [Apache Iceberg™ table](/user-guide/tables-iceberg) when **Iceberg Enabled** is set to *true*.

### Requirements and limitations

Before you configure the Openflow Kafka connector for Iceberg table ingestion, note the following requirements and limitations:

- You must create an Iceberg table before running the connector.
- Make sure that the user has the privileges required to insert data into the created tables.

### Configuration and setup

To configure the Openflow Connector for Kafka for Iceberg table ingestion, follow the steps in [Set up the Openflow Connector for Kafka](setup) with a few differences noted in the following sections.

#### Enable ingestion into Iceberg table

To enable ingestion into an Iceberg table, you must set the `Iceberg Enabled` parameter to `true`.

#### Create an Iceberg table for ingestion

Before you run the connector, you must create an Iceberg table. The initial table schema depends on your connector `Schematization Enabled` property settings.

If you enable schematization, you must create a table with a column named `record_metadata`:

```sql
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
    record_metadata OBJECT()
)
EXTERNAL_VOLUME = 'my_volume'
CATALOG = 'SNOWFLAKE'
BASE_LOCATION = 'my_location/my_iceberg_table';
```

The connector automatically creates the columns for message fields and alters the `record_metadata` column schema.

If you don't enable schematization, you must create a table with a column named `record_content` of a type that matches the actual Kafka message content. The connector automatically creates the `record_metadata` column.

When you create an Iceberg table, you can use Iceberg data types or [compatible Snowflake types](/user-guide/tables-iceberg-data-types). The semi-structured VARIANT type isn't supported. Instead, use a [structured OBJECT or MAP](/sql-reference/data-types-structured).

For example, consider the following message:

```json
{
    "id": 1,
    "name": "Steve",
    "body_temperature": 36.6,
    "approved_coffee_types": ["Espresso", "Doppio", "Ristretto", "Lungo"],
    "animals_possessed": {
        "dogs": true,
        "cats": false
    },
    "date_added": "2024-10-15"
}
```

### Iceberg table creation examples

**With schematization enabled:**

```sql
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
    RECORD_METADATA OBJECT(
        offset INTEGER,
        topic STRING,
        partition INTEGER,
        key STRING,
        timestamp TIMESTAMP,
        SnowflakeConnectorPushTime BIGINT,
        headers MAP(VARCHAR, VARCHAR)
    ),
    id INT,
    body_temperature FLOAT,
    name STRING,
    approved_coffee_types ARRAY(STRING),
    animals_possessed OBJECT(dogs BOOLEAN, cats BOOLEAN),
    date_added DATE
)
EXTERNAL_VOLUME = 'my_volume'
CATALOG = 'SNOWFLAKE'
BASE_LOCATION = 'my_location/my_iceberg_table';
```

**With schematization disabled:**

```sql
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
    RECORD_METADATA OBJECT(
        offset INTEGER,
        topic STRING,
        partition INTEGER,
        key STRING,
        timestamp TIMESTAMP,
        SnowflakeConnectorPushTime BIGINT,
        headers MAP(VARCHAR, VARCHAR)
    ),
    RECORD_CONTENT OBJECT(
        id INT,
        body_temperature FLOAT,
        name STRING,
        approved_coffee_types ARRAY(STRING),
        animals_possessed OBJECT(dogs BOOLEAN, cats BOOLEAN),
        date_added DATE
    )
)
EXTERNAL_VOLUME = 'my_volume'
CATALOG = 'SNOWFLAKE'
BASE_LOCATION = 'my_location/my_iceberg_table';
```

RECORD_METADATA must always be created. Field names inside nested structures such as `dogs` or `cats` are case sensitive.
## Use cases

This connector is ideal for:

- **Production environments** requiring DLQ
- **Data lineage and auditing** where Kafka metadata is important
- **Complex message processing** with schema evolution requirements
- **Iceberg table integration**

If you need simpler ingestion without metadata or DLQ features, consider the [Apache Kafka for JSON/AVRO data format](kafka-json-avro) connectors instead.

---
title: ApicurioSchemaRegistry
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/apicurioschemaregistry.md
section: Loading & Unloading Data
---

# ApicurioSchemaRegistry

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides a Schema Registry that interacts with the Apicurio Schema Registry so that schemas stored in the Apicurio Schema Registry can be used in NiFi. When this registry looks up a schema by name, it finds the schema in the Apicurio Schema Registry by its artifact identifier.

## Tags

apicurio, avro, registry, schema

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Cache Expiration * | Cache Expiration | 1 hour | | Specifies how long a Schema that is cached should remain in the cache. Once this time period elapses, a cached version of a schema will no longer be used, and the service will have to communicate with the Schema Registry again in order to obtain the schema. |
| Cache Size * | Cache Size | 1000 | | Specifies how many Schemas should be cached from the Schema Registry. The cache size must be a non-negative integer. When it is set to 0, the cache is effectively disabled. |
| Schema Group ID * | Schema Group ID | default | | The artifact Group ID for the schemas |
| Schema Registry URL * | Schema Registry URL | | | The URL of the Schema Registry e.g. [http://localhost:8080](http://localhost:8080) |
| Web Client Service Provider * | Web Client Service Provider | | | Controller service for HTTP client operations |
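The interaction between Cache Size and Cache Expiration can be sketched in a few lines of Python. This is a minimal illustration of the semantics described in the table, not the service's implementation: the `SchemaCache` class, its `fetch` callback, and the evict-oldest policy are all assumptions made for the example.

```python
import time
from collections import OrderedDict


class SchemaCache:
    """Hypothetical cache with a size limit and a per-entry expiration."""

    def __init__(self, fetch, max_size=1000, expiration_seconds=3600.0,
                 clock=time.monotonic):
        if max_size < 0:
            raise ValueError("cache size must be a non-negative integer")
        self._fetch = fetch                  # stands in for the registry call
        self._max_size = max_size
        self._expiration = expiration_seconds
        self._clock = clock
        self._entries = OrderedDict()        # name -> (schema, stored_at)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None:
            schema, stored_at = entry
            if now - stored_at < self._expiration:
                return schema                # still fresh: no registry call
            del self._entries[name]          # expired: must be fetched again
        schema = self._fetch(name)
        if self._max_size > 0:               # a size of 0 disables caching
            self._entries[name] = (schema, now)
            while len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # assumed policy: drop oldest
        return schema


if __name__ == "__main__":
    cache = SchemaCache(lambda name: f"schema for {name}", max_size=2)
    print(cache.get("orders"))
    print(cache.get("orders"))  # served from the cache
```

With the defaults shown in the table (1000 entries, 1 hour), a schema is fetched at most once per hour as long as it is not evicted.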
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AttributesToCSV 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/attributestocsv.md
section: Loading & Unloading Data
---

# AttributesToCSV 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Generates a CSV representation of the input FlowFile Attributes. The resulting CSV can be written to either a newly generated attribute named 'CSVAttributes' or written to the FlowFile as content. If the attribute value contains a comma, newline or double quote, then the attribute value will be escaped with double quotes. Any double quote characters in the attribute value are escaped with another double quote.

## Tags

attributes, csv, flowfile

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| attribute-list | Comma separated list of attributes to be included in the resulting CSV. If this value is left empty then all existing Attributes will be included. This list of attributes is case sensitive and supports attribute names that contain commas. If an attribute specified in the list is not found it will be emitted to the resulting CSV with an empty string or null depending on the 'Null Value' property. If a core attribute is specified in this list and the 'Include Core Attributes' property is false, the core attribute will be included. The attribute list ALWAYS wins. |
| attributes-regex | Regular expression that will be evaluated against the flow file attributes to select the matching attributes. This property can be used in combination with the attributes list property. The final output will contain a combination of matches found in the ATTRIBUTE_LIST and ATTRIBUTE_REGEX. |
| destination | Control if CSV value is written as a new flowfile attribute 'CSVData' or written in the flowfile content. |
| include-core-attributes | Determines if the FlowFile org.apache.nifi.flowfile.attributes.CoreAttributes, which are contained in every FlowFile, should be included in the final CSV value generated. Core attributes will be added to the end of the CSVData and CSVSchema strings. The Attribute List property overrides this setting. |
| include-schema | If true the schema (attribute names) will also be converted to a CSV string which will either be applied to a new attribute named 'CSVSchema' or applied at the first row in the content depending on the DESTINATION property setting. |
| null-value | If true a non existing or empty attribute will be 'null' in the resulting CSV. If false an empty string will be placed in the CSV |
## Relationships

| Name | Description |
| --- | --- |
| failure | Failed to convert attributes to CSV |
| success | Successfully converted attributes to CSV |

## Writes attributes

| Name | Description |
| --- | --- |
| CSVSchema | CSV representation of the Schema |
| CSVData | CSV representation of Attributes |
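The escaping rule in the description and the effect of the `null-value` property can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the processor's code: the function names are hypothetical, and the attribute list is passed as a Python list rather than parsed from a comma-separated string.

```python
def escape_csv_value(value):
    """Quote a value that contains a comma, newline, or double quote.

    Embedded double quotes are escaped by doubling them.
    """
    if any(ch in value for ch in (",", "\n", '"')):
        return '"' + value.replace('"', '""') + '"'
    return value


def attributes_to_csv(attributes, attribute_list=None, null_value=False):
    """Build one CSV line from a dict of FlowFile attributes.

    An empty attribute_list means "include all existing attributes".
    Missing or empty attributes become 'null' or an empty string,
    depending on null_value.
    """
    names = attribute_list if attribute_list else list(attributes)
    cells = []
    for name in names:
        value = attributes.get(name)
        if value is None or value == "":
            cells.append("null" if null_value else "")
        else:
            cells.append(escape_csv_value(value))
    return ",".join(cells)


if __name__ == "__main__":
    attrs = {"filename": "report.csv", "note": 'contains "quotes", and a comma'}
    print(attributes_to_csv(attrs))
    print(attributes_to_csv(attrs, ["filename", "missing"], null_value=True))
```

The sketch leaves out core attributes, the regular-expression selector, and the schema row, which the processor handles through its other properties.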
---
title: AttributesToJSON 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/attributestojson.md
section: Loading & Unloading Data
---

# AttributesToJSON 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Generates a JSON representation of the input FlowFile Attributes. The resulting JSON can be written to either a new Attribute 'JSONAttributes' or written to the FlowFile as content. Attributes which contain nested JSON objects can either be handled as JSON or as escaped JSON depending on the strategy chosen.

## Tags

attributes, flowfile, json

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Attributes List | Comma separated list of attributes to be included in the resulting JSON. If this value is left empty then all existing Attributes will be included. This list of attributes is case sensitive. If an attribute specified in the list is not found it will be emitted to the resulting JSON with an empty string or NULL value. |
| Destination | Control if JSON value is written as a new flowfile attribute 'JSONAttributes' or written in the flowfile content. Writing to flowfile content will overwrite any existing flowfile content. |
| Include Core Attributes | Determines if the FlowFile org.apache.nifi.flowfile.attributes.CoreAttributes which are contained in every FlowFile should be included in the final JSON value generated. |
| JSON Handling Strategy | Strategy to use for handling attributes which contain nested JSON. |
| Null Value | If true a non existing selected attribute will be NULL in the resulting JSON. If false an empty string will be placed in the JSON |
| Pretty Print | Apply pretty print formatting to the output. |
| attributes-to-json-regex | Regular expression that will be evaluated against the flow file attributes to select the matching attributes. This property can be used in combination with the attributes list property. |
## Relationships

| Name | Description |
| --- | --- |
| failure | Failed to convert attributes to JSON |
| success | Successfully converted attributes to JSON |

## Writes attributes

| Name | Description |
| --- | --- |
| JSONAttributes | JSON representation of Attributes |
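The difference between the escaped and nested handling strategies, and the effect of Null Value, is easier to see with a small Python sketch. The function and its parameter names are hypothetical stand-ins for the properties above, and treating only JSON objects and arrays as "nested" is an assumption of this example.

```python
import json


def attributes_to_json(attributes, attributes_list=None, null_value=False,
                       nested=False, pretty=False):
    """Serialize a dict of FlowFile attributes to a JSON string.

    nested=False keeps JSON-looking attribute values as escaped strings;
    nested=True embeds them as JSON objects or arrays.
    """
    names = attributes_list if attributes_list else list(attributes)
    result = {}
    for name in names:
        if name not in attributes:
            # A selected attribute that does not exist: null or empty string.
            result[name] = None if null_value else ""
            continue
        value = attributes[name]
        if nested:
            try:
                parsed = json.loads(value)
            except ValueError:
                parsed = value
            # Assumption: only objects and arrays count as nested JSON.
            result[name] = parsed if isinstance(parsed, (dict, list)) else value
        else:
            result[name] = value
    return json.dumps(result, indent=2 if pretty else None)


if __name__ == "__main__":
    attrs = {"filename": "a.txt", "payload": '{"k": 1}'}
    print(attributes_to_json(attrs))               # payload stays a string
    print(attributes_to_json(attrs, nested=True))  # payload becomes an object
```

With the escaped strategy, a downstream consumer has to parse `payload` a second time; with the nested strategy, it can read `payload.k` directly.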
---
title: AvroReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/avroreader.md
section: Loading & Unloading Data
---

# AvroReader

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Parses Avro data and returns each Avro record as a separate Record object. The Avro data may contain the schema itself, or the schema can be externalized and accessed by one of the methods offered by the 'Schema Access Strategy' property.

## Tags

avro, comma, delimited, parse, reader, record, row, separated, values

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Schema Access Strategy * | Schema Access Strategy | embedded-avro-schema | Use 'Schema Name' Property, Use 'Schema Text' Property, Schema Reference Reader, Use Embedded Avro Schema | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name | Schema Name | `${schema.name}` | | Specifies the name of the schema to look up in the Schema Registry property |
| Schema Reference Reader * | Schema Reference Reader | | | Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry |
| Schema Text | Schema Text | `${avro.schema}` | | The text of an Avro-formatted Schema |
| Schema Version | Schema Version | | | Specifies the version of the schema to look up in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| Cache Size * | cache-size | 1000 | | Specifies how many Schemas should be cached |
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AvroRecordSetWriter
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/avrorecordsetwriter.md
section: Loading & Unloading Data
---

# AvroRecordSetWriter

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Writes the contents of a RecordSet in Binary Avro format.

## Tags

avro, record, recordset, result, row, serializer, set, writer

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Schema Access Strategy * | Schema Access Strategy | inherit-record-schema | Inherit Record Schema, Use 'Schema Name' Property, Use 'Schema Text' Property | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Cache | Schema Cache | | | Specifies a Schema Cache to add the Record Schema to so that Record Readers can quickly look up the schema. |
| Schema Name | Schema Name | `${schema.name}` | | Specifies the name of the schema to look up in the Schema Registry property |
| Schema Reference Reader * | Schema Reference Reader | | | Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier |
| Schema Reference Writer * | Schema Reference Writer | | | Service implementation responsible for writing FlowFile attributes or content header with Schema reference information |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry |
| Schema Text | Schema Text | `${avro.schema}` | | The text of an Avro-formatted Schema |
| Schema Version | Schema Version | | | Specifies the version of the schema to look up in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| Schema Write Strategy * | Schema Write Strategy | avro-embedded | Embed Avro Schema, Do Not Write Schema, Set 'schema.name' Attribute, Set 'avro.schema' Attribute, Schema Reference Writer | Specifies how the schema for a Record should be added to the data. |
| Cache Size * | cache-size | 1000 | | Specifies how many Schemas should be cached |
| Compression Format * | compression-format | NONE | BZIP2, DEFLATE, NONE, SNAPPY, LZO | Compression type to use when writing Avro files. Default is None. |
| Encoder Pool Size * | encoder-pool-size | 32 | | Avro Writers require the use of an Encoder. Creation of Encoders is expensive, but once created, they can be reused. This property controls the maximum number of Encoders that can be pooled and reused. Setting this value too small can result in degraded performance, but setting it higher can result in more heap being used. This property is ignored if the Avro Writer is configured with a Schema Write Strategy of 'Embed Avro Schema'. |
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AvroSchemaRegistry
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/avroschemaregistry.md
section: Loading & Unloading Data
---

# AvroSchemaRegistry

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Provides a service for registering and accessing schemas. You can register a schema as a dynamic property where 'name' represents the schema name and 'value' represents the textual representation of the actual schema following the syntax and semantics of Avro's Schema format.

## Tags

avro, csv, json, registry, schema

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Validate Field Names * | avro-reg-validated-field-names | true | true, false | Whether or not to validate the field names in the Avro schema based on Avro naming rules. If set to true, all field names must be valid Avro names, which must begin with `[A-Za-z_]`, and subsequently contain only `[A-Za-z0-9_]`. If set to false, no validation will be performed on the field names. |
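The naming rule that Validate Field Names enforces can be checked ahead of time with a short Python sketch. This is a simplified illustration, not the service's validator: `invalid_field_names` is a hypothetical helper and it inspects only the top-level fields of a record schema.

```python
import re

# Avro naming rule quoted in the table: start with [A-Za-z_],
# then only [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_field_names(schema):
    """Return the top-level field names that break the Avro naming rule."""
    return [
        field["name"]
        for field in schema.get("fields", [])
        if not AVRO_NAME.fullmatch(field["name"])
    ]


if __name__ == "__main__":
    schema = {
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "id", "type": "int"},
            {"name": "2nd-name", "type": "string"},
            {"name": "_ok", "type": "string"},
        ],
    }
    print(invalid_field_names(schema))
```

A schema for which this returns a non-empty list would be rejected when the property is `true` and accepted when it is `false`.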
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AWSCredentialsProviderControllerService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/awscredentialsprovidercontrollerservice.md
section: Loading & Unloading Data
---

# AWSCredentialsProviderControllerService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Defines credentials for Amazon Web Services processors. Uses default credentials without configuration. Default credentials support EC2 instance profile/role, default user profile, environment variables, etc. Additional options include access key / secret key pairs, credentials file, named profile, and assume role credentials.

## Tags

aws, credentials, provider

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Access Key ID | Access Key ID | | | |
| Assume Role ARN | Assume Role ARN | | | The AWS Role ARN for cross account access. This is used in conjunction with Assume Role Session Name and other Assume Role properties. |
| Assume Role External ID | Assume Role External ID | | | External ID for cross-account access. This is used in conjunction with Assume Role ARN. |
| Assume Role Proxy Configuration Service | Assume Role Proxy Configuration Service | | | Proxy configuration for cross-account access, if needed within your environment. This will configure a proxy to request for temporary access keys into another AWS account. |
| Assume Role SSL Context Service | Assume Role SSL Context Service | | | SSL Context Service used when connecting to the STS Endpoint. |
| Assume Role STS Endpoint Override | Assume Role STS Endpoint Override | | | The default AWS Security Token Service (STS) endpoint ("sts.amazonaws.com") works for all accounts that are not for China (Beijing) region or GovCloud. You only need to set this property to "sts.cn-north-1.amazonaws.com.cn" when you are requesting session credentials for services in China (Beijing) region or to "sts.us-gov-west-1.amazonaws.com" for GovCloud. |
| Assume Role STS Region | Assume Role STS Region | us-west-2 | Middle East (UAE), US ISOF SOUTH, Asia Pacific (Taipei), US West (N. California), US West (Oregon), Africa (Cape Town), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Tokyo), Middle East (Bahrain), South America (Sao Paulo), China (Beijing), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Jakarta), Asia Pacific (Melbourne), Asia Pacific (Malaysia), US East (N. Virginia), Asia Pacific (New Zealand), US East (Ohio), Asia Pacific (Thailand), China (Ningxia), Asia Pacific (Hyderabad), Asia Pacific (Mumbai), Europe (Milan), Europe (Spain), AWS GovCloud (US-East), Israel (Tel Aviv), Canada (Central), Mexico (Central), Europe (Frankfurt), EU (Germany), US ISO WEST, Europe (Zurich), EU ISOE West, Europe (Stockholm), Europe (Paris), Europe (London), Europe (Ireland), Asia Pacific (Hong Kong), Canada West (Calgary), AWS GovCloud (US-West), US ISO East, US ISOB East (Ohio), US ISOF EAST | The AWS Security Token Service (STS) region |
| Assume Role STS Signer Override | Assume Role STS Signer Override | Default Signature | Default Signature, Signature Version 4, Custom Signature | The AWS STS library uses Signature Version 4 by default. This property allows you to plug in your own custom signer implementation. |
| Assume Role Session Name * | Assume Role Session Name | | | The AWS Role Session Name for cross account access. This is used in conjunction with Assume Role ARN. |
| Assume Role Session Time | Assume Role Session Time | 3600 | | Session time for role based session (between 900 and 3600 seconds). This is used in conjunction with Assume Role ARN. |
| Credentials File | Credentials File | | | Path to a file containing AWS access key and secret key in properties file format. |
| Custom Signer Class Name * | Custom Signer Class Name | | | Fully qualified class name of the custom signer class. The signer must implement com.amazonaws.auth.Signer interface. |
| Custom Signer Module Location | Custom Signer Module Location | | | Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any). |
| Profile Name | Profile Name | | | The AWS profile name for credentials from the profile configuration file. |
| Secret Access Key | Secret Access Key | | | |
| Use Anonymous Credentials | Use Anonymous Credentials | false | true, false | If true, uses Anonymous credentials |
| Use Default Credentials | Use Default Credentials | false | true, false | If true, uses the Default Credential chain, including EC2 instance profiles or roles, environment variables, default user credentials, etc. |
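Two of the Assume Role rules in the table lend themselves to a quick pre-flight check: the STS endpoint override and the session time range. The Python sketch below encodes only the values quoted in the table. The helper names are hypothetical, and mapping the two overrides to the region codes `cn-north-1` and `us-gov-west-1` is an assumption based on the endpoint host names.

```python
DEFAULT_STS_ENDPOINT = "sts.amazonaws.com"

# Only the two overrides named in the property description are listed here.
STS_ENDPOINT_OVERRIDES = {
    "cn-north-1": "sts.cn-north-1.amazonaws.com.cn",      # China (Beijing)
    "us-gov-west-1": "sts.us-gov-west-1.amazonaws.com",   # GovCloud
}


def sts_endpoint_for(region):
    """Return the STS endpoint to configure for a region code."""
    return STS_ENDPOINT_OVERRIDES.get(region, DEFAULT_STS_ENDPOINT)


def validate_session_time(seconds):
    """Check the Assume Role Session Time range (900 to 3600 seconds)."""
    if not 900 <= seconds <= 3600:
        raise ValueError("session time must be between 900 and 3600 seconds")
    return seconds


if __name__ == "__main__":
    print(sts_endpoint_for("us-west-2"))
    print(sts_endpoint_for("cn-north-1"))
    print(validate_session_time(3600))
```

For any other region, leaving Assume Role STS Endpoint Override empty corresponds to the default endpoint returned here.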
## State management

This component does not store state.

## Restricted

## Restrictions

| Required Permission | Explanation |
| --- | --- |
| access environment credentials | The default configuration can read environment variables and system properties for credentials |

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AzureBlobStorageFileResourceService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/azureblobstoragefileresourceservice.md
section: Loading & Unloading Data
---

# AzureBlobStorageFileResourceService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Provides an Azure Blob Storage file resource for other components.

## Tags

azure, blob, cloud, file, microsoft, resource, storage

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Blob Name * | Blob Name | `${azure.blobname}` | | The full name of the blob |
| Container Name * | Container Name | `${azure.container}` | | Name of the Azure storage container. In case of PutAzureBlobStorage processor, container can be created if it does not exist. |
| Storage Credentials * | Storage Credentials | | | Controller Service used to obtain Azure Blob Storage Credentials. |
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AzureCosmosDBClientService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/azurecosmosdbclientservice.md
section: Loading & Unloading Data
---

# AzureCosmosDBClientService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Provides a controller service that configures a connection to Cosmos DB (Core SQL API) and provides access to that connection to other Cosmos DB-related components.

## Tags

azure, cosmos, document, service

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Cosmos DB Access Key | Cosmos DB Access Key | | | Cosmos DB Access Key from Azure Portal (Settings->Keys). Choose a read-write key to enable database or container creation at run time |
| Cosmos DB Consistency Level | Cosmos DB Consistency Level | SESSION | STRONG, BOUNDED_STALENESS, SESSION, CONSISTENT_PREFIX, EVENTUAL | Choose from five consistency levels on the consistency spectrum. Refer to Cosmos DB documentation for their differences |
| Cosmos DB URI | Cosmos DB URI | | | Cosmos DB URI, typically in the form of `https://{databaseaccount}.documents.azure.com:443/`. Note this host URL is for Cosmos DB with Core SQL API from Azure Portal (Overview->URI) |
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AzureDataLakeStorageFileResourceService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/azuredatalakestoragefileresourceservice.md
section: Loading & Unloading Data
---

# AzureDataLakeStorageFileResourceService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Provides an Azure Data Lake Storage (ADLS) file resource for other components.

## Tags

adlsgen2, azure, cloud, datalake, file, microsoft, resource, storage

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| ADLS Credentials * | ADLS Credentials | | | Controller Service used to obtain Azure Credentials. |
| Directory Name * | Directory Name | `${azure.directory}` | | Name of the Azure Storage Directory. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. In case of the PutAzureDataLakeStorage processor, the directory will be created if not already existing. |
| File Name * | File Name | `${azure.filename}` | | The filename |
| Filesystem Name * | Filesystem Name | `${azure.filesystem}` | | Name of the Azure Storage File System (also called Container). It is assumed to be already existing. |
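The Directory Name rules (no leading '/', empty string for the root directory) can be expressed as a small Python check. The helper names and the way the path is joined for display are hypothetical; the service itself resolves the file through the ADLS client.

```python
def validate_directory_name(directory):
    """Reject a Directory Name that starts with '/'.

    An empty string is allowed and designates the root directory.
    """
    if directory.startswith("/"):
        raise ValueError("Directory Name cannot contain a leading '/'")
    return directory


def resource_path(filesystem, directory, filename):
    """Join the three property values into a display path (illustrative)."""
    directory = validate_directory_name(directory)
    parts = [filesystem, directory, filename] if directory else [filesystem, filename]
    return "/".join(parts)


if __name__ == "__main__":
    print(resource_path("my-filesystem", "", "data.csv"))
    print(resource_path("my-filesystem", "raw/2024", "data.csv"))
```

In a flow, the three values usually come from the `azure.filesystem`, `azure.directory`, and `azure.filename` attributes through the defaults shown in the table.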
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AzureEventHubRecordSink
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/azureeventhubrecordsink.md
section: Loading & Unloading Data
---

# AzureEventHubRecordSink

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Formats and sends Records to Azure Event Hubs.

## Tags

azure, record, sink

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Authentication Strategy * | Authentication Strategy | DEFAULT_AZURE_CREDENTIAL | Shared Access Key, Default Azure Credential | Strategy for authenticating to Azure Event Hubs |
| Event Hub Name * | Event Hub Name | | | Provides the Event Hub Name for connections |
| Event Hub Namespace * | Event Hub Namespace | | | Provides the host for connecting to Azure Event Hubs |
| Partition Key | Partition Key | | | A hint for the Azure Event Hub message broker on how to distribute messages across one or more partitions |
| Service Bus Endpoint * | Service Bus Endpoint | .servicebus.windows.net | Azure, Azure China, Azure Germany, Azure US Government | Provides the domain for connecting to Azure Event Hubs |
| Shared Access Policy | Shared Access Policy | | | The name of the shared access policy. This policy must have Send claims |
| Shared Access Policy Key | Shared Access Policy Key | | | The primary or secondary key of the shared access policy |
| Transport Type * | Transport Type | Amqp | AMQP, AMQP_WEB_SOCKETS | Advanced Message Queuing Protocol Transport Type for communication with Azure Event Hubs |
| Record Writer * | record-sink-record-writer | | | Specifies the Controller Service to use for writing out the records. |
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: AzureStorageCredentialsControllerService_v12
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/azurestoragecredentialscontrollerservice_v12.md
section: Loading & Unloading Data
---

# AzureStorageCredentialsControllerService_v12

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Provides credentials for Azure Storage processors using Azure Storage client library v12.

## Tags

azure, blob, cloud, credentials, microsoft, queue, storage

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Account Key * | Account Key | | | The storage account key. This is an admin-like password providing access to every container in this account. It is recommended one uses Shared Access Signature (SAS) token, Managed Identity or Service Principal instead for fine-grained control with policies. |
| Credentials Type * | Credentials Type | SAS_TOKEN | Account Key, SAS Token, Managed Identity, Service Principal | Credentials type to be used for authenticating to Azure |
| Endpoint Suffix * | Endpoint Suffix | blob.core.windows.net | | Storage accounts in public Azure always use a common FQDN suffix. Override this endpoint suffix with a different suffix in certain circumstances (like Azure Stack or non-public Azure regions). |
| Managed Identity Client ID | Managed Identity Client ID | | | Client ID of the managed identity. The property is required when User Assigned Managed Identity is used for authentication. It must be empty in case of System Assigned Managed Identity. |
| SAS Token * | SAS Token | | | Shared Access Signature token (the leading '?' may be included) |
| Service Principal Client ID * | Service Principal Client ID | | | Client ID (or Application ID) of the Client/Application having the Service Principal. |
| Service Principal Client Secret * | Service Principal Client Secret | | | Password of the Client/Application. |
| Service Principal Tenant ID * | Service Principal Tenant ID | | | Tenant ID of the Azure Active Directory hosting the Service Principal. |
| Storage Account Name * | Storage Account Name | | | The storage account name. |
| Proxy Configuration Service | proxy-configuration-service | | | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
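Several properties in the table are marked as required but apply only to one Credentials Type. The Python sketch below shows one way to check a configuration before enabling the service. The grouping of properties by credentials type is an assumption inferred from the property names and descriptions, and both helper functions are hypothetical.

```python
# Assumed grouping of type-specific properties (inferred, not authoritative).
REQUIRED_BY_TYPE = {
    "Account Key": ["Account Key"],
    "SAS Token": ["SAS Token"],
    # Client ID is needed only for a user-assigned managed identity.
    "Managed Identity": [],
    "Service Principal": [
        "Service Principal Tenant ID",
        "Service Principal Client ID",
        "Service Principal Client Secret",
    ],
}


def missing_properties(credentials_type, properties):
    """List the properties that are unset for the chosen credentials type."""
    required = ["Storage Account Name"] + REQUIRED_BY_TYPE[credentials_type]
    return [name for name in required if not properties.get(name)]


def normalize_sas_token(token):
    """Drop the optional leading '?' so both token forms compare equal."""
    return token[1:] if token.startswith("?") else token


if __name__ == "__main__":
    config = {"Storage Account Name": "mystorageaccount"}
    print(missing_properties("Service Principal", config))
    print(normalize_sas_token("?sv=2024-01-01&sig=example"))
```

The token and account name in the usage lines are placeholder values.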
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

--- title: AzureStorageCredentialsControllerServiceLookup_v12 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/azurestoragecredentialscontrollerservicelookup_v12.md section: Loading & Unloading Data ---

# AzureStorageCredentialsControllerServiceLookup_v12

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides an AzureStorageCredentialsService_v12 that can be used to dynamically select another AzureStorageCredentialsService_v12. This service requires an attribute named 'azure.storage.credentials.name' to be passed in, and throws an exception if the attribute is missing. The value of 'azure.storage.credentials.name' is used to select the AzureStorageCredentialsService_v12 that has been registered with that name. This allows multiple AzureStorageCredentialsService_v12 instances to be defined and registered, and then selected dynamically at runtime by tagging FlowFiles with the appropriate 'azure.storage.credentials.name' attribute.

## Tags

azure, blob, cloud, credentials, microsoft, queue, storage

## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

--- title: CalculateRecordStats 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/calculaterecordstats.md section: Loading & Unloading Data ---

# CalculateRecordStats 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Counts the number of Records in a record set, optionally counting the number of elements per category, where the categories are defined by user-defined properties.

## Tags

metrics, record, stats

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| record-stats-limit | Limit the number of individual stats that are returned for each record path to the top N results. |
| record-stats-reader | A record reader to use for reading the records. |
## Relationships
| Name | Description |
| --- | --- |
| failure | If a FlowFile cannot be processed for any reason, it is routed to this Relationship. |
| success | All FlowFiles that are successfully processed are routed to this Relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| `record.count` | A count of the records in the record set in the FlowFile. |
| `recordStats.<User Defined Property Name>.count` | A count of the records that contain a value for the user defined property. |
| `recordStats.<User Defined Property Name>.<value>.count` | Each value discovered for the user defined property will have its own count attribute. Total number of top N value counts to be added is defined by the limit configuration. |
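To make the attribute naming above concrete, here is a minimal Python sketch of how such counts could be derived. It is an illustration under simplifying assumptions, not the processor's implementation: records are plain dictionaries, each user-defined property maps to a single field name rather than a record path, and the function name `calculate_record_stats` is hypothetical.

```python
from collections import Counter


def calculate_record_stats(records, properties, limit=10):
    """Build the attribute map described above.

    records:    list of dicts (stand-in for a record set)
    properties: {user-defined property name: field name} (assumed mapping)
    limit:      top-N cap, mirroring record-stats-limit
    """
    attributes = {"record.count": len(records)}
    for prop_name, field in properties.items():
        # Only records that actually carry a value are counted.
        values = [r[field] for r in records if r.get(field) is not None]
        attributes[f"recordStats.{prop_name}.count"] = len(values)
        # One attribute per discovered value, limited to the top N.
        for value, count in Counter(values).most_common(limit):
            attributes[f"recordStats.{prop_name}.{value}.count"] = count
    return attributes


records = [{"sport": "soccer"}, {"sport": "soccer"}, {"sport": "tennis"}, {"name": "x"}]
print(calculate_record_stats(records, {"sport": "sport"}, limit=1))
```

With `limit=1`, only the most frequent value receives its own `recordStats.sport.<value>.count` attribute.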
--- title: CaptureChangeMySQL 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/capturechangemysql.md section: Loading & Unloading Data ---

# CaptureChangeMySQL 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar

## Description

Reads CDC events from a MySQL database. The processor continuously reads events from binary log files, filtering those related to the tables provided by the TableStateService, and discarding the rest. The processor outputs two types of FlowFiles:

- DDLs containing the schema of a table (the initial schema and a new schema on every schema change).
- DMLs with records representing changes to the data in the table.

One FlowFile always represents data related to a single table. The DDL with the schema is written to the FlowFile content as a JSON object:

`{ "columns": [ { "name": "<columnName>", "type": "<snowflakeType>", "nullable": <true|false>, "scale": <scale>, "precision": <precision> }, ... ], "primaryKeys": ["<primaryKey1>", "<primaryKey2>", ...] }`

Structure of the FlowFiles containing the DML records:

`{ "primaryKeys": { "<column>": <value>, ... }, "payload": { "<column>": <value>, ... }, "metadata": { "<column>": <value>, ... } }`

## Tags

cdc, event, jdbc, mysql, sql

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Column Filter Store | Service storing per-table column filtering settings. |
| Connection Timeout | Timeout for connecting to the source database. |
| JDBC Driver Location | Comma-separated list of files/folders and/or URLs containing the driver JAR and its dependencies (if any). For example, '/var/tmp/mariadb-java-client-3.4.1.jar'. |
| JDBC URL | JDBC URL of the database connection, for example `jdbc:mariadb://localhost:3306/mysql`. |
| Max Batch Size | The maximum number of records to process in a single iteration. The number of records may exceed the maximum batch size when the last binlog event contains more than one row. |
| Max Batch Wait Time | The maximum time to wait for data to appear in the binlog. |
| Max Queue Size | The maximum number of elements read from the binlog before the reader thread waits for onTrigger. |
| Password | Password to access the MySQL database. |
| Record Writer | The Record Writer used for serializing DML events. |
| SSL Context Service | SSL Context Service supporting encrypted socket communication. |
| SSL Mode | The SSL mode to use when an SSL Context Service is configured, supporting certificate verification options. |
| Server ID | Server ID (in the range from 1 to 2^32 - 1). This value MUST be unique across whole replication group (that is, different from any other Server ID being used by any master or slave). Keep in mind that each binary log client should be treated as a simplified slave and thus MUST also use a different Server ID. |
| Server ID Strategy | Determines how the server ID is selected. |
| Table State Store | The shared store holding the state of replicated tables. |
| Username | Username to access the MySQL database. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Information such as a 'pointer' to the current CDC event in the database is stored by this processor, such that it can continue from the same location if restarted. |
## Relationships
| Name | Description |
| --- | --- |
| success | Successfully created FlowFile from CDC stream events |
## Writes attributes
| Name | Description |
| --- | --- |
| source.schema.name | Name of the schema of the table from which an event originated |
| source.table.name | Name of the table from which an event originated |
| cdc.event.type | Type of event carried by the FlowFile: ddl or dml |
| cdc.most.significant.position | DDL's most significant position in the CDC stream |
| cdc.least.significant.position | DDL's least significant position in the CDC stream |
| cdc.event.seen.at | Timestamp of when the DDL event was read by the processor |
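As an illustration of how a downstream consumer might use these attributes together with the JSON structures from the Description, the following Python sketch branches on `cdc.event.type`. It assumes the FlowFile content is a single JSON object (the actual DML serialization depends on the configured Record Writer), and the function name and sample values are hypothetical.

```python
import json


def summarize_cdc_flowfile(attributes: dict, content: str) -> str:
    """Describe a CDC FlowFile from its attributes and JSON content."""
    event_type = attributes["cdc.event.type"]
    table = f'{attributes["source.schema.name"]}.{attributes["source.table.name"]}'
    body = json.loads(content)
    if event_type == "ddl":
        # DDL content carries the table schema.
        columns = [column["name"] for column in body["columns"]]
        return f"schema for {table}: columns={columns}, primaryKeys={body['primaryKeys']}"
    if event_type == "dml":
        # DML content carries the changed row.
        return f"change to {table}: key={body['primaryKeys']}, payload={body['payload']}"
    raise ValueError(f"unexpected cdc.event.type: {event_type}")


attrs = {"cdc.event.type": "dml", "source.schema.name": "shop", "source.table.name": "orders"}
dml = '{"primaryKeys": {"id": 1}, "payload": {"id": 1, "status": "paid"}, "metadata": {}}'
print(summarize_cdc_flowfile(attrs, dml))
```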
--- title: CaptureChangePostgreSQL 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/capturechangepostgresql.md section: Loading & Unloading Data ---

# CaptureChangePostgreSQL 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar

## Description

Reads CDC events from a PostgreSQL database. The processor continuously reads events arriving in the stream, filtering for those related to tables provided by the TableStateService, and discarding the rest. After the current batch of events is processed, the processor confirms the replication slot position back to PostgreSQL, letting it trim the WAL.

The processor outputs two types of FlowFiles:

- DDLs, containing the initial schema of a table and a new schema every time the schema changes.
- DMLs, with records representing changes to data in the table.

One FlowFile always represents data related to a single table. The DDL with the schema is written to the FlowFile content as a JSON object, in a form such as:

`{ "columns": [ { "name": "<columnName>", "type": "<snowflakeType>", "nullable": <true|false>, "scale": <scale>, "precision": <precision> }, ... ], "primaryKeys": ["<primaryKey1>", "<primaryKey2>", ...] }`

The DML records are structured as:

`{ "primaryKeys": { "<column>": <value>, ... }, "payload": { "<column>": <value>, ... }, "metadata": { "<column>": <value>, ... } }`

## Tags

cdc, event, jdbc, postgresql, sql

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Column Filter Store | Service storing per-table column filtering settings. |
| JDBC Driver Location | Comma-separated list of files/folders and/or URLs containing the driver JAR and its dependencies (if any). For example, '/var/tmp/postgresql-java-client-42.7.5.jar'. |
| JDBC URL | JDBC URL of the database connection, for example `jdbc:postgresql://localhost:5432/postgres`. |
| Max Batch Size | The maximum number of records to process in a single iteration. |
| Max Batch Wait Time | The maximum time to wait for data to appear in the CDC stream. |
| Password | Password to access the PostgreSQL database. |
| Publication Name | The name of the CDC publication to read from. |
| Record Writer | The Record Writer used for serializing DML events. |
| Replication Slot Name | The name of the replication slot to use. 63 characters maximum. If the slot doesn't exist, the processor will create it. |
| SSL Context Service | SSL Context Service supporting encrypted socket communication. |
| SSL Mode | Whether to use and enforce SSL when connecting to PostgreSQL. |
| TOASTed Value Placeholder | The value to put into a TOASTed column. |
| TOASTed Value Strategy | Determines how to handle TOASTed values. |
| Table State Store | The shared store holding the state of replicated tables. |
| Username | Username to access the PostgreSQL database. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | This processor stores information such as a 'pointer' to the current CDC event in the database, so that it can continue from the same location if restarted, as well as the name of the replication slot created in PostgreSQL. |
## Relationships
| Name | Description |
| --- | --- |
| success | Successfully created FlowFile from CDC stream events |
## Writes attributes
| Name | Description |
| --- | --- |
| source.schema.name | Name of the schema of the table from which an event originated |
| source.table.name | Name of the table from which an event originated |
| cdc.event.type | Type of event carried by the FlowFile: ddl or dml |
| cdc.most.significant.position | DDL's most significant position in the CDC stream |
| cdc.least.significant.position | DDL's least significant position in the CDC stream |
| cdc.event.seen.at | Timestamp of when the DDL event was read by the processor |
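Because the Replication Slot Name property is limited to 63 characters, it can be useful to check a candidate name before configuring the processor. The Python sketch below is one way to do that. The 63-character limit comes from the property description; the character set (lowercase letters, digits, and underscores) is PostgreSQL's own naming rule for replication slots and is not stated on this page. The helper name is hypothetical.

```python
import re

# 1 to 63 characters; lowercase letters, digits, and underscore only.
_SLOT_NAME = re.compile(r"[a-z0-9_]{1,63}")


def is_valid_slot_name(name: str) -> bool:
    """Return True if the name looks acceptable as a replication slot name."""
    return _SLOT_NAME.fullmatch(name) is not None


print(is_valid_slot_name("openflow_slot_1"))
print(is_valid_slot_name("a" * 64))
```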
--- title: CaptureChangeSqlServer 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/capturechangesqlserver.md section: Loading & Unloading Data ---

# CaptureChangeSqlServer 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar

## Description

Reads CDC events from a SQL Server database. The processor periodically queries Change Tracking tables in the database, but only for the tables provided by the TableStateService. The processor maintains a state of the last processed event for each table and moves the position after each processed table.

The processor supports multi-threading. The number of threads and the connection limit configured in the pool collectively define the upper bound of open connections to the source database.

The processor outputs two types of FlowFiles:

- DDLs, containing the initial schema of a table and a new schema every time the schema changes.
- DMLs, with records representing changes to data in the table.

One FlowFile always represents data related to a single table. The DDL with the schema is written to the FlowFile content as a JSON object, in a form such as:

`{ "columns": [ { "name": "<columnName>", "type": "<snowflakeType>", "nullable": <true|false>, "scale": <scale>, "precision": <precision> }, ... ], "primaryKeys": ["<primaryKey1>", "<primaryKey2>", ...] }`

The DML records are structured as:

`{ "primaryKeys": { "<column>": <value>, ... }, "payload": { "<column>": <value>, ... }, "metadata": { "<column>": <value>, ... } }`

## Tags

cdc, event, jdbc, sql, sql server

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Column Filter Store | Service storing per-table column filtering settings. |
| Connection Pool | The connection pool. |
| Fetch Size | The maximum number of rows loaded into memory at once. |
| Max Batch Size | The maximum number of rows to fetch in a single batch. |
| Record Writer | The Record Writer used for serializing DML events. |
| Table Changes Query Interval | The minimum time interval that must elapse before scheduling the next query for table changes. This controls the frequency of database polling to prevent excessive querying. |
| Table State Store | The shared store holding the state of replicated tables. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Information such as a version of the last processed record for each table is stored by this processor, such that it can continue from the same location if restarted. |
## Relationships
| Name | Description |
| --- | --- |
| success | Successfully created FlowFile from CDC stream events |
--- title: CaptureGoogleDriveChanges 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/capturegoogledrivechanges.md section: Loading & Unloading Data ---

# CaptureGoogleDriveChanges 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-google-drive-nar

## Description

Captures changes to a Shared Google Drive and emits a FlowFile for each change that occurs. This includes addition and deletion of files, as well as changes to file metadata and permissions. The processor is designed to be used in conjunction with the FetchGoogleDrive processor.

## Tags

authorization, cdc, change data capture, cloud, drive, gcp, google, openflow, permissions, storage, unstructured

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Drive ID | The ID of the Shared Google Drive to monitor. |
| GCP Credentials Service | The Controller Service used to obtain Google Cloud Platform credentials. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Stores a token/cursor to track which changes have already been processed. |
## Relationships
| Name | Description |
| --- | --- |
| created | This Relationship is used for any files that are created. |
| removed | This Relationship is used for any files that are deleted. |
| updated | This Relationship is used for any files that are updated. |
## Writes attributes
| Name | Description |
| --- | --- |
| `google.drive.drive.id` | The ID of the Shared Google Drive. |
| `google.drive.file.id` | The ID of the file that was changed. |
| `drive.id` | The ID of the file that was changed. This is repeated for compatibility with FetchGoogleDrive's default configuration. |
| `google.drive.file.name` | The name of the file that was changed. |
| `google.drive.change.type` | The type of change that occurred. Possible values are 'CREATED', 'UPDATED', or 'DELETED'. |
| `google.drive.change.time` | The timestamp of the change, in milliseconds since the Unix epoch. |
| `google.drive.created.time` | The timestamp when the file was created, in milliseconds since the Unix epoch. |
| `google.drive.webUrl` | A link for opening the file in a relevant Google editor or viewer in a browser. |
| `google.drive.size` | The size of the file in bytes. |
| `google.drive.md5` | The MD5 checksum of the file. |
| `google.drive.version` | The version of the file. This changes based on user and system based updates to the file. |
| `google.drive.mime.type` | The MIME type of the file. |
| `google.drive.lastModifiedBy.displayName` | A display name of the user that modified the file. |
| `google.drive.lastModifiedBy.email` | An email of the user that modified the file. |
| `google.drive.permissions.<role>.users` | A comma-separated list of email addresses for users with the specified role. Valid roles are 'owner', 'organizer', 'fileOrganizer', 'writer', 'commenter', 'reader'. For example, if the owner is john.doe@gmail.com and users jane.doe@gmail.com and jake.doe@gmail.com are readers, there would be an attribute named _google.drive.permissions.owner.users_ with the value _john.doe@gmail.com_, and an attribute named _google.drive.permissions.reader.users_ with the value _jane.doe@gmail.com, jake.doe@gmail.com_. |
| `google.drive.permissions.<role>.groups` | A comma-separated list of email addresses for groups with the specified role. Valid roles are 'owner', 'organizer', 'fileOrganizer', 'writer', 'commenter', 'reader'. For example, if the owner is _employees@openflow-all-dev.iam.gserviceaccount.com_ and the group _contractors@openflow-all-dev.iam.gserviceaccount.com_ is a reader, there would be an attribute named _google.drive.permissions.owner.groups_ with the value _employees@openflow-all-dev.iam.gserviceaccount.com_, and an attribute named _google.drive.permissions.reader.groups_ with the value _contractors@openflow-all-dev.iam.gserviceaccount.com_. |
| `google.drive.permissions.<role>.domains` | A comma-separated list of domain names for which all users have the given role. Valid roles are 'owner', 'organizer', 'fileOrganizer', 'writer', 'commenter', 'reader'. For example, if all users in the domain _snowflake.com_ have the role of reader, there would be an attribute named _google.drive.permissions.reader.domains_ with the value _snowflake.com_. |
| `google.drive.permissions.<role>.public` | If a file is shared publicly, this attribute will be added with a value of 'true' for any role that applies to the public. |
| `google.drive.file.path` | The hierarchical path of the file in Google Drive, e.g. 'parent_folder/child_folder/file.txt'. |
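The permission attributes above follow a regular `google.drive.permissions.<role>.<kind>` pattern, so they can be regrouped by role with a few lines of code. The Python sketch below shows one way to do this and to convert the millisecond timestamp into a UTC datetime. It assumes the attributes are available as a plain dictionary of strings; the function names are hypothetical.

```python
from datetime import datetime, timezone

PREFIX = "google.drive.permissions."


def parse_permissions(attributes: dict) -> dict:
    """Group permission attributes as {role: {kind: value}}."""
    result = {}
    for key, value in attributes.items():
        if not key.startswith(PREFIX):
            continue
        role, kind = key[len(PREFIX):].split(".", 1)
        if kind == "public":
            # The public flag is a 'true' marker rather than a list.
            result.setdefault(role, {})[kind] = value == "true"
        else:
            result.setdefault(role, {})[kind] = [item.strip() for item in value.split(",")]
    return result


def change_time(attributes: dict) -> datetime:
    """Convert google.drive.change.time (milliseconds since epoch) to UTC."""
    millis = int(attributes["google.drive.change.time"])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


attrs = {
    "google.drive.permissions.owner.users": "john.doe@gmail.com",
    "google.drive.permissions.reader.users": "jane.doe@gmail.com, jake.doe@gmail.com",
    "google.drive.change.time": "0",
}
print(parse_permissions(attrs))
print(change_time(attrs).isoformat())
```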
## See also

- [com.snowflake.openflow.runtime.processors.sharepoint.CaptureSharepointChanges](/user-guide/data-integration/openflow/processors/capturesharepointchanges)
- [org.apache.nifi.processors.gcp.drive.FetchGoogleDrive](/user-guide/data-integration/openflow/processors/fetchgoogledrive)

--- title: CaptureMicrosoft365GroupsChanges 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/capturemicrosoft365groupschanges.md section: Loading & Unloading Data ---

# CaptureMicrosoft365GroupsChanges 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-msgraph-nar

## Description

Captures Microsoft365 groups changes and emits a FlowFile for each change that occurs. This includes membership changes.

## Tags

cdc, document, graph, library, microsoft, sharepoint, unstructured

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Authentication Service | The service that provides authentication for the SharePoint API. |
| Fallback Retry Duration | The time to wait before retrying the operation after a communication failure. This value is used when the response doesn't contain a Retry-After header. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Stores a delta token for Microsoft365 groups |
## Relationships
| Name | Description |
| --- | --- |
| deleted | A FlowFile is routed to this relationship for each Microsoft365 group that has been deleted. |
| updated | A FlowFile is routed to this relationship for each Microsoft365 group whose membership has changed. |
## Writes attributes
| Name | Description |
| --- | --- |
| microsoft365.group.id | The ID of the changed group |
| microsoft365.group.email | The email address of the changed group |
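The Fallback Retry Duration property describes a common pattern: honor the server's Retry-After header when present, and otherwise wait for a configured duration. A minimal Python sketch of that decision follows. It is not the processor's code; it assumes headers arrive as a dictionary keyed exactly by `Retry-After`, and it only handles the numeric (seconds) form of the header. The function name is hypothetical.

```python
def retry_wait_seconds(response_headers: dict, fallback_seconds: float) -> float:
    """Pick the delay before retrying a failed request."""
    value = response_headers.get("Retry-After")
    if value is None:
        # No hint from the server: use the configured fallback.
        return fallback_seconds
    try:
        return float(value)
    except ValueError:
        # Retry-After can also be an HTTP date; this sketch falls back then.
        return fallback_seconds


print(retry_wait_seconds({"Retry-After": "30"}, fallback_seconds=10))
print(retry_wait_seconds({}, fallback_seconds=10))
```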
--- title: CaptureSharepointChanges 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/capturesharepointchanges.md section: Loading & Unloading Data ---

# CaptureSharepointChanges 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-msgraph-nar

## Description

Captures changes from a Sharepoint Document Library and emits a FlowFile for each change that occurs. This includes additions and deletions of files and folders, as well as changes to permissions, metadata, and file content.

## Tags

cdc, document, graph, library, microsoft, openflow, sharepoint, unstructured

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Authentication Service | The service that provides authentication for the SharePoint API. |
| Change Capture Initial Action | If the Processor is run without having any prior state, this property dictates how the Processor should treat existing Sharepoint items. |
| Document Library Name | The name of the Document Library to list. If not specified, all Document Libraries associated with the Site will be listed. |
| Fallback Retry Duration | The time to wait before retrying the operation after a communication failure. This value is used when the response doesn't contain a Retry-After header. |
| Fetch Item Permissions | If true, the Processor will fetch user and group permission information for the captured Sharepoint item. |
| Folder Name | The name of the Folder/Directory to list. |
| Item Permissions To Fetch | A comma-separated list of permission types to fetch for the captured Sharepoint item. Available permission types: USER, GROUP, SITE_USER, SITE_GROUP. |
| Site URL | The URL of the Sharepoint Site that data will be retrieved from. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Stores tokens for each Sharepoint folder to track state about which events have already been captured. |
## Relationships
| Name | Description |
| --- | --- |
| created | A FlowFile is routed to this relationship for each Sharepoint item that is created. |
| deleted | A FlowFile is routed to this relationship for each Sharepoint item that is deleted. |
| updated | A FlowFile is routed to this relationship for each Sharepoint item that is updated. |
## Writes attributes
| Name | Description |
| --- | --- |
| `sharepoint.change.type` | The type of change that occurred. Possible values are 'Created', 'Updated', 'PermissionsUpdated', 'Deleted'. |
| `sharepoint.item.id` | The ID of the Sharepoint item that was changed. |
| `sharepoint.item.type` | The type of the Sharepoint item that was changed. Possible values are 'File' and 'Folder'. |
| `sharepoint.path` | The path of the Sharepoint item that was changed. This is the path relative to the root of the Document Library. |
| `sharepoint.filename` | The name of the Sharepoint item that was changed. This attribute is not available for 'Deleted' changes. |
| `sharepoint.size` | The size of the Sharepoint item that was changed. |
| `sharepoint.createdAt` | The creation timestamp of the Sharepoint item that was changed. |
| `sharepoint.lastModified` | The last modified timestamp of the Sharepoint item that was changed. |
| `sharepoint.createdBy.<identity>.id` | An ID of the identity that created the Sharepoint item that was changed. This attribute is not always available. |
| `sharepoint.createdBy.<identity>.displayName` | A display name of the identity that created the Sharepoint item that was changed. This attribute is not always available. |
| `sharepoint.createdBy.<identity>.email` | An email of the identity that created the Sharepoint item that was changed. This attribute is not always available. |
| `sharepoint.lastModifiedBy.<identity>.id` | An ID of the identity that modified the Sharepoint item that was changed. This attribute is not always available. |
| `sharepoint.lastModifiedBy.<identity>.displayName` | A display name of the identity that modified the Sharepoint item that was changed. This attribute is not always available. |
| `sharepoint.lastModifiedBy.<identity>.email` | An email of the identity that modified the Sharepoint item that was changed. This attribute is not always available. |
| `sharepoint.drive.id` | The ID of the Sharepoint Drive that contains the item that was changed. |
| `sharepoint.drive.name` | The name of the Sharepoint Drive that contains the item that was changed. |
| `sharepoint.site.id` | The ID of the Sharepoint Site that contains the item that was changed. |
| `sharepoint.site.url` | The URL of the Sharepoint Site that contains the item that was changed. |
| `sharepoint.ctag` | The CTag of the Sharepoint item that was changed. |
| `sharepoint.etag` | The ETag of the Sharepoint item that was changed. |
| `sharepoint.webUrl` | The browser view URL of the Sharepoint item that was changed. |
| `sharepoint.permissions.read.groups` | A comma-separated list of groups that have read permissions on the Sharepoint item that was changed. For each group, if an e-mail address is available in Sharepoint, it will be included. Additionally, the group principal, such as _mygroup@mytenant.onmicrosoft.com_, is included. |
| `sharepoint.permissions.read.groups.ids` | A comma-separated list of group IDs that have read permissions on the Sharepoint item. |
| `sharepoint.permissions.read.users` | A comma-separated list of users that have read permissions on the Sharepoint item that was changed. For each user, if an e-mail address is available in Sharepoint, it will be included. Additionally, the user principal, such as _johndoe@mytenant.onmicrosoft.com_, is included. |
| `sharepoint.permissions.read.users.ids` | A comma-separated list of Microsoft365 user IDs that have read permissions on the Sharepoint item. |
| `sharepoint.permissions.read.siteusers` | A comma-separated list of Sharepoint site user emails that have read permissions on the Sharepoint item. |
| `sharepoint.permissions.read.siteusers.ids` | A comma-separated list of Sharepoint site user IDs that have read permissions on the Sharepoint item. |
| `sharepoint.permissions.read.sitegroups.ids` | A comma-separated list of Sharepoint site group IDs that have read permissions on the Sharepoint item. |
| `filename` | The name of the Sharepoint item that was changed. This attribute is not available for 'Deleted' changes. |
| `path` | The path of the Sharepoint item that was changed. This is the path relative to the root of the Document Library. |
| `mime.type` | The MIME type of the Sharepoint item that was changed. This attribute is only available for 'File' items. |
| `hash.quickxor` | The QuickXor hash of the Sharepoint item that was changed. This attribute is not always available. |
| `hash.sha256` | The SHA-256 hash of the Sharepoint item that was changed. This attribute is not always available. |
| `hash.sha1` | The SHA-1 hash of the Sharepoint item that was changed. This attribute is not always available. |
| `hash.crc32` | The CRC32 hash of the Sharepoint item that was changed. This attribute is not always available. |
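Since the hash attributes are not always present, code that uses them for integrity checks needs to handle the missing case explicitly. The Python sketch below compares downloaded content against `hash.sha256`. It assumes the attribute holds a hex-encoded digest, which this page does not specify, and the function name is hypothetical.

```python
import hashlib
from typing import Optional


def matches_sha256(content: bytes, attributes: dict) -> Optional[bool]:
    """Compare content against hash.sha256; None means no hash was provided."""
    expected = attributes.get("hash.sha256")
    if expected is None:
        # The attribute is not always available, so no verdict is possible.
        return None
    # Assumption: the attribute is a hex digest; compare case-insensitively.
    return hashlib.sha256(content).hexdigest() == expected.lower()


digest = hashlib.sha256(b"hello").hexdigest()
print(matches_sha256(b"hello", {"hash.sha256": digest}))
print(matches_sha256(b"hello", {}))
```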
## Use Cases Involving Other Components

| Perform Change Data Capture on a Sharepoint Document Library, retrieving all data in the Document Library, including permissions, in order to keep a destination system in sync with Sharepoint. |
| --- |

## See also

- [com.snowflake.openflow.runtime.processors.sharepoint.FetchSharepointFile](/user-guide/data-integration/openflow/processors/fetchsharepointfile)

--- title: CEFReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/cefreader.md section: Loading & Unloading Data ---

# CEFReader

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Parses CEF (Common Event Format) events, returning each row as a record. This reader allows for inferring a schema based on the first event in the FlowFile or providing an explicit schema for interpreting the values.

## Tags

cef, parser, reader, record

## Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Schema Access Strategy * | Schema Access Strategy | infer-schema | Use 'Schema Name' Property; Use 'Schema Text' Property; Schema Reference Reader; Infer Schema | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name | Schema Name | `${schema.name}` | | Specifies the name of the schema to look up in the Schema Registry property. |
| Schema Reference Reader * | Schema Reference Reader | | | Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier. |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry. |
| Schema Text | Schema Text | `${avro.schema}` | | The text of an Avro-formatted Schema. |
| Schema Version | Schema Version | | | Specifies the version of the schema to look up in the Schema Registry. If not specified, the latest version of the schema will be retrieved. |
| Accept empty extensions * | accept-empty-extensions | false | true; false | If set to true, empty extensions will be accepted and will be associated with a null value. |
| DateTime Locale * | datetime-representation | en-US | | The IETF BCP 47 representation of the Locale to be used when parsing date fields with long or short month names (e.g. may `<en-US>` vs. mai. `<fr-FR>`). The default value is generally safe. Only change it if you have issues parsing CEF messages. |
| Inference Strategy * | inference-strategy | custom-extensions-inferred | Headers only; Headers and extensions; With custom extensions as strings; With custom extensions inferred | Defines the set of fields that should be included in the schema and the way the fields are interpreted. |
| Invalid Field | invalid-message-field | | | Used when a line in the FlowFile cannot be parsed by the CEF parser. If set, instead of failing to process the FlowFile, a record with a single field is added. The field has the name specified by this property and the raw message as its value. |
| Raw Message Field | raw-message-field | | | If set, the raw message will be added to the record using the property value as the field name. This is not the same as the "rawEvent" extension field! |
| Schema Inference Cache | schema-inference-cache | | | Specifies a Schema Cache to use when inferring the schema. If not populated, the schema will be inferred each time. However, if a cache is specified, the cache will first be consulted and if the applicable schema can be found, it will be used instead of inferring the schema. |
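To make the Invalid Field and Raw Message Field properties more concrete, the following is a minimal Python sketch of how a CEF line could be turned into a record. It is not the reader's actual implementation: the function name `parse_cef`, the header field names, and the simplified extension parsing (no handling of escaped `=` characters) are assumptions made for illustration.

```python
import re

# Header field names are illustrative assumptions, not taken from the reader.
HEADER_FIELDS = ["version", "deviceVendor", "deviceProduct", "deviceVersion",
                 "deviceEventClassId", "name", "severity"]


def parse_cef(line, raw_message_field=None, invalid_field=None):
    """Parse one CEF line into a dict of header fields and extensions."""
    try:
        if not line.startswith("CEF:"):
            raise ValueError("missing 'CEF:' prefix")
        # Split on unescaped pipes: 7 header fields plus the extension part.
        parts = re.split(r"(?<!\\)\|", line[4:], maxsplit=7)
        if len(parts) < 7:
            raise ValueError("incomplete CEF header")
        record = dict(zip(HEADER_FIELDS,
                          (p.replace("\\|", "|") for p in parts[:7])))
        extension = parts[7] if len(parts) > 7 else ""
        # key=value pairs; a value runs until the next " key=" or end of line.
        for match in re.finditer(r"(\w+)=(.*?)(?=\s+\w+=|\s*$)", extension):
            record[match.group(1)] = match.group(2)
    except ValueError:
        if invalid_field is None:
            raise  # no Invalid Field configured: fail instead of emitting a record
        return {invalid_field: line}
    if raw_message_field:
        record[raw_message_field] = line
    return record


event = "CEF:0|Acme|Firewall|1.0|100|Blocked connection|5|src=10.0.0.1 msg=hello world"
print(parse_cef(event, raw_message_field="raw"))
print(parse_cef("not a CEF line", invalid_field="invalid"))
```

With an invalid-field name configured, an unparseable line yields a one-field record instead of an error, which mirrors the behavior described for the Invalid Field property.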
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: CheckMetaAdsReportReadiness 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/checkmetaadsreportreadiness.md section: Loading & Unloading Data --- # CheckMetaAdsReportReadiness 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-meta-ads-processors-nar ## Description Processor checking if the Meta Ads report is ready for download. ## Tags Facebook, Meta, Meta Ads, report ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Access Token | Token required to call the Meta Ads Marketing API. It must match the pattern `Bearer <Access Token Value>`. |
| Meta Ads API Version | Version of the Meta Ads API that is used for report generation. |
| Report ID | ID of the generated report. |
| Web Client Service Provider | Service providing the client for REST request execution. |
## Relationships
| Name | Description |
| --- | --- |
| failure | Error FlowFiles transferred when receiving error response from Meta Ads Marketing API or when an error occurred during response processing. |
| ready | Response FlowFiles transferred when receiving Job Completed response from Meta Ads Marketing API. |
| retry | Response FlowFiles transferred when report prepared by Meta Ads Marketing API is not yet ready to be downloaded. |
## Writes attributes
| Name | Description |
| --- | --- |
| meta.ads.report.status | Current state of the processed report. |
--- title: ChunkRecordText 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/chunkrecordtext.md section: Loading & Unloading Data --- # ChunkRecordText 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-chunking-nar ## Description Chunks text with options for recursively splitting by delimiters and max character length. The input text is expected to be in a record-oriented FlowFile that matches the configured Record Reader format. ## Tags chunk, openflow, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Chunk Count Field Name | The field name in the record to write the total number of chunks created from the original record. |
| Chunk Delimiters | Specifies a comma-separated list of character sequences. Meta-characters `\n`, `\r` and `\t` are automatically un-escaped. Delimiters are recursively applied in order to chunk the text. |
| Chunk Index Field Name | The field name in the record to write the chunk index. |
| Chunk Overlap | The maximum number of characters to include from preceding and subsequent chunks. |
| Chunking Strategy | Strategy to chunk text. 'Recursive Delimiters' will chunk text according to the recursive split by character algorithm. In this algorithm, input text is split by the first delimiter and merged back into chunks that do not exceed the 'Max Chunk Length'. Any splits that exceed 'Max Chunk Length' are then recursively split using the next delimiter. 'Max Chunk Length' will chunk text by creating chunks that are 'Max Chunk Length' in size. |
| Language | Language to use for parsing sentences. |
| Max Chunk Length | Maximum number of characters to include in an output chunk. Setting this number too high can result in an out of memory error. |
| Record Reader | The Record Reader to use for reading the FlowFile. |
| Record Writer | The Record Writer to use for writing the results. |
| Sentence Similarity Threshold | Threshold for determining if two sentences are similar enough to occupy the same chunk. A value of 1.0 indicates the sentences are identical. A value of 0.0 indicates the sentences are completely dissimilar. |
| Text Record Path | The record path to a text field in the record. |
| Trim Whitespace | Trim whitespace surrounding the output text chunk. |
## Relationships
| Name | Description |
| --- | --- |
| original | The input Flow File is routed to the original relationship. |
| success | Text chunks are routed to the success relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| chunk.strategy | Strategy used to chunk text. One of 'Max Chunk Length', 'Recursive Delimiters', 'Sentence', 'Semantic'. |
| chunk.semantic.threshold | Threshold for determining if two sentences are similar enough to occupy the same chunk. This attribute is added only when the 'Semantic' chunking strategy is used. |
| chunk.language | Language used for parsing sentences. This attribute is added only when the 'Sentence' or 'Semantic' chunking strategy is used. |
| chunk.delimiters | Comma-separated list of delimiters used to chunk text. This attribute is added only when the 'Recursive Delimiters' chunking strategy is used. |
| chunk.max.chars | Maximum number of characters to include in each chunk. |
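The 'Recursive Delimiters' strategy described in the Properties table can be sketched in a few lines of Python. This is an illustration of the general recursive split-and-merge idea only, not the processor's code: the function name `chunk_recursive` is hypothetical, chunk overlap and whitespace trimming are left out, and falling back to fixed-size slices once the delimiters are exhausted is an assumption.

```python
def chunk_recursive(text, delimiters, max_len):
    """Split text by the first delimiter, merge pieces up to max_len,
    and recurse with the remaining delimiters for oversized pieces."""
    if len(text) <= max_len:
        return [text] if text else []
    if not delimiters:
        # Assumption: with no delimiters left, cut fixed-size slices.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    first, rest = delimiters[0], delimiters[1:]
    chunks, current = [], ""
    for piece in text.split(first):
        candidate = piece if not current else current + first + piece
        if len(candidate) <= max_len:
            current = candidate          # piece still fits: keep merging
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
            current = ""
        if len(piece) <= max_len:
            current = piece              # start a new chunk with this piece
        else:
            # Piece alone is too long: split it with the next delimiter.
            chunks.extend(chunk_recursive(piece, rest, max_len))
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb ccc\n\ndddddddddddd eee"
print(chunk_recursive(sample, ["\n\n", " "], max_len=8))
```

Ordering the delimiters from coarse (paragraph breaks) to fine (spaces) keeps chunks aligned with the largest natural boundaries that still fit within the maximum length.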
--- title: ChunkText 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/chunktext.md section: Loading & Unloading Data --- # ChunkText 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-chunking-nar ## Description Chunks text with options for recursively splitting by delimiters and max character length. Each chunk is given the following attributes: fragment.identifier, fragment.index, fragment.count, segment.original.filename; these attributes can then be used by the MergeContent processor in order to reconstitute the original FlowFile ## Tags chunk, openflow, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Chunk Delimiters | Specifies a comma-separated list of character sequences. Meta-characters `\n`, `\r` and `\t` are automatically un-escaped. Delimiters are recursively applied in order to chunk the text. |
| Chunk Overlap | The maximum number of characters to include from preceding and subsequent chunks. |
| Chunking Strategy | Strategy to chunk text. 'Recursive Delimiters' will chunk text according to the recursive split by character algorithm. In this algorithm, input text is split by the first delimiter and merged back into chunks that do not exceed the 'Max Chunk Length'. Any splits that exceed 'Max Chunk Length' are then recursively split using the next delimiter. 'Max Chunk Length' will chunk text by creating chunks that are 'Max Chunk Length' in size. |
| Language | Language to use for parsing sentences. |
| Max Chunk Length | Maximum number of characters to include in an output chunk. Setting this number too high can result in an out of memory error. |
| Sentence Similarity Threshold | Threshold for determining if two sentences are similar enough to occupy the same chunk. A value of 1.0 indicates the sentences are identical. A value of 0.0 indicates the sentences are completely dissimilar. |
| Trim Whitespace | Trim whitespace surrounding the output text chunk. |
## Relationships
| Name | Description |
| --- | --- |
| failure | If any error during parsing occurs, the input Flow File will be routed to the failure relationship. |
| original | The input Flow File is routed to the original relationship. |
| success | Text chunks are routed to the success relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| segment.original.filename | Original filename of the input Flow File. |
| fragment.identifier | ID of the parent Flow File used to generate each chunk. |
| fragment.index | Index of the current Flow File chunk, starting at 0. |
| fragment.count | The total count of Flow File chunks produced. |
| chunk.start.offsets | The chunk.start.offsets attribute is added only to the original incoming FlowFile. It is a comma-separated list of start offsets for each chunk that gets generated. For example, if the FlowFile is chunked into 3 child FlowFiles, it might have a value of _0,183,365_ indicating that the first chunk starts at offset 0, the second chunk starts at offset 183, and the third chunk starts at offset 365. Offsets are based on the number of characters. |
| chunk.end.offsets | The chunk.end.offsets attribute is added only to the original incoming FlowFile. It is a comma-separated list of end offsets for each chunk that gets generated. For example, if the FlowFile is chunked into 3 child FlowFiles, it might have a value of _183,365,548_ indicating that the first chunk ends at offset 183, the second chunk ends at offset 365, and the third chunk ends at offset 548. Offsets are based on the number of characters. |
| chunk.strategy | Strategy used to chunk text. One of 'Max Chunk Length', 'Recursive Delimiters', 'Sentence', 'Semantic'. |
| chunk.semantic.threshold | Threshold for determining if two sentences are similar enough to occupy the same chunk. This attribute is added only when the 'Semantic' chunking strategy is used. |
| chunk.language | Language used for parsing sentences. This attribute is added only when the 'Sentence' or 'Semantic' chunking strategy is used. |
| chunk.delimiters | Comma-separated list of delimiters used to chunk text. This attribute is added only when the 'Recursive Delimiters' chunking strategy is used. |
| chunk.max.chars | Maximum number of characters to include in each chunk. |
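As a rough illustration of how the fragment and offset attributes relate to each other, the Python sketch below applies fixed-size ('Max Chunk Length') chunking and builds the attribute values. The function name `chunk_with_offsets` is hypothetical, overlap is ignored, and treating end offsets as exclusive is an assumption inferred from the example values in the table (where one chunk's end equals the next chunk's start).

```python
import uuid


def chunk_with_offsets(text, max_len, filename="input.txt"):
    """Return (children, original_attrs) for fixed-size chunking.

    children is a list of (chunk_text, attributes) pairs; original_attrs
    holds the offset lists that are written to the incoming FlowFile.
    """
    pieces = [text[i:i + max_len] for i in range(0, len(text), max_len)]
    fragment_id = str(uuid.uuid4())  # shared by all chunks of one input
    starts, ends, children, position = [], [], [], 0
    for index, piece in enumerate(pieces):
        starts.append(position)
        position += len(piece)
        ends.append(position)        # exclusive end offset (assumption)
        children.append((piece, {
            "fragment.identifier": fragment_id,
            "fragment.index": str(index),
            "fragment.count": str(len(pieces)),
            "segment.original.filename": filename,
        }))
    original_attrs = {
        "chunk.start.offsets": ",".join(map(str, starts)),
        "chunk.end.offsets": ",".join(map(str, ends)),
    }
    return children, original_attrs


children, original_attrs = chunk_with_offsets("abcdefghij", max_len=4)
print([chunk for chunk, _ in children])
print(original_attrs)
```

Because every child carries the same `fragment.identifier` plus its index and the total count, a downstream merge step has what it needs to put the chunks back in order.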
--- title: CompressContent 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/compresscontent.md section: Loading & Unloading Data --- # CompressContent 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Compresses or decompresses the contents of FlowFiles using a user-specified compression algorithm and updates the mime.type attribute as appropriate. A common idiom is to precede CompressContent with IdentifyMimeType and configure Mode='decompress' AND Compression Format='use mime.type attribute'. When used in this manner, the MIME type is automatically detected and the data is decompressed, if necessary. If decompression is unnecessary, the data is passed through to the 'success' relationship. This processor operates in a very memory efficient way so very large objects well beyond the heap size are generally fine to process. ## Tags brotli, bzip2, compress, content, decompress, deflate, gzip, lz4-framed, lzma, snappy, snappy framed, snappy-hadoop, xz-lzma2, zstd ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Compression Format | The compression format to use. Valid values are: GZIP, Deflate, ZSTD, BZIP2, XZ-LZMA2, LZMA, Brotli, Snappy, Snappy Hadoop, Snappy Framed, and LZ4-Framed. |
| Compression Level | The compression level to use; this is valid only when using gzip, deflate or xz-lzma2 compression. A lower value results in faster processing but less compression; a value of 0 indicates no compression (that is, simple archiving) for gzip or minimal compression for xz-lzma2. Higher levels can use much more memory, as is the case with levels 7-9 for xz-lzma2, so choose the level with the heap size in mind. |
| Mode | Indicates whether the processor should compress content or decompress content. Must be either 'compress' or 'decompress'. |
| Update Filename | If true, removes the filename extension when decompressing data (only if the extension indicates the appropriate compression format) and adds the appropriate extension when compressing data. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles will be transferred to the failure relationship if they fail to compress/decompress. |
| success | FlowFiles will be transferred to the success relationship after successfully being compressed or decompressed. |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | If the Mode property is set to compress, the appropriate MIME Type is set. If the Mode property is set to decompress and the file is successfully decompressed, this attribute is removed, as the MIME Type is no longer known. |
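The "decompress only if needed" idiom from the Description, together with the mime.type and Update Filename behavior, can be sketched for the gzip case with Python's standard `gzip` module. This is an illustrative model, not the processor's implementation: the function names, the `application/gzip` MIME type string, and the `.gz` extension handling are assumptions.

```python
import gzip

GZIP_MIME = "application/gzip"  # assumed MIME type string for gzip content


def compress(content, attrs, level=1, update_filename=False):
    """Compress bytes and set mime.type, as in Mode='compress'."""
    new_attrs = dict(attrs)
    new_attrs["mime.type"] = GZIP_MIME
    if update_filename and "filename" in new_attrs:
        new_attrs["filename"] += ".gz"
    return gzip.compress(content, compresslevel=level), new_attrs


def decompress_if_needed(content, attrs, update_filename=False):
    """Decompress only when mime.type says gzip; otherwise pass through."""
    if attrs.get("mime.type") != GZIP_MIME:
        return content, dict(attrs)   # not compressed: route unchanged
    new_attrs = {k: v for k, v in attrs.items() if k != "mime.type"}
    if update_filename and new_attrs.get("filename", "").endswith(".gz"):
        new_attrs["filename"] = new_attrs["filename"][:-3]
    return gzip.decompress(content), new_attrs


data, attrs = compress(b"hello " * 100, {"filename": "greeting.txt"},
                       update_filename=True)
print(len(data), attrs)
print(decompress_if_needed(data, attrs, update_filename=True)[1])
```

Removing `mime.type` after decompression mirrors the table above: once the wrapper is gone, the type of the inner content is no longer known.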
## Use cases | Compress the contents of a FlowFile | | ------------------------------------- | | Decompress the contents of a FlowFile | ## Use Cases Involving Other Components | Check whether or not a FlowFile is compressed and if so, decompress it. | | ----------------------------------------------------------------------- | --- title: Configure other authentication methods for Openflow Connector for Kafka source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kafka/authentication.md section: Loading & Unloading Data --- # Configure other authentication methods for Openflow Connector for Kafka This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [About Openflow](/user-guide/data-integration/openflow/about) - [Manage Openflow](/user-guide/data-integration/openflow/manage) - [Openflow connectors](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [About Openflow Connector for Kafka](about) - [Set up the Openflow Connector for Kafka](setup) This topic describes how to configure other authentication methods for the Openflow Connector for Kafka. The connector supports multiple authentication mechanisms beyond basic SASL authentication. Basic SASL authentication is configured through parameter contexts as described in [Set up the Openflow Connector for Kafka](setup). 
This page covers other authentication methods that require additional service configuration.

## Supported Authentication Methods

The Openflow Connector for Kafka supports the following authentication mechanisms:

- SASL with the following SASL mechanisms (configured via parameter contexts):
  - PLAIN
  - SCRAM-SHA-256
  - SCRAM-SHA-512
- SASL with AWS MSK IAM (extra configuration required via services)
- mTLS (extra configuration required via services)

## Configuring mTLS Authentication

mTLS (mutual Transport Layer Security) authentication requires both the client and server to present certificates for mutual authentication.

### Prerequisites

Before configuring mTLS authentication, ensure you have:

1. Generated and configured the necessary certificates for both the connector and the Kafka broker.
2. Created a keystore containing the connector's private key and certificate.
3. (Optional) Created a truststore containing the Kafka broker certificate or a certificate in the certification chain. This step is only required if the broker certificate is not signed by a trusted Certificate Authority (CA).

The supported keystore and truststore formats are PKCS12, JKS, and BCFKS.

### Step 1: Configure SSL Context Service

1. From the NiFi canvas, access the **Controller Services** configuration:
   - Double-click the connector's processing group.
   - Right-click on the canvas and select **Controller Services**.
2. Add a new **StandardSSLContextService**:
   - Click the **+** to add a new controller service.
   - Select **StandardSSLContextService** from the list.
   - Click **Add**.
3. Configure the SSL Context Service properties:
| Property | Value |
| --- | --- |
| Keystore Filename | Full path to your keystore file (e.g., `/path/to/client-keystore.p12`), or Asset reference |
| Keystore Password | Password for the keystore |
| Keystore Type | Keystore format (`PKCS12`, `JKS`, or `BCFKS`) |
| Key Password | Password for the private key (if the key is encrypted) |
| Truststore Filename | Full path to your truststore file (e.g., `/path/to/client-truststore.p12`), or Asset reference |
| Truststore Password | Password for the truststore |
| Truststore Type | Truststore format (`PKCS12`, `JKS`, or `BCFKS`) |
4. Enable the SSL Context Service:
   - Click **Enable** for the service.
   - Confirm that the service status shows as **Enabled**.

### Step 2: Configure Kafka3Connection Service

1. In the same **Controller Services** tab, locate the **Kafka3Connection** service.
2. Configure the following properties:
| Property | Value |
| --- | --- |
| Security Protocol | `SSL` |
| SSL Context Service | Select the SSL Context Service you created in Step 1 |
3. Keep all other [Kafka3Connection service](/user-guide/data-integration/openflow/controllers/kafka3connectionservice) settings unchanged.
4. Verify the Kafka3Connection service:
   - Click **Verify** for the service.
   - Confirm that the service status shows as **Verified**.

## Configuring AWS MSK IAM Authentication

AWS MSK IAM authentication allows you to use AWS Identity and Access Management (IAM) to authenticate to Amazon Managed Streaming for Apache Kafka (MSK).

### Prerequisites

1. Your Kafka cluster must be Amazon MSK with IAM authentication enabled.
2. You need to provide IAM credentials in Openflow with BYOC (bring your own cloud) configurations, deployed in your cloud.
3. The IAM role or user must have the necessary MSK permissions.

### Step 1: Create AmazonMSKConnectionService

1. From the NiFi canvas, access the **Controller Services** configuration:
   - Double-click the connector's processing group.
   - Right-click on the canvas and select **Controller Services**.
2. Add a new [AmazonMSKConnectionService](/user-guide/data-integration/openflow/controllers/amazonmskconnectionservice):
   - Click **+** to add a new controller service.
   - Select **AmazonMSKConnectionService** from the list.
   - Click **Add**.
3. Configure the AmazonMSKConnectionService properties:
| Property | Value |
| --- | --- |
| SASL Mechanism | `AWS_MSK_IAM` |
| Security Protocol | `#{Kafka Security Protocol}` |
| Bootstrap Servers | `#{Kafka Bootstrap Servers}` |
4. Verify the AmazonMSKConnectionService:
   - Click **Verify** for the service.
   - Confirm that the service status shows as **Verified**.

### Step 2: Configure ConsumeKafka Processor

1. In your Kafka connector flow, locate the **ConsumeKafka** processor.
2. Configure the processor to use the new connection service:
   - Set the **Kafka Connection Service** property to the AmazonMSKConnectionService you created in [Step 1](#label-kafka-configure-aws-msk-iam-authentication).

### Step 3: Remove Old Kafka Connection Service

1. In the **Controller Services** tab, locate the old **Kafka3Connection** service.
2. Disable and remove the old service:
   - Click **Disable** for the old service.
   - Once disabled, click **Delete** to remove the old service.

--- title: ConfluentEncodedSchemaReferenceReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/confluentencodedschemareferencereader.md section: Loading & Unloading Data --- # ConfluentEncodedSchemaReferenceReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Reads Schema Identifier according to Confluent encoding as a header consisting of a byte marker and an integer represented as four bytes ## Tags avro, confluent, kafka, registry, schema ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ConfluentEncodedSchemaReferenceWriter source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/confluentencodedschemareferencewriter.md section: Loading & Unloading Data --- # ConfluentEncodedSchemaReferenceWriter This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
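The Confluent encoding shared by the schema reference reader and writer services is a five-byte header: one marker byte (0) followed by the schema ID as a four-byte big-endian integer. The Python sketch below shows that layout with the standard `struct` module. The function names are hypothetical and only illustrate the byte layout; they are not part of either controller service.

```python
import struct

MAGIC_BYTE = 0           # Confluent wire-format marker byte
HEADER = struct.Struct(">bI")  # 1 marker byte + 4-byte big-endian schema ID


def write_schema_reference(schema_id, payload):
    """Prefix a serialized payload with the Confluent header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def read_schema_reference(message):
    """Return (schema_id, payload) from a Confluent-encoded message."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected marker byte: {magic}")
    return schema_id, message[HEADER.size:]


encoded = write_schema_reference(42, b"avro-bytes")
print(encoded[:5].hex())
print(read_schema_reference(encoded))
```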
## Description Writes Schema Identifier according to Confluent encoding as a header consisting of a byte marker and an integer represented as four bytes ## Tags avro, confluent, kafka, registry, schema ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ConfluentProtobufMessageNameResolver source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/confluentprotobufmessagenameresolver.md section: Loading & Unloading Data --- # ConfluentProtobufMessageNameResolver This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Resolves Protobuf message names from Confluent Schema Registry wire format by decoding message indexes and looking up the fully qualified name in the schema definition. For the Confluent wire format reference, see: [https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format](https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format) ## Tags confluent, message, name, protobuf, registry, resolver, schema ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ConfluentSchemaRegistry source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/confluentschemaregistry.md section: Loading & Unloading Data --- # ConfluentSchemaRegistry This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a Schema Registry that interacts with the Confluent Schema Registry so that those Schemas that are stored in the Confluent Schema Registry can be used in NiFi. The Confluent Schema Registry has a notion of a "subject" for schemas, which is their terminology for a schema name. When a Schema is looked up by name by this registry, it will find a Schema in the Confluent Schema Registry with that subject. ## Tags avro, confluent, kafka, registry, schema ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Authentication Type | Authentication Type | NONE | BASIC; NONE | HTTP Client Authentication Type for Confluent Schema Registry |
| Cache Expiration * | Cache Expiration | 1 hour | | Specifies how long a Schema that is cached should remain in the cache. Once this time period elapses, a cached version of a schema will no longer be used, and the service will have to communicate with the Schema Registry again in order to obtain the schema. |
| Cache Size * | Cache Size | 1000 | | Specifies how many Schemas should be cached from the Schema Registry |
| Communications Timeout * | Communications Timeout | 30 secs | | Specifies how long to wait to receive data from the Schema Registry before considering the communications a failure |
| Password | Password | | | Password for authentication to Confluent Schema Registry |
| SSL Context Service | SSL Context Service | | | Specifies the SSL Context Service to use for interacting with the Confluent Schema Registry |
| Schema Registry URLs * | Schema Registry URLs | [http://localhost:8081](http://localhost:8081) | | A comma-separated list of URLs of the Schema Registry to interact with |
| Username | Username | | | Username for authentication to Confluent Schema Registry |
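The interaction between Cache Size and Cache Expiration can be pictured with a small Python sketch of a size-bounded cache with a time-to-live. The class name `SchemaCache`, the `fetch` callback, and the oldest-entry-first eviction policy are assumptions for illustration; the service's actual caching strategy is not documented here.

```python
import time
from collections import OrderedDict


class SchemaCache:
    """Caches schemas by subject; entries expire after ttl_seconds."""

    def __init__(self, fetch, max_size=1000, ttl_seconds=3600.0,
                 clock=time.monotonic):
        self._fetch = fetch            # hypothetical registry lookup callback
        self._max_size = max_size      # models the Cache Size property
        self._ttl = ttl_seconds        # models the Cache Expiration property
        self._clock = clock
        self._entries = OrderedDict()  # subject -> (schema, stored_at)

    def get(self, subject):
        now = self._clock()
        entry = self._entries.get(subject)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]            # fresh hit: no registry round trip
        schema = self._fetch(subject)  # miss or expired: ask the registry
        self._entries[subject] = (schema, now)
        self._entries.move_to_end(subject)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the oldest entry
        return schema


lookups = []
cache = SchemaCache(lambda s: lookups.append(s) or f"schema-for-{s}",
                    max_size=2, ttl_seconds=3600)
cache.get("orders")
cache.get("orders")
print(lookups)  # the second call is served from the cache
```

A longer expiration reduces traffic to the registry but delays the point at which a newly registered schema version is noticed.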
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ConnectWebSocket 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/connectwebsocket.md section: Loading & Unloading Data --- # ConnectWebSocket 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-websocket-processors-nar ## Description Acts as a WebSocket client endpoint to interact with a remote WebSocket server. FlowFiles are transferred to downstream relationships according to received message types as WebSocket client configured with this processor receives messages from remote WebSocket server. If a new flowfile is passed to the processor, the previous sessions will be closed and any data being sent will be aborted. ## Tags WebSocket, consume, listen, subscribe ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| websocket-client-controller-service | A WebSocket CLIENT Controller Service which can connect to a WebSocket server. |
| websocket-client-id | The client ID to identify WebSocket session. It should be unique within the WebSocket Client Controller Service. Otherwise, it throws WebSocketConfigurationException when it gets started. |
## Relationships
| Name | Description |
| --- | --- |
| binary message | The WebSocket binary message output |
| connected | The WebSocket session is established |
| disconnected | The WebSocket session is disconnected |
| failure | FlowFile holding connection configuration attributes (like URL or HTTP headers) in case of connection failure |
| success | FlowFile holding connection configuration attributes (like URL or HTTP headers) in case of successful connection |
| text message | The WebSocket text message output |
## Writes attributes
| Name | Description |
| --- | --- |
| websocket.controller.service.id | WebSocket Controller Service id. |
| websocket.session.id | Established WebSocket session id. |
| websocket.endpoint.id | WebSocket endpoint id. |
| websocket.local.address | WebSocket client address. |
| websocket.remote.address | WebSocket server address. |
| websocket.message.type | TEXT or BINARY. |
--- title: ConsumeAMQP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeamqp.md section: Loading & Unloading Data --- # ConsumeAMQP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-amqp-nar ## Description Consumes AMQP Messages from an AMQP Broker using the AMQP 0.9.1 protocol. Each message that is received from the AMQP Broker will be emitted as its own FlowFile to the 'success' relationship. ## Tags amqp, consume, get, message, rabbit, receive ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| AMQP Version | AMQP Version. Currently only supports AMQP v0.9.1. |
| Auto-Acknowledge Messages | If false (Non-Auto-Acknowledge), the messages will be acknowledged by the processor after transferring the FlowFiles to success and committing the NiFi session. Non-Auto-Acknowledge mode provides 'at-least-once' delivery semantics. If true (Auto-Acknowledge), messages that are delivered to the AMQP Client will be auto-acknowledged by the AMQP Broker just after sending them out. This generally will provide better throughput but will also result in messages being lost upon restart/crash of the AMQP Broker, NiFi or the processor. Auto-Acknowledge mode provides 'at-most-once' delivery semantics and is recommended only if losing messages is acceptable. |
| Batch Size | The maximum number of messages that should be processed in a single session. Once this many messages have been received (or once no more messages are readily available), the messages received will be transferred to the 'success' relationship and the messages will be acknowledged to the AMQP Broker. Setting this value to a larger number could result in better performance, particularly for very small messages, but can also result in more messages being duplicated upon sudden restart of NiFi. |
| Brokers | A comma-separated list of known AMQP Brokers in the format `<host>:<port>` (e.g., localhost:5672). If this is set, Host Name and Port are ignored. Only include hosts from the same AMQP cluster. |
| Client Certificate Authentication Enabled | Authenticate using the SSL certificate rather than user name/password. |
| Header Key Prefix | Text to be prefixed to header keys as they are added to the FlowFile attributes. The processor appends '.' to the value of this property. |
| Header Output Format | Defines how to output headers from the received message. |
| Header Separator | The character used to separate key-value pairs when headers are output as a String. The value must be exactly one character. |
| Host Name | Network address of AMQP broker (e.g., localhost). If Brokers is set, then this property is ignored. |
| Max Inbound Message Body Size | Maximum body size of inbound (received) messages. |
| Password | Password used for authentication and authorization. |
| Port | Numeric value identifying Port of AMQP broker (e.g., 5671). If Brokers is set, then this property is ignored. |
| Prefetch Count | The maximum number of unacknowledged messages for the consumer. If the consumer has this number of unacknowledged messages, the AMQP broker will no longer send new messages until the consumer acknowledges some of the messages already delivered to it. Allowed values: 0 to 65535. 0 means no limit. |
| Queue | The name of the existing AMQP Queue from which messages will be consumed. Usually pre-defined by AMQP administrator. |
| Remove Curly Braces | If true, curly braces in the header are automatically removed. |
| SSL Context Service | The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| Username | Username used for authentication and authorization. |
| Virtual Host | Virtual Host name which segregates AMQP system for enhanced security. |
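The precedence between Brokers and Host Name/Port described above can be illustrated with a small sketch. This is not the processor's code; the function names `parse_brokers` and `effective_endpoints` are hypothetical and only model the documented `<host>:<port>` format and the "Brokers wins" rule.

```python
from typing import List, Tuple


def parse_brokers(brokers: str) -> List[Tuple[str, int]]:
    """Parse a comma-separated '<host>:<port>' list into (host, port) pairs."""
    result = []
    for entry in brokers.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and stray whitespace
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <host>:<port>, got {entry!r}")
        result.append((host, int(port)))
    return result


def effective_endpoints(brokers: str, host_name: str, port: int) -> List[Tuple[str, int]]:
    """Hypothetical helper: Brokers takes precedence over Host Name and Port."""
    parsed = parse_brokers(brokers) if brokers and brokers.strip() else []
    return parsed or [(host_name, port)]
```

For example, `effective_endpoints("", "localhost", 5671)` falls back to the single Host Name/Port pair, while any non-empty Brokers value causes Host Name and Port to be ignored.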
## Relationships

| Name | Description |
| --- | --- |
| success | All FlowFiles that are received from the AMQP queue are routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| `amqp$appId` | The App ID field from the AMQP Message |
| `amqp$contentEncoding` | The Content Encoding reported by the AMQP Message |
| `amqp$contentType` | The Content Type reported by the AMQP Message |
| `amqp$headers` | The headers present on the AMQP Message. Added only if processor is configured to output this attribute. |
| `<Header Key Prefix>.<attribute>` | Each message header will be inserted with this attribute name, if processor is configured to output headers as attribute |
| `amqp$deliveryMode` | The numeric indicator for the Message's Delivery Mode |
| `amqp$priority` | The Message priority |
| `amqp$correlationId` | The Message's Correlation ID |
| `amqp$replyTo` | The value of the Message's Reply-To field |
| `amqp$expiration` | The Message Expiration |
| `amqp$messageId` | The unique ID of the Message |
| `amqp$timestamp` | The timestamp of the Message, as the number of milliseconds since epoch |
| `amqp$type` | The type of message |
| `amqp$userId` | The ID of the user |
| `amqp$clusterId` | The ID of the AMQP Cluster |
| `amqp$routingKey` | The routingKey of the AMQP Message |
| `amqp$exchange` | The exchange from which AMQP Message was received |
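As a rough illustration of the `<Header Key Prefix>.<attribute>` naming, the sketch below maps message headers to attribute names. The function `headers_to_attributes` is hypothetical, and converting values with `str()` is an assumption made for the example rather than documented behavior.

```python
from typing import Dict, Mapping


def headers_to_attributes(headers: Mapping[str, object], key_prefix: str) -> Dict[str, str]:
    """Name each header attribute '<Header Key Prefix>.<header key>'."""
    # The '.' separator is appended to the configured prefix, as the
    # Header Key Prefix property describes; str() conversion is assumed.
    return {f"{key_prefix}.{key}": str(value) for key, value in headers.items()}
```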
--- title: ConsumeAzureEventHub 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeazureeventhub.md section: Loading & Unloading Data ---

# ConsumeAzureEventHub 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-azure-nar

## Description

Receives messages from Microsoft Azure Event Hubs with checkpointing to ensure consistent event processing. Checkpoint tracking avoids consuming a message multiple times and enables reliable resumption of processing in the event of intermittent network failures. Checkpoint tracking requires external storage and provides the preferred approach to consuming messages from Azure Event Hubs. In a clustered environment, ConsumeAzureEventHub processor instances form a consumer group and the messages are distributed among the cluster nodes (each message is processed on one cluster node only).

## Tags

azure, cloud, eventhub, events, microsoft, streaming, streams

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Batch Size | The number of messages to process within a NiFi session. This parameter affects throughput and consistency. NiFi commits its session and Event Hubs checkpoints after processing this number of messages. If the NiFi session is committed but the Event Hubs checkpoint fails to be created, the same messages may be received again. A higher number gives higher throughput, but possibly less consistency. |
| Checkpoint Strategy | Specifies which strategy to use for storing and retrieving partition ownership and checkpoint information for each partition. |
| Consumer Group | The name of the consumer group to use. |
| Event Hub Name | The name of the event hub to pull messages from. |
| Event Hub Namespace | The namespace that the Azure Event Hubs is assigned to. This is generally equal to `<Event Hub Names>-ns`. |
| Initial Offset | Specify where to start receiving messages if offset is not yet stored in the checkpoint store. |
| Message Receive Timeout | The amount of time this consumer should wait to receive the Batch Size before returning. |
| Prefetch Count | |
| Record Reader | The Record Reader to use for reading received messages. The event hub name can be referenced with the Expression Language `${eventhub.name}` to access a schema. |
| Record Writer | The Record Writer to use for serializing Records to an output FlowFile. The event hub name can be referenced with the Expression Language `${eventhub.name}` to access a schema. If not specified, each message will create a FlowFile. |
| Service Bus Endpoint | To support namespaces not in the default windows.net domain. |
| Shared Access Policy Key | The key of the shared access policy. Either the primary or the secondary key can be used. |
| Shared Access Policy Name | The name of the shared access policy. This policy must have Listen claims. |
| Storage Account Key | The Azure Storage account key to store event hub consumer group state. |
| Storage Account Name | Name of the Azure Storage account to store event hub consumer group state. |
| Storage Container Name | Name of the Azure Storage container to store the event hub consumer group state. If not specified, event hub name is used. |
| Storage SAS Token | The Azure Storage SAS token to store Event Hub consumer group state. Always starts with a ? character. |
| Transport Type | Advanced Message Queuing Protocol Transport Type for communication with Azure Event Hubs |
| Use Azure Managed Identity | Choose whether or not to use the managed identity of Azure VM/VMSS |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
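The Batch Size trade-off (throughput versus possible re-delivery when a checkpoint is not written) can be modeled with a toy simulation. This is only a sketch under simplified assumptions, not how the processor is implemented; `consume` and its `fail_checkpoint_on_batch` parameter are hypothetical names.

```python
from typing import List


def consume(events: List[str], batch_size: int, fail_checkpoint_on_batch: int) -> List[str]:
    """Simulate batch processing where one checkpoint write fails.

    Returns every event delivered downstream, including duplicates.
    """
    delivered: List[str] = []
    checkpoint = 0        # index of the next event to read
    batch_number = 0
    failed_once = False
    while checkpoint < len(events):
        batch = events[checkpoint:checkpoint + batch_size]
        delivered.extend(batch)            # session commit: data is downstream
        batch_number += 1
        if batch_number == fail_checkpoint_on_batch and not failed_once:
            failed_once = True             # checkpoint lost; offset not advanced
            continue                       # the same batch is read again
        checkpoint += len(batch)           # checkpoint stored successfully
    return delivered
```

In this model, a failed checkpoint with a batch size of 2 re-delivers two events, while the same failure with a batch size of 4 re-delivers four, which is the sense in which larger batches are "less consistent".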
## State management

| Scopes | Description |
| --- | --- |
| LOCAL | Local state is used to store the client id. Cluster state is used to store partition ownership and checkpoint information when component state is configured as the checkpointing strategy. |
| CLUSTER | Local state is used to store the client id. Cluster state is used to store partition ownership and checkpoint information when component state is configured as the checkpointing strategy. |

## Relationships

| Name | Description |
| --- | --- |
| success | FlowFiles received from Event Hub. |

## Writes attributes

| Name | Description |
| --- | --- |
| eventhub.enqueued.timestamp | The time (in milliseconds since epoch, UTC) at which the message was enqueued in the event hub |
| eventhub.offset | The offset into the partition at which the message was stored |
| eventhub.sequence | The sequence number associated with the message |
| eventhub.name | The name of the event hub from which the message was pulled |
| eventhub.partition | The name of the partition from which the message was pulled |
| eventhub.property.* | The application properties of this message. For example, 'application' would be 'eventhub.property.application'. |
--- title: ConsumeBoxEnterpriseEvents 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeboxenterpriseevents.md section: Loading & Unloading Data ---

# ConsumeBoxEnterpriseEvents 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Consumes Enterprise Events from Box admin_logs_streaming Stream Type. The content of the events is sent to the 'success' relationship as a JSON array. The last known position of the Box stream is stored in the processor state and is used to resume the stream from the last known position when the processor is restarted.

## Tags

box, storage

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| Event Types | A comma separated list of Enterprise Events to consume. If not set, all Events are consumed. See Additional Details for more information. |
| Start Event Position | What position to consume the Events from. |
| Start Offset | The offset to start consuming the Events from. |
## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | The last known position of the Box Event stream is stored in the processor state and is used to resume the stream from the last known position when the processor is restarted. |

## Relationships

| Name | Description |
| --- | --- |
| success | Events received successfully are sent to this relationship. |
## See also

- [org.apache.nifi.processors.box.ConsumeBoxEvents](/user-guide/data-integration/openflow/processors/consumeboxevents)
- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)

--- title: ConsumeBoxEvents 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeboxevents.md section: Loading & Unloading Data ---

# ConsumeBoxEvents 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Consumes all events from Box. This processor can be used to capture events such as uploads, modifications, deletions, etc. The content of the events is sent to the 'success' relationship as a JSON array. Events can be dropped in case of NiFi restart or if the queue capacity is exceeded. The last known position of the Box stream is stored in the processor state and is used to resume the stream from the last known position when the processor is restarted.

## Tags

box, storage

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| Queue Capacity | The maximum size of the internal queue used to buffer events being transferred from the underlying stream to the processor. Setting this value higher allows more messages to be buffered in memory during surges of incoming messages, but increases the total memory used by the processor during these surges. |
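The effect of Queue Capacity on dropped events can be pictured with a bounded in-memory queue. The sketch below is a generic model using Python's standard `queue` module, not the processor's actual buffering code; `buffer_events` is a hypothetical name, and `capacity` is assumed to be greater than zero (a `maxsize` of 0 would mean an unbounded queue in Python).

```python
import queue
from typing import Iterable, List, Tuple


def buffer_events(events: Iterable[str], capacity: int) -> Tuple[List[str], List[str]]:
    """Offer events to a bounded buffer without blocking; return (buffered, dropped)."""
    buf: "queue.Queue[str]" = queue.Queue(maxsize=capacity)
    dropped: List[str] = []
    for event in events:
        try:
            buf.put_nowait(event)
        except queue.Full:
            dropped.append(event)    # capacity exceeded: the event is lost
    buffered: List[str] = []
    while not buf.empty():
        buffered.append(buf.get_nowait())
    return buffered, dropped
```

A larger capacity keeps more of a surge in memory instead of dropping it, at the cost of the extra memory the description mentions.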
## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | The last known position of the Box stream is stored in the processor state and is used to resume the stream from the last known position when the processor is restarted. |

## Relationships

| Name | Description |
| --- | --- |
| success | Events received successfully are sent to this relationship. |
## See also

- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)
- [org.apache.nifi.processors.box.PutBoxFile](/user-guide/data-integration/openflow/processors/putboxfile)

--- title: ConsumeElasticsearch 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeelasticsearch.md section: Loading & Unloading Data ---

# ConsumeElasticsearch 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-elasticsearch-restapi-nar

## Description

A processor that repeatedly runs a paginated query against a field using a Range query to consume new Documents from an Elasticsearch index/query. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the Range query will automatically update the field constraint based on the last retrieved Document value.

## Tags

elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, json, page, query, scroll, search

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Additional Filters | One or more query filters in JSON syntax, not Lucene syntax. Ex: `[{"match":{"somefield":"somevalue"}}, {"match":{"anotherfield":"anothervalue"}}]`. These filters will be used as part of a Bool query's filter. |
| Aggregation Results Format | Format of Aggregation output. |
| Aggregation Results Split | Output a flowfile containing all aggregations or one flowfile for each individual aggregation. |
| Aggregations | One or more query aggregations (or "aggs"), in JSON syntax. Ex: `{"items": {"terms": {"field": "product", "size": 10}}}` |
| Client Service | An Elasticsearch client service to use for running queries. |
| Fields | Fields of indexed documents to be retrieved, in JSON syntax. Ex: `["user.id", "http.response.*", {"field": "@timestamp", "format": "epoch_millis"}]` |
| Index | The name of the index to use. |
| Initial Value | The initial value to use for the query if the processor has not run previously. If the processor has run previously and stored a value in its state, this property will be ignored. If no value is provided, and the processor has not previously run, no Range query bounds will be used, i.e. all documents will be retrieved in the specified "Sort Order". |
| Initial Value Date Format | If the "Range Query Field" is a Date field, convert the "Initial Value" to a date with this format. If not specified, Elasticsearch will use the date format provided by the "Range Query Field"'s mapping. For valid syntax, see [https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html) |
| Initial Value Date Time Zone | If the "Range Query Field" is a Date field, convert the "Initial Value" to UTC with this time zone. Valid values are ISO 8601 UTC offsets, such as "+01:00" or "-08:00", and IANA time zone IDs, such as "Europe/London". |
| Max JSON Field String Length | The maximum allowed length of a string value when parsing a JSON document or attribute. |
| Output No Hits | Output a "hits" flowfile even if no hits found for query. If true, an empty "hits" flowfile will be output even if "aggregations" are output. |
| Pagination Keep Alive | Pagination "keep_alive" period. Period Elasticsearch will keep the scroll/pit cursor alive in between requests (this is not the time expected for all pages to be returned, but the maximum allowed time for requests between page retrievals). |
| Pagination Type | Pagination method to use. Not all types are available for all Elasticsearch versions; check the Elasticsearch docs to confirm which are applicable and recommended for your service. |
| Query Attribute | If set, the executed query will be set on each result flowfile in the specified attribute. |
| Range Query Field | Field to be tracked as part of an Elasticsearch Range query using a "gt" bound match. This field must exist within the Elasticsearch document for it to be retrieved. |
| Script Fields | Fields to create using script evaluation at query runtime, in JSON syntax. Ex: `{"test1": {"script": {"lang": "painless", "source": "doc['price'].value * 2"}}, "test2": {"script": {"lang": "painless", "source": "doc['price'].value * params.factor", "params": {"factor": 2.0}}}}` |
| Search Results Format | Format of Hits output. |
| Search Results Split | Output a flowfile containing all hits or one flowfile for each individual hit or one flowfile containing all hits from all paged responses. |
| Size | The maximum number of documents to retrieve in the query. If the query is paginated, this "size" applies to each page of the query, not the "size" of the entire result set. |
| Sort | Sort results by one or more fields, in JSON syntax. Ex: `[{"price" : {"order" : "asc", "mode" : "avg"}}, {"post_date" : {"format": "strict_date_optional_time_nanos"}}]` |
| Sort Order | The order in which to sort the "Range Query Field". A "sort" clause for the "Range Query Field" field will be prepended to any provided "Sort" clauses. If a "sort" clause already exists for the "Range Query Field" field, it will not be updated. |
| Type | The type of this document (used by Elasticsearch for indexing and searching). |
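To make the interplay of Range Query Field, Additional Filters, Sort, and Sort Order more concrete, the sketch below assembles a query body in the shape those properties describe. It is an approximation written for illustration, not the processor's implementation; `build_query` and its parameters are hypothetical, and the exact JSON the processor sends may differ.

```python
from typing import Any, Dict, List, Optional


def build_query(range_field: str,
                last_value: Optional[Any],
                sort_order: str = "asc",
                additional_filters: Optional[List[Dict[str, Any]]] = None,
                sort: Optional[List[Dict[str, Any]]] = None,
                size: int = 100) -> Dict[str, Any]:
    """Assemble an Elasticsearch-style query body for one page of results."""
    filters = list(additional_filters or [])
    if last_value is not None:
        # Track the field with a "gt" bound, as Range Query Field describes.
        filters.append({"range": {range_field: {"gt": last_value}}})
    sort_clauses = list(sort or [])
    # Prepend a sort clause for the range field unless one already exists.
    if not any(range_field in clause for clause in sort_clauses):
        sort_clauses.insert(0, {range_field: {"order": sort_order}})
    query: Dict[str, Any] = {"size": size, "sort": sort_clauses}
    if filters:
        # Filters are combined in a Bool query's filter.
        query["query"] = {"bool": {"filter": filters}}
    return query
```

With no stored or initial value (`last_value=None`) and no filters, the body carries no range bound, which corresponds to retrieving all documents in the configured sort order.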
## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | The pagination state (scrollId, searchAfter, pitId, hitCount, pageCount, pageExpirationTimestamp, trackingRangeValue) is retained in between invocations of this processor until the Scroll/PiT has expired (when the current time is later than the last query execution plus the Pagination Keep Alive interval). |

## Relationships

| Name | Description |
| --- | --- |
| aggregations | Aggregations are routed to this relationship. |
| failure | All flowfiles that fail for reasons unrelated to server availability go to this relationship. |
| hits | Search hits are routed to this relationship. |
| retry | All flowfiles that fail due to server/cluster availability go to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| mime.type | application/json |
| page.number | The number of the page (request), starting from 1, in which the results were returned that are in the output flowfile |
| hit.count | The number of hits that are in the output flowfile |
| elasticsearch.query.error | The error message provided by Elasticsearch if there is an error querying the index. |
## See also

- [org.apache.nifi.processors.elasticsearch.PaginatedJsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/paginatedjsonqueryelasticsearch)
- [org.apache.nifi.processors.elasticsearch.SearchElasticsearch](/user-guide/data-integration/openflow/processors/searchelasticsearch)

--- title: ConsumeGCPubSub 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumegcpubsub.md section: Loading & Unloading Data ---

# ConsumeGCPubSub 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-gcp-nar

## Description

Consumes messages from the configured Google Cloud PubSub subscription. The 'Batch Size' property specifies the maximum number of messages that will be pulled from the subscription in a single request. The 'Processing Strategy' property specifies whether each message should be its own FlowFile or whether messages should be grouped into a single FlowFile. Using the Demarcator strategy will provide the best throughput when the format allows it. Using the Record strategy allows converting the data format as well as enforcing a schema. Using the FlowFile strategy will generate one FlowFile per message and will have the message's attributes as FlowFile attributes.

## Tags

consume, gcp, google, google-cloud, message, pubsub

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| GCP Credentials Provider Service | The Controller Service used to obtain Google Cloud Platform credentials. |
| Message Demarcator | Since the PubSub client receives messages in batches, this Processor has an option to output FlowFiles which contain all the messages in a single batch. This property allows you to provide a string (interpreted as UTF-8) to use for demarcating multiple messages. To enter a special character such as 'new line', use CTRL+Enter or Shift+Enter depending on the OS. |
| Output Strategy | The format used to output the Kafka Record into a FlowFile Record. |
| Processing Strategy | Strategy for processing PubSub Records and writing serialized output to FlowFiles |
| Record Reader | The Record Reader to use for incoming messages |
| Record Writer | The Record Writer to use in order to serialize the outgoing FlowFiles |
| api-endpoint | Override the gRPC endpoint in the form of [host:port] |
| gcp-project-id | Google Cloud Project ID |
| gcp-pubsub-publish-batch-size | Indicates the number of messages the cloud service should bundle together in a batch. If not set and left empty, only one message will be used in a batch |
| gcp-pubsub-subscription | Name of the Google Cloud Pub/Sub Subscription |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
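As an illustration of the Message Demarcator idea, the following sketch joins a batch of message payloads into one body using a UTF-8 demarcator string. It is a simplified model rather than the processor's code; `join_with_demarcator` is a hypothetical name.

```python
from typing import List


def join_with_demarcator(messages: List[bytes], demarcator: str) -> bytes:
    """Concatenate message payloads, separated by the UTF-8 encoded demarcator."""
    # The demarcator is interpreted as UTF-8, as the property describes.
    return demarcator.encode("utf-8").join(messages)
```

A newline demarcator, for instance, turns a batch of single-line JSON payloads into newline-delimited content in one FlowFile, which is why this strategy only works when the message format allows it.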
## Relationships

| Name | Description |
| --- | --- |
| success | FlowFiles are routed to this relationship after a successful Google Cloud Pub/Sub operation. |

## Writes attributes

| Name | Description |
| --- | --- |
| gcp.pubsub.ackId | Acknowledgement Id of the consumed Google Cloud PubSub message |
| gcp.pubsub.messageSize | Serialized size of the consumed Google Cloud PubSub message |
| gcp.pubsub.attributesCount | Number of attributes the consumed PubSub message has, if any |
| gcp.pubsub.publishTime | Timestamp value when the message was published |
| gcp.pubsub.subscription | Name of the PubSub subscription |
| Dynamic Attributes | Other than the listed attributes, this processor may write zero or more attributes, if the original Google Cloud Publisher client added any attributes to the message while sending |
## See also

- [org.apache.nifi.processors.gcp.pubsub.PublishGCPubSub](/user-guide/data-integration/openflow/processors/publishgcpubsub)

--- title: ConsumeIMAP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeimap.md section: Loading & Unloading Data ---

# ConsumeIMAP 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-email-nar

## Description

Consumes messages from Email Server using IMAP protocol. The raw bytes of each received email message are written as the contents of the FlowFile.

## Tags

Consume, Email, Get, Imap, Ingest, Ingress, Message

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Authorization Mode | How to authorize sending email on the user's behalf. |
| Connection Timeout | The amount of time to wait to connect to Email server |
| Delete Messages | Specify whether mail messages should be deleted after retrieval. |
| Fetch Size | Specify the maximum number of Messages to fetch per call to Email Server. |
| Folder | Email folder to retrieve messages from (e.g., INBOX) |
| Host Name | Network address of Email server (e.g., pop.gmail.com, imap.gmail.com) |
| Mark Messages as Read | Specify if messages should be marked as read after retrieval. |
| OAuth2 Access Token Provider | OAuth2 service that can provide access tokens. |
| Password | Password used for authentication and authorization with Email server. |
| Port | Numeric value identifying Port of Email server (e.g., 993) |
| Use SSL | Specifies if IMAP connection must be obtained via SSL encrypted connection (i.e., IMAPS) |
| User Name | User Name used for authentication and authorization with Email server. |
## Relationships

| Name | Description |
| --- | --- |
| success | All messages that are successfully received from Email server and converted to FlowFiles are routed to this relationship |
--- title: ConsumeJMS 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumejms.md section: Loading & Unloading Data ---

# ConsumeJMS 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-jms-processors-nar

## Description

Consumes a JMS Message of type BytesMessage, TextMessage, ObjectMessage, MapMessage or StreamMessage, transforming its content to a FlowFile and transitioning it to the 'success' relationship. JMS attributes such as headers and properties will be copied as FlowFile attributes. MapMessages will be transformed into JSON and then into byte arrays. The other types will have their raw contents transferred into the FlowFile as a byte array.

## Tags

consume, get, jms, message, receive

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Acknowledgement Mode | The JMS Acknowledgement Mode. Using Auto Acknowledge can cause messages to be lost on restart of NiFi but may provide better performance than Client Acknowledge. |
| Connection Client ID | The client id to be set on the connection, if set. For a durable non-shared consumer this is mandatory; for all others it is optional, and with shared consumers it is typically undesirable to set it. Please see the JMS spec for further details. |
| Connection Factory Service | The Controller Service that is used to obtain Connection Factory. Alternatively, the 'JNDI *' or the 'JMS *' properties can also be used to configure the Connection Factory. |
| Destination Name | The name of the JMS Destination. Usually provided by the administrator (e.g., 'topic://myTopic' or 'myTopic'). |
| Destination Type | The type of the JMS Destination. Could be one of 'QUEUE' or 'TOPIC'. Usually provided by the administrator. Defaults to 'QUEUE'. |
| Durable subscription | If the destination is a Topic and this is set, make the consumer durable. See [https://jakarta.ee/specifications/platform/9/apidocs/jakarta/jms/session#createDurableConsumer-jakarta.jms.Topic-java.lang.String-](https://jakarta.ee/specifications/platform/9/apidocs/jakarta/jms/session#createDurableConsumer-jakarta.jms.Topic-java.lang.String-) |
| Error Queue Name | The name of a JMS Queue where, if set, unprocessed messages will be routed. Usually provided by the administrator (e.g., 'queue://myErrorQueue' or 'myErrorQueue'). Only applicable if 'Destination Type' is set to 'QUEUE'. |
| Maximum Batch Size | The maximum number of messages to publish or consume in each invocation of the processor. |
| Message Selector | The JMS Message Selector to filter the messages that the processor will receive |
| Password | Password used for authentication and authorization. |
| SSL Context Service | The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| Shared subscription | If the destination is a Topic and this is set, make the consumer shared. See [https://jakarta.ee/specifications/platform/9/apidocs/jakarta/jms/session#createSharedConsumer-jakarta.jms.Topic-java.lang.String-](https://jakarta.ee/specifications/platform/9/apidocs/jakarta/jms/session#createSharedConsumer-jakarta.jms.Topic-java.lang.String-) |
| Subscription Name | The name of the subscription to use if destination is Topic and is shared or durable. |
| Timeout | How long to wait to consume a message from the remote broker before giving up. |
| User Name | User Name used for authentication and authorization. |
| broker | URI pointing to the network location of the JMS Message broker. Example for ActiveMQ: 'tcp://myhost:61616'. Examples for IBM MQ: 'myhost(1414)' and 'myhost01(1414),myhost02(1414)'. |
| cf | The fully qualified name of the JMS ConnectionFactory implementation class (e.g., org.apache.activemq.ActiveMQConnectionFactory). |
| cflib | Path to the directory with additional resources (e.g., JARs, configuration files) to be added to the classpath (defined as a comma separated list of values). Such resources typically represent target JMS client libraries for the ConnectionFactory implementation. |
| character-set | The name of the character set to use to construct or interpret TextMessages |
| connection.factory.name | The name of the JNDI Object to lookup for the Connection Factory. |
| java.naming.factory.initial | The fully qualified class name of the JNDI Initial Context Factory Class (java.naming.factory.initial). |
| java.naming.provider.url | The URL of the JNDI Provider to use as the value for java.naming.provider.url. See additional details documentation for allowed URL schemes. |
| java.naming.security.credentials | The Credentials to use when authenticating with JNDI (java.naming.security.credentials). |
| java.naming.security.principal | The Principal to use when authenticating with JNDI (java.naming.security.principal). |
| naming.factory.libraries | Specifies jar files and/or directories to add to the ClassPath in order to load the JNDI / JMS client libraries. This should be a comma-separated list of files, directories, and/or URLs. If a directory is given, any files in that directory will be included, but subdirectories will not be included (i.e., it is not recursive). |
| output-strategy | The format used to output the JMS message into a FlowFile record. |
| record-reader | The Record Reader to use for parsing received JMS Messages into Records. |
| record-writer | The Record Writer to use for serializing Records before writing them to a FlowFile. |
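The IBM MQ form of the broker value, such as 'myhost01(1414),myhost02(1414)', can be read as a comma-separated list of `host(port)` entries. The sketch below parses that shape for illustration only; `parse_ibm_mq_broker` is a hypothetical helper, and the pattern is inferred from the two examples in the table rather than from a formal grammar.

```python
import re
from typing import List, Tuple

# One 'host(port)' entry, e.g. 'myhost01(1414)'; inferred from the examples above.
_ENTRY = re.compile(r"^\s*([^\s(),]+)\((\d+)\)\s*$")


def parse_ibm_mq_broker(broker: str) -> List[Tuple[str, int]]:
    """Split an IBM MQ style broker value into (host, port) pairs."""
    endpoints = []
    for entry in broker.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"expected host(port), got {entry!r}")
        endpoints.append((match.group(1), int(match.group(2))))
    return endpoints
```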
## Restrictions

| Required Permission | Explanation |
| --- | --- |
| reference remote resources | Client Library Location can reference resources over HTTP |

## Relationships

| Name | Description |
| --- | --- |
| parse.failure | If a message cannot be parsed using the configured Record Reader, the contents of the message will be routed to this Relationship as its own individual FlowFile. |
| success | All FlowFiles that are received from the JMS Destination are routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| jms_deliveryMode | The JMSDeliveryMode from the message header. |
| jms_expiration | The JMSExpiration from the message header. |
| jms_priority | The JMSPriority from the message header. |
| jms_redelivered | The JMSRedelivered from the message header. |
| jms_timestamp | The JMSTimestamp from the message header. |
| jms_correlationId | The JMSCorrelationID from the message header. |
| jms_messageId | The JMSMessageID from the message header. |
| jms_type | The JMSType from the message header. |
| jms_replyTo | The JMSReplyTo from the message header. |
| jms_destination | The JMSDestination from the message header. |
| jms.messagetype | The JMS message type; can be TextMessage, BytesMessage, ObjectMessage, MapMessage or StreamMessage. |
| other attributes | Each message property is written to an attribute. |
## See also

- [org.apache.nifi.jms.processors.PublishJMS](/user-guide/data-integration/openflow/processors/publishjms)

--- title: ConsumeKafka 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumekafka.md section: Loading & Unloading Data ---

# ConsumeKafka 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-kafka-nar

## Description

Consumes messages from Apache Kafka using the Kafka Consumer API. The complementary NiFi processor for sending messages is PublishKafka. The Processor supports consumption of Kafka messages, optionally interpreted as NiFi records. Note that, at this time, in record mode the Processor assumes that all records retrieved from a given partition have the same schema. In this mode, if a Kafka message is pulled but cannot be parsed or written with the configured Record Reader or Record Writer, the contents of the message are written to a separate FlowFile, and that FlowFile is transferred to the 'parse.failure' relationship. Otherwise, each FlowFile is sent to the 'success' relationship and may contain many individual messages. A 'record.count' attribute is added to indicate how many messages are contained in the FlowFile. No two Kafka messages are placed into the same FlowFile if they have different schemas, or if they have different values for a message header that is included by the <Headers to Add as Attributes> property.

## Tags

avro, consume, csv, get, ingest, ingress, json, kafka, openflow, pubsub, record, topic

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Commit Offsets | Specifies whether this Processor should commit the offsets to Kafka after receiving messages. Typically, this value should be set to true so that messages that are received are not duplicated. However, in certain scenarios, you may want to avoid committing the offsets so that the data can be processed and later acknowledged by PublishKafka in order to provide Exactly Once semantics. |
| Content Field | Specifies under what field of the record the content will be added. If not set, the content will be at the root of the record. |
| Group ID | Kafka Consumer Group Identifier corresponding to the Kafka group.id property. |
| Header Encoding | Character encoding applied when reading Kafka Record Header values and writing FlowFile attributes. |
| Header Name Pattern | Regular Expression Pattern applied to Kafka Record Header Names for selecting Header Values to be written as FlowFile attributes. |
| Headers Field Parent | Specifies under what field of the record the headers field will be added. If not set, the headers field will be at the root of the record. |
| Kafka Connection Service | Provides connections to Kafka Broker for publishing Kafka Records. |
| Key Attribute Encoding | Encoding for the value of the configured FlowFile attribute containing the Kafka Record Key. |
| Key Field Parent | Specifies under what field of the record the key field will be added. If not set, the key field will be at the root of the record. |
| Key Format | Specifies how to represent the Kafka Record Key in the output FlowFile. |
| Key Record Reader | The Record Reader to use for parsing the Kafka Record Key into a Record. |
| Max Uncommitted Time | Specifies the maximum amount of time that the Processor can consume from Kafka before it must transfer FlowFiles on through the flow and commit the offsets to Kafka (if appropriate). A larger time period can result in longer latency. |
| Message Demarcator | Because KafkaConsumer receives messages in batches, this Processor can output FlowFiles that contain all Kafka messages in a single batch for a given topic and partition. This property lets you provide a string (interpreted as UTF-8) to use for demarcating multiple Kafka messages. This property is optional; if it is not provided, each Kafka message received results in its own FlowFile. To enter a special character such as 'new line', use CTRL+Enter or Shift+Enter, depending on the OS. |
| Metadata Field | Specifies under what field of the record the metadata will be added. If not set, the metadata will be at the root of the record. |
| Metadata Received Timestamp Field | If specified, a timestamp will be placed under the specified field in the metadata of the record in the output FlowFile. |
| Output Strategy | The format used to output the Kafka Record into a FlowFile Record. |
| Processing Strategy | Strategy for processing Kafka Records and writing serialized output to FlowFiles. |
| Record Reader | The Record Reader to use for incoming Kafka messages. |
| Record Writer | The Record Writer to use in order to serialize the outgoing FlowFiles. |
| Separate By Key | When this property is enabled, two messages will only be added to the same FlowFile if both of the Kafka Messages have identical keys. |
| Topic Format | Specifies whether the Topics provided are a comma-separated list of names or a single regular expression. |
| Topics | The name or pattern of the Kafka Topics from which the Processor consumes Kafka Records. More than one can be supplied if comma-separated. |
| auto.offset.reset | Automatic offset configuration applied when no previous consumer offset is found, corresponding to the Kafka auto.offset.reset property. |
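To make the interplay of Header Name Pattern and Header Encoding more concrete, here is a minimal Python sketch. It is an illustration, not the processor's implementation: the function name `headers_to_attributes` and the sample headers are hypothetical, and it assumes the pattern has to match the whole header name.

```python
import re


def headers_to_attributes(headers, name_pattern, encoding="utf-8"):
    """Select Kafka record headers by name and decode them into attributes.

    headers: list of (name, bytes_value) pairs; Kafka header values are bytes.
    Assumption (not confirmed by the docs): the pattern must match the
    entire header name, so re.fullmatch is used.
    """
    pattern = re.compile(name_pattern)
    attributes = {}
    for name, raw_value in headers:
        if pattern.fullmatch(name):
            # Header Encoding decides how the raw bytes become attribute text.
            attributes[name] = raw_value.decode(encoding)
    return attributes


# Hypothetical headers on a single Kafka record.
sample_headers = [
    ("trace-id", b"abc123"),
    ("tenant", b"acme"),
    ("internal-flag", b"1"),
]
selected = headers_to_attributes(sample_headers, r"trace-.*|tenant")
print(selected)
```

Headers whose names do not match the pattern are simply left out of the attribute map, which keeps unrelated metadata off the FlowFile.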
## Relationships
| Name | Description |
| --- | --- |
| success | FlowFiles containing one or more serialized Kafka Records |
## Writes attributes
| Name | Description |
| --- | --- |
| record.count | The number of records received. |
| mime.type | The MIME Type that is provided by the configured Record Writer. |
| kafka.count | The number of messages written, if more than one. |
| kafka.key | The key of the message, if present and if the FlowFile contains a single message. How the key is encoded depends on the value of the 'Key Attribute Encoding' property. |
| kafka.offset | The offset of the message in the partition of the topic. |
| kafka.timestamp | The timestamp of the message in the partition of the topic. |
| kafka.partition | The partition of the topic the message or message bundle is from. |
| kafka.topic | The topic the message or message bundle is from. |
| kafka.tombstone | Set to true if the consumed message is a tombstone message. |
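The effect of the Message Demarcator property on the output, and on the `kafka.count` attribute, can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `bundle_messages` and the FlowFile-like dictionaries are hypothetical, and real batching also depends on schemas, keys, and headers.

```python
def bundle_messages(messages, demarcator=None):
    """Turn (topic, partition, payload_bytes) tuples into FlowFile-like dicts.

    Without a demarcator every message becomes its own FlowFile. With one,
    messages from the same topic and partition are joined into one FlowFile.
    """
    if demarcator is None:
        return [
            {
                "content": payload,
                "attributes": {"kafka.topic": topic, "kafka.partition": str(partition)},
            }
            for topic, partition, payload in messages
        ]

    groups = {}  # insertion-ordered: (topic, partition) -> list of payloads
    for topic, partition, payload in messages:
        groups.setdefault((topic, partition), []).append(payload)

    separator = demarcator.encode("utf-8")  # the demarcator is interpreted as UTF-8
    flowfiles = []
    for (topic, partition), payloads in groups.items():
        attributes = {"kafka.topic": topic, "kafka.partition": str(partition)}
        if len(payloads) > 1:
            # kafka.count is only written when more than one message is bundled.
            attributes["kafka.count"] = str(len(payloads))
        flowfiles.append({"content": separator.join(payloads), "attributes": attributes})
    return flowfiles


batch = [("orders", 0, b"a"), ("orders", 0, b"b"), ("orders", 1, b"c")]
bundled = bundle_messages(batch, demarcator="\n")
single = bundle_messages(batch)
print(len(bundled), len(single))
```

With a newline demarcator the three sample messages collapse into two FlowFiles, one per topic and partition; without it, each message stays separate.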
## See also

- [com.snowflake.openflow.runtime.processors.kafka.PublishKafka](/user-guide/data-integration/openflow/processors/publishkafka)

---
title: ConsumeKinesisStream 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumekinesisstream.md
section: Loading & Unloading Data
---

# ConsumeKinesisStream 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Reads data from the specified AWS Kinesis stream and outputs a FlowFile for every processed Record (raw), or a FlowFile for a batch of processed records if a Record Reader and Record Writer are configured. The processor provides at-least-once delivery of all Kinesis Records within the Stream while it is running. The AWS Kinesis Client Library can take several seconds to initialise before starting to fetch data. The processor uses DynamoDB for checkpointing and, optionally, CloudWatch for metrics. Ensure that the credentials provided have access to DynamoDB and CloudWatch (optional) along with Kinesis.

## Tags

amazon, aws, consume, kinesis, stream

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain an AWS credentials provider. |
| Amazon Kinesis Stream Name | The name of the Kinesis Stream. |
| Application Name | The Kinesis stream reader application name. |
| Checkpoint Interval | Interval between Kinesis checkpoints. |
| Communications Timeout | |
| DynamoDB Override | DynamoDB override for use with non-AWS deployments. |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default, including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Failover Timeout | Kinesis Client Library failover timeout. |
| FlowFile Handling On Schema Difference | The strategy used when records in a Kinesis Stream change their schema in a single batch. |
| Graceful Shutdown Timeout | Kinesis Client Library graceful shutdown timeout. |
| Initial Stream Position | Initial position to read Kinesis streams. |
| Output Strategy | The format used to output the Kinesis Record into a FlowFile Record. |
| Record Reader | The Record Reader to use for reading received messages. The Kinesis Stream name can be referred to by Expression Language `${kinesis.name}` to access a schema. If Record Reader/Writer are not specified, each Kinesis Record will create a FlowFile. |
| Record Writer | The Record Writer to use for serializing Records to an output FlowFile. The Kinesis Stream name can be referred to by Expression Language `${kinesis.name}` to access a schema. If Record Reader/Writer are not specified, each Kinesis Record will create a FlowFile. |
| Region | |
| Report Metrics to CloudWatch | Whether to report Kinesis usage metrics to CloudWatch. |
| Retry Count | Number of times to retry a Kinesis operation (process record, checkpoint, shutdown). |
| Retry Wait | Interval between Kinesis operation retries (process record, checkpoint, shutdown). |
| Stream Position Timestamp | Timestamp position in the stream from which to start reading Kinesis Records. Required if Initial Stream Position is AT_TIMESTAMP. Uses the Timestamp Format to parse the value into a Date. |
| Timestamp Format | Format to use for parsing the Stream Position Timestamp into a Date and converting the Kinesis Record's Approximate Arrival Timestamp into a FlowFile attribute. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
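The at-least-once guarantee follows from checkpointing: anything processed after the last checkpoint is read again after a restart. The Python simulation below is only a sketch of that idea. It assumes count-based checkpoints for determinism (the real Checkpoint Interval is time-based), and `run_consumer` is a hypothetical helper, not part of the Kinesis Client Library.

```python
def run_consumer(records, start_index, checkpoint_every, crash_after=None):
    """Deliver records from start_index; return (delivered, checkpoint).

    Assumption: a checkpoint is written after every `checkpoint_every`
    records. `crash_after` simulates the consumer dying after that many
    deliveries, losing any progress made since the last checkpoint.
    """
    delivered = []
    checkpoint = start_index
    for offset in range(start_index, len(records)):
        if crash_after is not None and len(delivered) == crash_after:
            return delivered, checkpoint  # simulated crash
        delivered.append(records[offset])
        if (offset + 1 - start_index) % checkpoint_every == 0:
            checkpoint = offset + 1
    # Clean finish: a final checkpoint covers everything that was read.
    return delivered, len(records)


stream = [f"r{i}" for i in range(7)]
first_run, saved = run_consumer(stream, 0, checkpoint_every=3, crash_after=5)
second_run, _ = run_consumer(stream, saved, checkpoint_every=3)
all_delivered = first_run + second_run
print(saved, all_delivered)
```

In this run the consumer restarts from the checkpoint at offset 3, so `r3` and `r4` are delivered twice while no record is lost. Downstream consumers of the 'success' relationship should therefore tolerate duplicates.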
## Relationships
| Name | Description |
| --- | --- |
| success | FlowFiles are routed to success relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| aws.kinesis.partition.key | Partition key of the (last) Kinesis Record read from the Shard. |
| aws.kinesis.shard.id | Shard ID from which the Kinesis Record was read. |
| aws.kinesis.sequence.number | The unique identifier of the (last) Kinesis Record within its Shard. |
| aws.kinesis.approximate.arrival.timestamp | Approximate arrival timestamp of the (last) Kinesis Record read from the stream. |
| mime.type | Sets the mime.type attribute to the MIME Type specified by the Record Writer (if configured). |
| record.count | Number of records written to the FlowFiles by the Record Writer (if configured). |
| record.error.message | On failure, the error message encountered by the Record Reader or Record Writer (if configured). |
## See also

- [org.apache.nifi.processors.aws.kinesis.stream.PutKinesisStream](/user-guide/data-integration/openflow/processors/putkinesisstream)

---
title: ConsumeMQTT 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumemqtt.md
section: Loading & Unloading Data
---

# ConsumeMQTT 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-mqtt-nar

## Description

Subscribes to a topic and receives messages from an MQTT broker.

## Tags

IOT, MQTT, consume, listen, subscribe

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Broker URI | The URI(s) to use to connect to the MQTT broker (e.g., `tcp://localhost:1883`). The 'tcp', 'ssl', 'ws' and 'wss' schemes are supported. In order to use 'ssl', the SSL Context Service property must be set. When a comma-separated URI list is set (e.g., `tcp://localhost:1883,tcp://localhost:1884`), the processor will use a round-robin algorithm to connect to the brokers on connection failure. |
| Client ID | MQTT client ID to use. If not set, a UUID will be generated. |
| Connection Timeout (seconds) | Maximum time interval the client will wait for the network connection to the MQTT server to be established. The default timeout is 30 seconds. A value of 0 disables timeout processing, meaning the client will wait until the network connection is made successfully or fails. |
| Group ID | MQTT consumer group ID to use. If the group ID is not set, the client will connect as an individual consumer. |
| Keep Alive Interval (seconds) | Defines the maximum time interval between messages sent or received. It enables the client to detect if the server is no longer available, without having to wait for the TCP/IP timeout. The client will ensure that at least one message travels across the network within each keep alive period. In the absence of a data-related message during the time period, the client sends a very small "ping" message, which the server will acknowledge. A value of 0 disables keepalive processing in the client. |
| Last Will Message | The message to send as the client's Last Will. |
| Last Will QoS Level | QoS level to be used when publishing the Last Will Message. |
| Last Will Retain | Whether to retain the client's Last Will. |
| Last Will Topic | The topic to send the client's Last Will to. |
| MQTT Specification Version | The MQTT specification version when connecting with the broker. See the allowable value descriptions for more details. |
| Max Queue Size | The MQTT messages are always being sent to subscribers on a topic regardless of how frequently the processor is scheduled to run. If the 'Run Schedule' is significantly behind the rate at which the messages are arriving to this processor, then a backup can occur in the internal queue of this processor. This property specifies the maximum number of messages this processor will hold in memory at one time in the internal queue. This data would be lost in case of a NiFi restart. |
| Password | Password to use when connecting to the broker. |
| Quality of Service(QoS) | The Quality of Service (QoS) to receive the message with. Accepts values '0', '1' or '2'; '0' for 'at most once', '1' for 'at least once', '2' for 'exactly once'. |
| SSL Context Service | The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| Session Expiry Interval | After this interval the broker will expire the client and clear the session state. |
| Session state | Whether to start a fresh session or resume previous flows. See the allowable value descriptions for more details. |
| Topic Filter | The MQTT topic filter to designate the topics to subscribe to. |
| Username | Username to use when connecting to the broker. |
| add-attributes-as-fields | If this property is set to true, default fields are added to each record: `_topic`, `_qos`, `_isDuplicate`, `_isRetained`. |
| message-demarcator | With this property, you have an option to output FlowFiles that contain multiple messages. This property allows you to provide a string (interpreted as UTF-8) to use for demarcating multiple messages. This property is optional; if it is not provided, and no Record Reader/Writer is defined, each message received will result in a single FlowFile. To enter a special character such as 'new line', use CTRL+Enter or Shift+Enter, depending on the OS. |
| record-reader | The Record Reader to use for parsing received MQTT Messages into Records. |
| record-writer | The Record Writer to use for serializing Records before writing them to a FlowFile. |
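The Broker URI rules (supported schemes, comma-separated lists, round-robin on connection failure) can be illustrated with a short Python sketch. It does not use an MQTT client library; `parse_broker_uris`, `connect_round_robin`, and the `try_connect` callback are hypothetical stand-ins for whatever the processor does internally.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"tcp", "ssl", "ws", "wss"}


def parse_broker_uris(value):
    """Split a comma-separated Broker URI value and validate each scheme."""
    uris = [part.strip() for part in value.split(",") if part.strip()]
    for uri in uris:
        if urlparse(uri).scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported scheme in broker URI: {uri!r}")
    return uris


def connect_round_robin(uris, try_connect, max_attempts):
    """Cycle through the brokers until one accepts the connection.

    try_connect is a hypothetical callable returning True on success.
    Returns the URI that connected, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        uri = uris[attempt % len(uris)]
        if try_connect(uri):
            return uri
    return None


brokers = parse_broker_uris("tcp://localhost:1883,tcp://localhost:1884")
# Pretend only the broker on port 1884 is reachable.
connected = connect_round_robin(brokers, lambda uri: uri.endswith(":1884"), max_attempts=4)
print(connected)
```

Validating the scheme up front surfaces configuration mistakes, such as an `http://` URI, before any connection attempt is made.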
## Relationships
| Name | Description |
| --- | --- |
| Message | The MQTT message output |
| parse.failure | If a message cannot be parsed using the configured Record Reader, the contents of the message will be routed to this Relationship as its own individual FlowFile. |
## Writes attributes
| Name | Description |
| --- | --- |
| record.count | The number of records received |
| mqtt.broker | MQTT broker that was the message source |
| mqtt.topic | MQTT topic on which message was received |
| mqtt.qos | The quality of service for this message. |
| mqtt.isDuplicate | Whether or not this message might be a duplicate of one which has already been received. |
| mqtt.isRetained | Whether or not this message was from a current publisher, or was "retained" by the server as the last message published on the topic. |
## See also

- [org.apache.nifi.processors.mqtt.PublishMQTT](/user-guide/data-integration/openflow/processors/publishmqtt)

---
title: ConsumePOP3 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumepop3.md
section: Loading & Unloading Data
---

# ConsumePOP3 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-email-nar

## Description

Consumes messages from an Email Server using the POP3 protocol. The raw bytes of each received email message are written as the contents of the FlowFile.

## Tags

Consume, Email, Get, Ingest, Ingress, Message, POP3

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Authorization Mode | How to authorize sending email on the user's behalf. |
| Connection Timeout | The amount of time to wait to connect to the Email server. |
| Delete Messages | Specify whether mail messages should be deleted after retrieval. |
| Fetch Size | Specify the maximum number of Messages to fetch per call to the Email Server. |
| Folder | Email folder to retrieve messages from (e.g., INBOX). |
| Host Name | Network address of the Email server (e.g., pop.gmail.com, imap.gmail.com). |
| OAuth2 Access Token Provider | OAuth2 service that can provide access tokens. |
| Password | Password used for authentication and authorization with the Email server. |
| Port | Numeric value identifying the Port of the Email server (e.g., 993). |
| User Name | User Name used for authentication and authorization with the Email server. |
## Relationships
| Name | Description |
| --- | --- |
| success | All messages that are successfully received from the Email server and converted to FlowFiles are routed to this relationship. |
---
title: ConsumeSlack 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeslack.md
section: Loading & Unloading Data
---

# ConsumeSlack 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-slack-nar

## Description

Retrieves messages from one or more configured Slack channels. The messages are written out in JSON format. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages from Slack.

## Tags

conversation, conversation.history, slack, social media, team, text, unstructured

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Access Token | OAuth Access Token used for authenticating/authorizing the Slack request sent by NiFi. This may be either a User Token or a Bot Token. It must be granted the channels:history, groups:history, im:history, or mpim:history scope, depending on the type of conversation being used. |
| Batch Size | The maximum number of messages to retrieve in a single request to Slack. The entire response will be parsed into memory, so it is important that this be kept in mind when setting this value. |
| Channels | A comma-separated list of Slack Channels to retrieve messages from. Each element in the list may be either a Channel ID, such as C0L9VCD47, or (for public channels only) the name of a channel, prefixed with a # sign, such as #general. If any channel name is provided instead of an ID, the Access Token provided must be granted the channels:read scope in order to resolve the Channel ID. See the Processor's Additional Details for information on how to find a Channel ID. |
| Include Message Blocks | Specifies whether or not the output JSON should include the value of the 'blocks' field for each Slack Message. This field includes information such as individual parts of a message that are formatted using rich text. This may be useful, for instance, for parsing. However, it often accounts for a significant portion of the data and as such may be set to null when it is not useful to you. |
| Include Null Fields | Specifies whether or not fields that have null values should be included in the output JSON. If true, any field in a Slack Message that has a null value will be included in the JSON with a value of null. If false, the key is omitted from the output JSON entirely. Omitting null values results in smaller messages that are generally more efficient to process, but including the values may provide a better understanding of the format, especially for schema inference. |
| Reply Monitor Frequency | After consuming all messages in a given channel, this Processor will periodically poll all "threaded messages", aka Replies, whose timestamp falls between now and the amount of time specified by the <Reply Monitor Window> property. This property determines how frequently those messages are polled. Setting the value to a shorter duration may result in replies to messages being captured more quickly, providing a lower latency. However, it will also result in additional resource use and could trigger Rate Limiting to occur. |
| Reply Monitor Window | After consuming all messages in a given channel, this Processor will periodically poll all "threaded messages", aka Replies, whose timestamp is between now and this amount of time in the past in order to check for any new replies. Setting this value to a larger value may result in additional resource use and may result in Rate Limiting. However, if a user replies to an old thread that was started outside of this window, the reply may not be captured. |
| Resolve Usernames | Specifies whether or not User IDs should be resolved to usernames. By default, Slack Messages provide the ID of the user that sends a message, such as U0123456789, but not the username, such as NiFiUser. The username may be resolved, but it may require additional calls to the Slack API and requires that the Token used be granted the users:read scope. If set to true, usernames will be resolved with a best-effort policy: if a username cannot be obtained, it will be skipped over. Also, note that when a username is obtained, the Message's <username> field is populated, and the <text> field is updated such that any mention will be output such as "Hi @user" instead of "Hi <@U1234567>". |
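The mention rewriting described for Resolve Usernames can be sketched with a regular expression. This Python example is illustrative only: `resolve_mentions` and the `known_users` lookup table are hypothetical, and it assumes mentions always appear in the plain `<@USERID>` form shown in the text.

```python
import re

# Assumption: a mention is "<@" followed by an upper-case alphanumeric user ID and ">".
MENTION = re.compile(r"<@([A-Z0-9]+)>")


def resolve_mentions(text, usernames):
    """Replace <@USERID> mentions with @username on a best-effort basis.

    usernames maps user IDs to names (a stand-in for Slack API lookups).
    IDs that cannot be resolved are left unchanged, i.e. skipped over.
    """
    def replace(match):
        name = usernames.get(match.group(1))
        return f"@{name}" if name else match.group(0)

    return MENTION.sub(replace, text)


known_users = {"U1234567": "user"}
print(resolve_mentions("Hi <@U1234567>", known_users))
print(resolve_mentions("Hi <@U0000000>", known_users))
```

Leaving unresolved IDs untouched mirrors the best-effort policy: a failed lookup never blocks or corrupts the message text.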
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Maintains a mapping of Slack Channel IDs to the timestamp of the last message that was retrieved for that channel. This allows the processor to only retrieve messages that have been posted since the last time the processor was run. This state is stored in the cluster so that if the Primary Node changes, the new node will pick up where the previous node left off. |
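The channel-to-timestamp mapping kept in cluster state can be modelled as a small dictionary. The Python sketch below is a simplified illustration: `fetch_new_messages`, the in-memory `cluster_state`, and the sample messages are hypothetical, and Slack timestamps are treated as decimal strings.

```python
from decimal import Decimal


def fetch_new_messages(state, channel_id, messages):
    """Return messages newer than the stored timestamp and advance the state.

    state maps a channel ID to the timestamp of the last retrieved message.
    Decimal is used so that microsecond timestamps compare exactly.
    """
    last_seen = state.get(channel_id, Decimal(0))
    fresh = [m for m in messages if Decimal(m["ts"]) > last_seen]
    if fresh:
        state[channel_id] = max(Decimal(m["ts"]) for m in fresh)
    return fresh


cluster_state = {}
history = [
    {"ts": "1700000000.000100", "text": "first"},
    {"ts": "1700000060.000200", "text": "second"},
]
first_run = fetch_new_messages(cluster_state, "C0L9VCD47", history)

history.append({"ts": "1700000120.000300", "text": "third"})
second_run = fetch_new_messages(cluster_state, "C0L9VCD47", history)
print([m["text"] for m in first_run], [m["text"] for m in second_run])
```

Because the state is keyed by channel and holds only a timestamp, any node that takes over as Primary Node can resume from the same point without replaying earlier messages.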
## Relationships
| Name | Description |
| --- | --- |
| success | Slack messages that are successfully received will be routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| slack.channel.id | The ID of the Slack Channel from which the messages were retrieved |
| slack.message.count | The number of Slack messages that are included in the FlowFile |
| mime.type | Set to application/json, as the output will always be in JSON format |
## See also

- [org.apache.nifi.processors.slack.ListenSlack](/user-guide/data-integration/openflow/processors/listenslack)

---
title: ConsumeSlackConversation 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeslackconversation.md
section: Loading & Unloading Data
---

# ConsumeSlackConversation 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-slack-processors-nar

## Description

Retrieves messages from Slack conversations available to the App. New conversations are fetched based on the 'Reply Monitor Frequency'. Ingested messages are written out in JSON format. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages from Slack.

## Tags

conversation, conversation.history, slack, social media, team, text, unstructured

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Access Token | OAuth Access Token used for authenticating/authorizing the Slack request sent by NiFi. This may be either a User Token or a Bot Token. It must be granted the channels:history, groups:history, im:history, or mpim:history scope, depending on the type of conversation being used. |
| Batch Size | The maximum number of messages to retrieve in a single request to Slack. The entire response will be parsed into memory, so it is important that this be kept in mind when setting this value. |
| Rate Limiter Service | Slack Rate Limiter Service to coordinate rate limiting across processors. |
| Reply Monitor Frequency | After consuming all messages in a given channel, this Processor will periodically poll all "threaded messages", aka Replies, whose timestamp falls between now and the amount of time specified by the <Reply Monitor Window> property. This property determines how frequently those messages are polled. Setting the value to a shorter duration may result in replies to messages being captured more quickly, providing a lower latency. However, it will also result in additional resource use and could trigger Rate Limiting to occur. This also determines how frequently newly added channels are checked. |
| Reply Monitor Window | After consuming all messages in a given channel, this Processor will periodically poll all "threaded messages", aka Replies, whose timestamp is between now and this amount of time in the past in order to check for any new replies. Setting this value to a larger value may result in additional resource use and may result in Rate Limiting. However, if a user replies to an old thread that was started outside of this window, the reply may not be captured. |
| Resolve Usernames | Specifies whether or not User IDs should be resolved to usernames. By default, Slack Messages provide the ID of the user that sends a message, such as U0123456789, but not the username, such as NiFiUser. The username may be resolved, but it may require additional calls to the Slack API and requires that the Token used be granted the users:read scope. If set to true, usernames will be resolved with a best-effort policy: if a username cannot be obtained, it will be skipped over. Also, note that when a username is obtained, the Message's <username> field is populated, and the <text> field is updated such that any mention will be output such as "Hi @user" instead of "Hi <@U1234567>". |
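The Reply Monitor Window amounts to a simple time filter over thread start times. The Python sketch below shows that filter under stated assumptions: `threads_to_poll` is a hypothetical helper, timestamps are whole seconds, and the 7-day window is an example value rather than a documented default.

```python
def threads_to_poll(thread_timestamps, now, window_seconds):
    """Return the thread timestamps that fall inside the reply monitor window.

    A thread is polled for new replies only if it started between
    (now - window_seconds) and now; older threads are no longer checked.
    """
    earliest = now - window_seconds
    return [ts for ts in thread_timestamps if earliest <= ts <= now]


now = 1_700_000_000
one_week = 7 * 86_400  # example window, not a documented default
threads = [now - 30, now - 3_600, now - 8 * 86_400]
polled = threads_to_poll(threads, now, one_week)
print(polled)
```

The thread that started eight days ago falls outside the window, so a late reply to it would not be captured. That is the trade-off the property description warns about: a larger window catches more late replies but polls more threads.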
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Maintains a mapping of Slack Channel IDs to the timestamp of the last message that was retrieved for that channel. This allows the processor to only retrieve messages that have been posted since the last time the processor was run. This state is stored in the cluster so that if the Primary Node changes, the new node will pick up where the previous node left off. |
## Relationships
| Name | Description |
| --- | --- |
| success | Slack messages that are successfully received will be routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| slack.channel.id | The ID of the Slack Channel from which the messages were retrieved |
| slack.message.count | The number of Slack messages that are included in the FlowFile |
| mime.type | Set to application/json, as the output will always be in JSON format |
---
title: ConsumeSlackHistory 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumeslackhistory.md
section: Loading & Unloading Data
---

# ConsumeSlackHistory 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-slack-processors-nar

## Description

Fetches historical messages from all Slack channels available to the App. This processor queries Slack's conversations.history and conversations.replies to retrieve older messages and outputs the result as records. The processor tracks the earliest retrieved message timestamp in the cluster state to allow it to continue the historical load on subsequent executions. Channels are discovered automatically; no channel ID or name needs to be configured.

## Tags

consume, conversation, history, slack

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Access Token | OAuth Access Token used for authenticating the Slack request. It must be granted the channels:history (and, if resolving usernames, users:read) scope. |
| Batch Size | The maximum number of messages to retrieve in a single request to Slack. |
| Channel Refresh Frequency | The frequency at which the processor refreshes the list of Slack channels accessible to the App. This helps detect newly available channels or remove channels that are no longer available. |
| Include Message Blocks | Specifies whether the output JSON should include the value of the 'blocks' field for each Slack Message. |
| Include Null Fields | Specifies whether fields that have null values should be included in the output JSON. If true, any field with a null value will be output as null; if false, it will be omitted. |
| Rate Limiter Service | Slack Rate Limiter Service to coordinate rate limiting across processors. |
| Resolve Usernames | Specifies whether User IDs should be resolved to usernames. If true, usernames will be resolved with a best-effort policy; if a username cannot be obtained, it will be skipped. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Maintains a mapping of Slack Channel IDs to the earliest message timestamp that has been retrieved. When no more messages are available, a flag is set indicating that the historical load is complete for that channel. This state is stored in the cluster so that if the Primary Node changes, the new node will pick up where the previous node left off. |
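The backward-moving historical load, with its per-channel completion flag, can be sketched as a loop over batches. This Python example is a simplified model: `backfill_step`, `fake_fetch_older`, and the integer timestamps are hypothetical and stand in for calls to Slack's conversations.history.

```python
def backfill_step(state, channel_id, fetch_older, batch_size):
    """Fetch one batch of older messages and move the channel's cursor back.

    state maps a channel ID to {"earliest_ts": ..., "complete": bool}.
    fetch_older(channel_id, before_ts, limit) is a hypothetical callable that
    returns messages older than before_ts (or the newest ones if it is None).
    """
    entry = state.setdefault(channel_id, {"earliest_ts": None, "complete": False})
    if entry["complete"]:
        return []
    batch = fetch_older(channel_id, entry["earliest_ts"], batch_size)
    if not batch:
        entry["complete"] = True  # nothing older remains for this channel
        return []
    entry["earliest_ts"] = min(m["ts"] for m in batch)
    return batch


# Fake archive, newest first, with integer timestamps for readability.
ARCHIVE = [{"ts": ts} for ts in (500, 400, 300, 200, 100)]


def fake_fetch_older(channel_id, before_ts, limit):
    older = [m for m in ARCHIVE if before_ts is None or m["ts"] < before_ts]
    return older[:limit]


state = {}
steps = []
while not state.get("C1", {}).get("complete"):
    steps.append(backfill_step(state, "C1", fake_fetch_older, batch_size=2))
print([len(step) for step in steps], state)
```

Each execution only needs the stored earliest timestamp to continue, so the load can be interrupted and resumed, and the completion flag prevents further requests once a channel's history is exhausted.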
## Relationships
| Name | Description |
| --- | --- |
| success | FlowFiles containing the JSON-encoded Slack conversation history are routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| slack.channel.id | The ID of the Slack Channel from which the messages were retrieved |
| slack.channel.name | The name of the Slack Channel from which the messages were retrieved |
| slack.message.count | The number of Slack messages that are included in the FlowFile |
| mime.type | Set to application/json, as the output will always be in JSON format |
--- title: ConsumeSnowflakeStream 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumesnowflakestream.md section: Loading & Unloading Data --- # ConsumeSnowflakeStream 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-snowflake-processors-nar

## Description

Fetches data from a Snowflake stream and writes it to a FlowFile. The stream must be created in the database before using this processor. The processor consumes the stream and writes the records to the FlowFile using the specified Record Writer. The processor also adds an attribute to the FlowFile with the name of the stream. The processor does not work if the stream is stale; instead, it logs an error message and stops processing. A stale stream must be recreated in the database. After the stream is recreated, the processor continues to read and process CDC records. For more information on Snowflake streams, see the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/streams-intro).

## Tags

connection, database, jdbc, openflow, snowflake, stream, table, view

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Max Chunk Size | Number of records to write into a single FlowFile. This value might be slightly exceeded. |
| Record Writer | The Record Writer to use for CDC record serialization. |
| Snowflake Connection Service | Database Connection Service for accessing Snowflake. |
| Stream Name | The name of the stream in the database. |
## Relationships
| Name | Description |
| --- | --- |
| success | For FlowFiles with stream CDC records |
## Writes attributes
| Name | Description |
| --- | --- |
| snowflake.stream.name | Name of the Snowflake Stream |
--- title: ConsumeTwitter 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/consumetwitter.md section: Loading & Unloading Data --- # ConsumeTwitter 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-social-media-nar

## Description

Streams tweets from Twitter's streaming API v2. The stream provides a sample stream or a search stream based on previously uploaded rules. This processor also provides a pass through for certain fields of the tweet to be returned as part of the response. See [https://developer.twitter.com/en/docs/twitter-api/data-dictionary/introduction](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/introduction) for more information regarding the Tweet object model.

## Tags

json, social media, status, tweets, twitter

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| backfill-minutes | The number of minutes (up to 5 minutes) of streaming data to be requested after a disconnect. Only available for projects with academic research access. See [https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/integrate/recovery-and-redundancy-features](https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/integrate/recovery-and-redundancy-features) |
| backoff-attempts | The number of reconnection attempts the processor makes after the stream disconnects for any reason, before throwing an exception. To start a stream after this exception occurs and the connection is fixed, stop and restart the processor. If the value of this property is 0, backoff never occurs and the processor must always be restarted if the stream fails. |
| backoff-time | The duration to back off before requesting a new stream if the current one fails for any reason. Increases by a factor of 2 each time a restart fails. |
| base-path | The base path that the processor will use for making HTTP requests. The default value should be sufficient for most use cases. |
| batch-size | The maximum number of Tweets to be written to a single FlowFile. Fewer Tweets are written if fewer are available in the queue at the time of processor invocation. |
| bearer-token | The Bearer Token provided by Twitter. |
| connect-timeout | The maximum time in which the client should establish a connection with the Twitter API before timing out. Setting the value to 0 disables connection timeouts. |
| expansions | A comma-separated list of expansions for objects in the returned tweet. See [https://developer.twitter.com/en/docs/twitter-api/expansions](https://developer.twitter.com/en/docs/twitter-api/expansions) for proper usage. Possible field values include: author_id, referenced_tweets.id, referenced_tweets.id.author_id, entities.mentions.username, attachments.poll_ids, attachments.media_keys, in_reply_to_user_id, geo.place_id |
| maximum-backoff-time | The maximum duration to back off before attempting a new stream. It is recommended that this number be much higher than the 'Backoff Time' property. |
| media-fields | A comma-separated list of media fields to be returned as part of the tweet. Refer to [https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/media](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/media) for proper usage. Possible field values include: alt_text, duration_ms, height, media_key, non_public_metrics, organic_metrics, preview_image_url, promoted_metrics, public_metrics, type, url, width |
| place-fields | A comma-separated list of place fields to be returned as part of the tweet. Refer to [https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/place](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/place) for proper usage. Possible field values include: contained_within, country, country_code, full_name, geo, id, name, place_type |
| poll-fields | A comma-separated list of poll fields to be returned as part of the tweet. Refer to [https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/poll](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/poll) for proper usage. Possible field values include: duration_minutes, end_datetime, id, options, voting_status |
| queue-size | Maximum size of the internal queue for streamed messages. |
| read-timeout | The maximum time of inactivity between receiving tweets from Twitter through the API before a timeout. Setting the value to 0 disables read timeouts. |
| stream-endpoint | The source from which the processor will consume Tweets. |
| tweet-fields | A comma-separated list of tweet fields to be returned as part of the tweet. Refer to [https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet) for proper usage. Possible field values include: attachments, author_id, context_annotations, conversation_id, created_at, entities, geo, id, in_reply_to_user_id, lang, non_public_metrics, organic_metrics, possibly_sensitive, promoted_metrics, public_metrics, referenced_tweets, reply_settings, source, text, withheld |
| user-fields | A comma-separated list of user fields to be returned as part of the tweet. Refer to [https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user) for proper usage. Possible field values include: created_at, description, entities, id, location, name, pinned_tweet_id, profile_image_url, protected, public_metrics, url, username, verified, withheld |
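To make the interaction of `backoff-time`, `maximum-backoff-time`, and `backoff-attempts` concrete, here is a minimal sketch of the delay schedule those descriptions imply. The function name `backoff_schedule` is hypothetical, and capping each delay at the maximum with `min` is an assumption about how the limit is applied, not confirmed processor behavior.

```python
def backoff_schedule(backoff_time_s, maximum_backoff_time_s, backoff_attempts):
    """Return the wait (in seconds) before each reconnection attempt.

    The delay starts at backoff_time_s, doubles after every failed restart,
    and is assumed to be capped at maximum_backoff_time_s.
    """
    delays = []
    delay = backoff_time_s
    for _ in range(backoff_attempts):
        delays.append(min(delay, maximum_backoff_time_s))
        delay *= 2  # "increases by a factor of 2" per failed restart
    return delays
```

With a 5-second backoff time, a 30-second maximum, and 5 attempts, this sketch yields waits of 5, 10, 20, 30, and 30 seconds. With 0 attempts the list is empty, which matches the note that the processor must then be restarted manually.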
## Relationships
| Name | Description |
| --- | --- |
| success | FlowFiles containing an array of one or more Tweets |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | The MIME Type set to application/json |
| tweets | The number of Tweets in the FlowFile |
--- title: ControlRate 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/controlrate.md section: Loading & Unloading Data --- # ControlRate 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Controls the rate at which data is transferred to follow-on processors. If you configure a very small Time Duration, the throttle becomes less accurate. You can improve accuracy by decreasing the Yield Duration, at the expense of more Tasks given to the processor.

## Tags

rate, rate control, throttle, throughput

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Grouping Attribute | By default, a single "throttle" is used for all FlowFiles. If this value is specified, a separate throttle is used for each value specified by the attribute with this name. Changing this value resets the rate counters. |
| Maximum Data Rate | The maximum rate at which data should pass through this processor. The format of this property is expected to be a Data Size (such as '1 MB') representing bytes per Time Duration. |
| Maximum FlowFile Rate | The maximum rate at which FlowFiles should pass through this processor. The format of this property is expected to be a positive integer representing FlowFiles count per Time Duration. |
| Maximum Rate | The maximum rate at which data should pass through this processor. The format of this property is expected to be a positive integer, or a Data Size (such as '1 MB') if Rate Control Criteria is set to 'data rate'. |
| Rate Control Criteria | Indicates the criteria used to control the throughput rate. Changing this value resets the rate counters. |
| Rate Controlled Attribute | The name of an attribute whose values build toward the rate limit if Rate Control Criteria is set to 'attribute value'. The value of the attribute referenced by this property must be a positive long, or the FlowFile will be routed to failure. This value is ignored if Rate Control Criteria is not set to 'attribute value'. Changing this value resets the rate counters. |
| Rate Exceeded Strategy | Specifies how to handle an incoming FlowFile when the maximum data rate has been exceeded. |
| Time Duration | The amount of time to which the Maximum Rate pertains. Changing this value resets the rate counters. |
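The relationship between Maximum Rate, Time Duration, and Grouping Attribute can be illustrated with a simple fixed-window counter. This is a sketch under stated assumptions, not the processor's actual algorithm: the class `WindowThrottle` and its method names are hypothetical, and the fixed-window approach was chosen only because it is easy to follow.

```python
from collections import defaultdict


class WindowThrottle:
    """Hypothetical fixed-window throttle with one counter per group value."""

    def __init__(self, maximum_rate, time_duration_s):
        self.maximum_rate = maximum_rate        # units allowed per window
        self.time_duration_s = time_duration_s  # window length in seconds
        # group value -> [window start time, units used in this window]
        self._windows = defaultdict(lambda: [None, 0])

    def try_pass(self, amount, now, group=None):
        """Return True if `amount` units fit in the group's current window."""
        window = self._windows[group]
        if window[0] is None or now - window[0] >= self.time_duration_s:
            window[0], window[1] = now, 0  # start a fresh window
        if window[1] + amount > self.maximum_rate:
            return False  # over the limit; the caller decides what happens next
        window[1] += amount
        return True
```

Here `amount` would be 1 for a FlowFile-count criterion, the content size for a data-rate criterion, or the attribute's value for the 'attribute value' criterion. What happens when `try_pass` returns False corresponds to the Rate Exceeded Strategy.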
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles will be routed to this relationship if they are missing a necessary Rate Controlled Attribute or the attribute is not in the expected format. |
| success | FlowFiles are transferred to this relationship under normal conditions. |
## Use cases

| Limit the rate at which data is sent to a downstream system with little to no bursts |
| ------------------------------------------------------------------------------------------ |
| Limit the rate at which FlowFiles are sent to a downstream system with little to no bursts |
| Reject requests that exceed a specific rate with little to no bursts |
| Reject requests that exceed a specific rate, allowing for bursts |

--- title: ConvertCharacterSet 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/convertcharacterset.md section: Loading & Unloading Data --- # ConvertCharacterSet 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Converts a FlowFile's content from one character set to another.

## Tags

character set, characterset, convert, text

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Input Character Set | The name of the CharacterSet to expect for Input |
| Output Character Set | The name of the CharacterSet to convert to |
## Relationships
| Name | Description |
| --- | --- |
| success | |
--- title: ConvertRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/convertrecord.md section: Loading & Unloading Data --- # ConvertRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Converts records from one data format to another using the configured Record Reader and Record Writer Controller Services. The Reader and Writer must be configured with "matching" schemas, meaning the schemas must have the same field names. The types of the fields do not have to be the same if a field value can be coerced from one type to another. For instance, if the input schema has a field named "balance" of type double, the output schema can have a field named "balance" with a type of string, double, or float. If a field is present in the input but not in the output schema, the field is left out of the output. If a field is specified in the output schema but is not present in the input data/schema, the field will either be absent from the output or have a null value, depending on the writer.

## Tags

avro, convert, csv, freeform, generic, json, log, logs, record, schema, text

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Include Zero Record FlowFiles | When converting an incoming FlowFile, if the conversion results in no data, this property specifies whether or not a FlowFile will be sent to the corresponding relationship. |
| Record Reader | Specifies the Controller Service to use for reading incoming data. |
| Record Writer | Specifies the Controller Service to use for writing out the records. |
## Relationships
| Name | Description |
| --- | --- |
| failure | If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship. |
| success | FlowFiles that are successfully transformed will be routed to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | Sets the mime.type attribute to the MIME Type specified by the Record Writer. |
| record.count | The number of records in the FlowFile. |
| record.error.message | On failure, this attribute provides the error message encountered by the Reader or Writer. |
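The "matching schemas" rule from the ConvertRecord description can be sketched in a few lines. This is an illustration only: the function `convert_record` is hypothetical, the output schema is modeled as a plain mapping from field name to a Python coercion function, and missing fields are written as null (the description notes that real writers may omit them instead).

```python
def convert_record(record, output_schema):
    """Project one input record onto a hypothetical output schema.

    record        -- dict of input field name to value
    output_schema -- dict of output field name to a coercion callable
    """
    converted = {}
    for name, coerce in output_schema.items():
        value = record.get(name)
        # Fields missing from the input become null in this sketch.
        converted[name] = None if value is None else coerce(value)
    # Input fields that are not in output_schema are simply never copied.
    return converted
```

For example, an input record with a double `balance` and an extra `legacy_id` field, converted with an output schema of `{"balance": str, "owner": str}`, keeps `balance` as a string, drops `legacy_id`, and sets `owner` to null.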
## Use cases

| Convert data from one record-oriented format to another |
| ------------------------------------------------------- |

--- title: ConvertToJournalSchema 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/converttojournalschema.md section: Loading & Unloading Data --- # ConvertToJournalSchema 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar

## Description

Converts the incoming database schema into the appropriate schema for a Snowflake CDC Journal table.

## Tags

Snowflake, cdc, journal

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to this relationship if the schema cannot be translated. |
| original | The original FlowFile is routed to this relationship when the schema is successfully converted. |
| success | FlowFiles are routed to this relationship after the schema has been converted. |
--- title: CopyAzureBlobStorage_v12 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/copyazureblobstorage_v12.md section: Loading & Unloading Data --- # CopyAzureBlobStorage_v12 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-azure-nar

## Description

Copies a blob in Azure Blob Storage from one account/container to another. The processor uses Azure Blob Storage client library v12.

## Tags

azure, blob, cloud, microsoft, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Conflict Resolution Strategy | Specifies whether an existing blob will have its contents replaced upon conflict. |
| Create Container | Specifies whether to check if the container exists and to automatically create it if it does not. Permission to list containers is required. If false, this check is not made, but the Put operation will fail if the container does not exist. |
| Destination Blob Name | The full name of the destination blob. Defaults to the Source Blob Name when not specified. |
| Destination Container Name | Name of the destination Azure storage container. Defaults to the Source Container Name when not specified. |
| Destination Storage Credentials | Controller Service used to obtain Azure Blob Storage Credentials. |
| Source Blob Name | The full name of the source blob. |
| Source Container Name | Name of the Azure storage container that will be copied. |
| Source Storage Credentials | Credentials Service used to obtain Azure Blob Storage Credentials to read Source Blob information. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
## Relationships
| Name | Description |
| --- | --- |
| failure | Unsuccessful operations will be transferred to the failure relationship. |
| success | All successfully processed FlowFiles are routed to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| azure.container | The name of the Azure Blob Storage container |
| azure.blobname | The name of the blob on Azure Blob Storage |
| azure.primaryUri | Primary location of the blob |
| azure.etag | ETag of the blob |
| azure.blobtype | Type of the blob (either BlockBlob, PageBlob or AppendBlob) |
| mime.type | MIME Type of the content |
| lang | Language code for the content |
| azure.timestamp | Timestamp of the blob |
| azure.length | Length of the blob |
| azure.error.code | Error code reported during blob operation |
| azure.ignored | When Conflict Resolution Strategy is 'ignore', this property will be true/false depending on whether the blob was ignored. |
## See also

- [org.apache.nifi.processors.azure.storage.DeleteAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/deleteazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.FetchAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/fetchazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.ListAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/listazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.PutAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/putazureblobstorage_v12)

--- title: CopyS3Object 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/copys3object.md section: Loading & Unloading Data --- # CopyS3Object 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Copies a file from one bucket and key to another in AWS S3.

## Tags

AWS, Amazon, Archive, Copy, S3

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain the AWS credentials provider. |
| Canned ACL | Amazon Canned ACL for an object, one of: BucketOwnerFullControl, BucketOwnerRead, LogDeliveryWrite, AuthenticatedRead, PublicReadWrite, PublicRead, Private; will be ignored if any other ACL/permission/owner property is specified. |
| Communications Timeout | The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out. |
| Custom Signer Class Name | Fully qualified class name of the custom signer class. The signer must implement the com.amazonaws.auth.Signer interface. |
| Custom Signer Module Location | Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any). |
| Destination Bucket | The bucket that will receive the copy. |
| Destination Key | The target key in the target bucket. |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| FullControl User List | A comma-separated list of Amazon User IDs or email addresses that specifies who should have Full Control for an object. |
| Owner | The Amazon ID to use for the object's owner. |
| Read ACL User List | A comma-separated list of Amazon User IDs or email addresses that specifies who should have permissions to read the Access Control List for an object. |
| Read Permission User List | A comma-separated list of Amazon User IDs or email addresses that specifies who should have Read Access for an object. |
| Region | The AWS Region to connect to. |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections. |
| Signer Override | The AWS S3 library uses Signature Version 4 by default, but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation. |
| Source Bucket | The bucket that contains the file to be copied. |
| Source Key | The source key in the source bucket. |
| Write ACL User List | A comma-separated list of Amazon User IDs or email addresses that specifies who should have permissions to change the Access Control List for an object. |
| Write Permission User List | A comma-separated list of Amazon User IDs or email addresses that specifies who should have Write Access for an object. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| failure | If the Processor is unable to process a given FlowFile, it will be routed to this Relationship. |
| success | FlowFiles are routed to this Relationship after they have been successfully processed. |
## See also

- [org.apache.nifi.processors.aws.s3.DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object)
- [org.apache.nifi.processors.aws.s3.FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags)
- [org.apache.nifi.processors.aws.s3.ListS3](/user-guide/data-integration/openflow/processors/lists3)
- [org.apache.nifi.processors.aws.s3.PutS3Object](/user-guide/data-integration/openflow/processors/puts3object)
- [org.apache.nifi.processors.aws.s3.TagS3Object](/user-guide/data-integration/openflow/processors/tags3object)

--- title: CountText 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/counttext.md section: Loading & Unloading Data --- # CountText 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Counts various metrics on incoming text. The requested results will be recorded as attributes. The resulting FlowFile will not have its content modified.

## Tags

character, count, line, text, word

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| ajust-immediately | If true, the counter will be updated immediately, without regard to whether the ProcessSession is committed or rolled back; otherwise, the counter will be incremented only if and when the ProcessSession is committed. |
| character-encoding | Specifies a character encoding to use. |
| split-words-on-symbols | If enabled, the word count will identify strings separated by common logical delimiters [ _ - . ] as independent words (for example, split-words-on-symbols = 4 words). |
| text-character-count | If enabled, will count the number of characters (including whitespace and symbols, but not including newlines and carriage returns) present in the incoming text. |
| text-line-count | If enabled, will count the number of lines present in the incoming text. |
| text-line-nonempty-count | If enabled, will count the number of lines that contain a non-whitespace character present in the incoming text. |
| text-word-count | If enabled, will count the number of words (alphanumeric character groups bounded by whitespace) present in the incoming text. Common logical delimiters [_-.] do not bound a word unless 'Split Words on Symbols' is true. |
## Relationships
| Name | Description |
| --- | --- |
| failure | If the FlowFile text cannot be counted for some reason, the original file will be routed to this destination and nothing will be routed elsewhere. |
| success | The FlowFile contains the original content with one or more attributes added containing the respective counts. |
## Writes attributes
| Name | Description |
| --- | --- |
| text.line.count | The number of lines of text present in the FlowFile content |
| text.line.nonempty.count | The number of lines of text (with at least one non-whitespace character) present in the original FlowFile |
| text.word.count | The number of words present in the original FlowFile |
| text.character.count | The number of characters (given the specified character encoding) present in the original FlowFile |
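The counting rules described for CountText can be approximated as follows. This is a rough sketch rather than the processor's implementation: the function `count_text` is hypothetical, it treats any run of non-whitespace characters as a word, and it relies on Python's `str.splitlines`, which recognizes a few more line boundaries than newline and carriage return.

```python
import re


def count_text(text, split_words_on_symbols=False):
    """Return counts keyed by the attribute names the processor writes."""
    lines = text.splitlines()
    # With the option enabled, _ - . also separate words, as well as whitespace.
    separator = r"[\s_\-.]+" if split_words_on_symbols else r"\s+"
    words = [word for word in re.split(separator, text) if word]
    return {
        "text.line.count": len(lines),
        "text.line.nonempty.count": sum(1 for line in lines if line.strip()),
        "text.word.count": len(words),
        # Whitespace and symbols count; newlines and carriage returns do not.
        "text.character.count": sum(1 for ch in text if ch not in "\r\n"),
    }
```

Under these rules the string `split-words-on-symbols` counts as one word by default and as four words when `split_words_on_symbols` is true, matching the example in the property description.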
## See also

- [org.apache.nifi.processors.standard.SplitText](/user-guide/data-integration/openflow/processors/splittext)

--- title: CreateAmazonAdsReport 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createamazonadsreport.md section: Loading & Unloading Data --- # CreateAmazonAdsReport 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-amazon-ads-processors-nar

## Description

Creates the report configuration for the Amazon Ads connector. By default, it runs once a day.

## Tags

Amazon, Amazon Ads, report

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Access Token Provider | Service providing OAuth access token. |
| Amazon Advertising Client ID | Client ID of the Amazon Advertising user. |
| Region | Environment from which advertising data will be downloaded. |
| Report Ad Product | Type of advertising product being reported. |
| Report Columns | List of columns fetched from Reporting API. |
| Report Filters | Set of filters used to trim returned data. |
| Report Group By | Level of granularity of the report. |
| Report Ingestion Strategy | Configuration of the report ingestion. |
| Report Ingestion Window | How many days from the past should be downloaded during incremental ingestion. |
| Report Name | Unique name of the report. |
| Report Profile ID | The profile ID associated with an advertising account in a specific marketplace. |
| Report Start Date | Start date from which the ingestion should happen. |
| Report Time Unit | Date aggregation. |
| Report Type | Data type contained in the report. |
| Web Client Service Provider | Service providing client for REST request execution. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Stores a hash of the last report definition to detect schema changes. Incrementally loaded reports persist the last ingestion date to define ingestion date ranges after the initial load. The start date is also saved. |
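Detecting a schema change by comparing hashes of the report definition could look roughly like this. Everything here is an assumption for illustration: the function names, the `definition_hash` state key, the choice of SHA-256, and the JSON canonicalization are not taken from the processor.

```python
import hashlib
import json


def definition_hash(definition):
    """Hash a report definition in a key-order-independent way."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def schema_changed(state, definition):
    """Compare against the stored hash, then store the new one.

    Returns False on the first run, when no previous hash exists.
    """
    new_hash = definition_hash(definition)
    changed = state.get("definition_hash") not in (None, new_hash)
    state["definition_hash"] = new_hash
    return changed
```

Storing only a hash keeps the state small while still letting each run tell whether the configured columns, filters, or grouping differ from the previous execution.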
## Relationships
| Name | Description |
| --- | --- |
| success | Response FlowFiles transferred when receiving success response from Amazon Ads Reporting API. |
## Writes attributes
| Name | Description |
| --- | --- |
| amazon.ads.report.id | Unique identifier of the currently prepared job. |
| amazon.ads.report.name | Unique name of the report. |
| amazon.ads.ingestion.strategy | Strategy which defines if the report will be downloaded as a SNAPSHOT or INCREMENTALLY. |
| amazon.ads.run.id | Unique identifier of the current ingestion process. |
| amazon.ads.ingestion.start.date | Date from which data is downloaded from Amazon Ads (including given date). |
| amazon.ads.ingestion.end.date | Date to which data is downloaded from Amazon Ads (including given date). |
| amazon.ads.report.schema.changed | Flag indicating whether the report schema has changed between processor executions. |
| avro.schema | Avro schema containing set of all configured fields. |
| fragment.identifier | A unique ID of each ingestion run. Identifies all FlowFiles generated during a single run. |
| fragment.index | Number representing unique identifier in batch of FlowFiles generated during one ingestion run. |
| fragment.count | Number of FlowFiles generated during processor execution. |
---
title: CreateAzureOpenAiEmbeddings 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createazureopenaiembeddings.md
section: Loading & Unloading Data
---

# CreateAzureOpenAiEmbeddings 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-openai-nar

## Description

Uses Azure OpenAI to create embeddings for text. The input text can be provided as a single FlowFile or as a record-oriented FlowFile.

## Tags

azure, chatbot, embeddings, gen ai, generative ai, llm, nlp, openai, openflow, text

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| API Key | The API Key for authenticating to Azure OpenAI. |
| Deployment Name | The name of the OpenAI model deployment to use for creating embeddings. |
| Dimensions | The number of dimensions that the resulting output embeddings should have. This is only supported in text-embedding-3 and later models. |
| Embeddings Record Path | The path to the field in the record where the embeddings are to be written. |
| Max Batch Size | The maximum number of records to include in each batch sent to OpenAI. |
| OpenAI Service Name | The name of the OpenAI service to use. |
| Record Reader | The record reader to use for reading record-oriented data. If the incoming data is to be treated as plaintext, this property should be left unset. |
| Record Writer | The Record Writer to use for writing the output. |
| Text Record Path | The path to the field in the record that contains the text to be embedded. If the incoming data is to be treated as plaintext, this property should be left unset. |
| User | An identifier for the remote user on whose behalf the request is being made; OpenAI uses this to detect and prevent abuse. |
| Web Client Service | The Web Client Service to use for communicating with OpenAI. |
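As a rough illustration of what Max Batch Size implies, the sketch below groups records into batches of at most a given size, one batch per request. It is not the processor's implementation; the `batches` helper and its parameter names are hypothetical and exist only for this example.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most max_batch_size records.

    Hypothetical helper: it mirrors the idea behind the Max Batch Size
    property, not the processor's actual code.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


if __name__ == "__main__":
    texts = ["first", "second", "third", "fourth", "fifth"]
    for group in batches(texts, max_batch_size=2):
        print(group)
```

A smaller batch size means more requests with less text per request; a larger one means fewer, larger requests.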
## Relationships
| Name | Description |
| --- | --- |
| failure | The original FlowFile will be routed to this relationship if the embeddings could not be created |
| success | The embeddings will be routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| record.count | The number of records written to the output |
| mime.type | The MIME type of the output data, based on the chosen Record Writer |
## Use cases

| Create embeddings for text using Azure OpenAI's Embeddings |
| ---------------------------------------------------------- |

## See also

- [com.snowflake.openflow.runtime.processors.openai.CreateOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createopenaiembeddings)
- [com.snowflake.openflow.runtime.processors.openai.PromptAzureOpenAI](/user-guide/data-integration/openflow/processors/promptazureopenai)

---
title: CreateBoxFileMetadataInstance 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createboxfilemetadatainstance.md
section: Loading & Unloading Data
---

# CreateBoxFileMetadataInstance 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Creates a metadata instance for a Box file using a specified template with values from the FlowFile content. The Box API requires newly created templates to be created with the scope set as enterprise, so no scope is required. The input record should be a flat key-value object where each field name is used as the metadata key.

## Tags

box, create, metadata, storage, templates

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| File ID | The ID of the file for which to create metadata. |
| Record Reader | The Record Reader to use for parsing the incoming data |
| Template Key | The key of the metadata template to use for creation. |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile is routed to this relationship if an error occurs during metadata creation. |
| file not found | FlowFiles for which the specified Box file was not found will be routed to this relationship. |
| success | A FlowFile is routed to this relationship after metadata has been successfully created. |
| template not found | FlowFiles for which the specified metadata template was not found will be routed to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| box.id | The ID of the file for which metadata was created |
| box.template.key | The template key used for metadata creation |
| error.code | The error code returned by Box |
| error.message | The error message returned by Box |
## See also

- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)
- [org.apache.nifi.processors.box.ListBoxFileMetadataTemplates](/user-guide/data-integration/openflow/processors/listboxfilemetadatatemplates)
- [org.apache.nifi.processors.box.UpdateBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/updateboxfilemetadatainstance)

---
title: CreateBoxMetadataTemplate 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createboxmetadatatemplate.md
section: Loading & Unloading Data
---

# CreateBoxMetadataTemplate 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Creates a Box metadata template using field specifications from the FlowFile content. Expects a schema with fields: 'type' (required), 'key' (required), 'displayName' (optional), 'description' (optional), 'hidden' (optional, boolean).

## Tags

box, create, metadata, storage, templates

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| Hidden | Whether the template should be hidden in the Box UI. |
| Record Reader | The Record Reader to use for parsing the incoming data |
| Template Key | The key of the metadata template to create (used for API calls). |
| Template Name | The display name of the metadata template to create. |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile is routed to this relationship if an error occurs during template creation. |
| success | A FlowFile is routed to this relationship after a template has been successfully created. |

## Writes attributes

| Name | Description |
| --- | --- |
| box.template.name | The template name that was created |
| box.template.key | The template key that was created |
| box.template.scope | The template scope. |
| box.template.fields.count | Number of fields created for the template |
| error.code | The error code returned by Box |
| error.message | The error message returned by Box |
## See also

- [org.apache.nifi.processors.box.ListBoxFileMetadataTemplates](/user-guide/data-integration/openflow/processors/listboxfilemetadatatemplates)
- [org.apache.nifi.processors.box.UpdateBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/updateboxfilemetadatainstance)

---
title: CreateCohereEmbeddings 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createcohereembeddings.md
section: Loading & Unloading Data
---

# CreateCohereEmbeddings 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-cohere-nar

## Description

Uses Cohere to create embeddings for text. The input text can be provided as a single FlowFile or as a record-oriented FlowFile.

## Tags

chatbot, cohere, embeddings, gen ai, generative ai, llm, nlp, openflow, text

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Cohere API Key | The API Key for authenticating to Cohere. |
| Embedding Type | Specifies the types of embeddings you want to get back. |
| Embeddings Model | The model to use for embeddings. Available models are listed at [https://docs.cohere.com/reference/embed](https://docs.cohere.com/reference/embed). |
| Embeddings Record Path | The path to the field in the record where the embeddings are to be written. |
| Input Type | Specifies the type of input passed to the model. Required for embedding models v3 and higher. |
| Max Batch Size | The maximum number of records to include in each batch sent to Cohere. |
| Record Reader | The record reader to use for reading record-oriented data. If the incoming data is to be treated as plaintext, this property should be left unset. |
| Record Writer | The Record Writer to use for writing the output. |
| Text Record Path | The path to the field in the record that contains the text to be embedded. If the incoming data is to be treated as plaintext, this property should be left unset. |
| Truncate Policy | One of NONE, START, or END to specify how the API will handle inputs longer than the maximum token length. |
| User | An identifier for the remote user on whose behalf the request is being made. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The original FlowFile will be routed to this relationship if the embeddings could not be created |
| success | The embeddings will be routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| record.count | The number of records written to the output |
| mime.type | The MIME type of the output data, based on the chosen Record Writer |
## Use cases

| Create embeddings for text using Cohere's Embedding model |
| --------------------------------------------------------- |

---
title: CreateMetaAdsReport 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createmetaadsreport.md
section: Loading & Unloading Data
---

# CreateMetaAdsReport 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-meta-ads-processors-nar

## Description

Processor that creates the report configuration for the Meta Ads connector. By default it runs once a day.

## Tags

Facebook, Meta, Meta Ads, report

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Access Token | Token required to request Meta Ads Marketing API. It must match the pattern `Bearer <Access Token Value>`. |
| Action Report Time | Determines the report time of action stats. |
| Click Attribution Window | Attribution window for the click action. |
| Meta Ads API Version | Version of Meta Ads API which is used for report generation. |
| Report Breakdowns | List of values which determine how to break down the result. Multiple breakdowns can be picked, but only some combinations work. |
| Report Fields | List of fields fetched from Marketing API. If none are selected, the most used fields will be downloaded. |
| Report Ingestion Strategy | Configuration of the report ingestion. |
| Report Level | Granularity of the report. |
| Report Name | Unique name of the report. |
| Report Object ID | ID of the object from which data will be fetched. It can be an Account, Campaign, Ad, or Ad Set ID. |
| Report Start Date | Start date from which the ingestion should happen. |
| Report Time Increment | Value of aggregation in days. |
| View Attribution Window | Attribution window for the view action. |
| Web Client Service Provider | Service providing client for REST request execution. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Stores a hash of the last report definition to detect schema changes. Incrementally loaded reports persist the last ingestion date to define ingestion date ranges after the initial load. The start date is also saved. |
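The change-detection idea in the state description (keeping a hash of the last report definition and comparing it on the next run) can be sketched as follows. This is an illustrative assumption, not the processor's actual code: the use of SHA-256, the JSON canonicalization, and the names `definition_hash` and `schema_changed` are all hypothetical.

```python
import hashlib
import json
from typing import Optional


def definition_hash(report_definition: dict) -> str:
    """Return a stable SHA-256 hex digest of a report definition.

    Keys are sorted so that two definitions with the same content
    always produce the same hash (an assumption of this sketch).
    """
    canonical = json.dumps(report_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def schema_changed(stored_hash: Optional[str], report_definition: dict) -> bool:
    """Compare the stored hash with the hash of the current definition.

    A missing stored hash (first run) is treated as "not changed" here;
    that choice is also an assumption.
    """
    if stored_hash is None:
        return False
    return stored_hash != definition_hash(report_definition)


if __name__ == "__main__":
    previous = {"level": "campaign", "fields": ["impressions", "clicks"]}
    current = {"level": "campaign", "fields": ["impressions", "clicks", "spend"]}
    stored = definition_hash(previous)
    print(schema_changed(stored, previous))
    print(schema_changed(stored, current))
```

Storing only a hash keeps the persisted state small while still letting the next run tell whether the configured fields differ from the previous run.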
## Relationships
| Name | Description |
| --- | --- |
| success | Response FlowFiles transferred when a success response is received from the Meta Ads Marketing API. |

## Writes attributes

| Name | Description |
| --- | --- |
| meta.ads.report.id | Unique identifier of the currently prepared job. |
| meta.ads.report.name | Unique name of the report. |
| meta.ads.report.ingestion.strategy | Strategy which defines if the report will be downloaded as a SNAPSHOT or INCREMENTALLY. |
| meta.ads.run.id | Unique identifier of the current ingestion process. |
| meta.ads.ingestion.start.date | Date from which data is downloaded from Meta Ads (including given date). |
| meta.ads.ingestion.end.date | Date to which data is downloaded from Meta Ads (including given date). |
| meta.ads.report.schema.changed | Flag indicating whether the report schema has changed between processor executions. |
| avro.schema | Avro schema containing the set of all configured fields. |
---
title: CreateOpenAiEmbeddings 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createopenaiembeddings.md
section: Loading & Unloading Data
---

# CreateOpenAiEmbeddings 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-openai-nar

## Description

Uses OpenAI to create embeddings for text. The input text can be provided as a single FlowFile or as a record-oriented FlowFile.

## Tags

chatbot, embeddings, gen ai, generative ai, llm, nlp, openai, openflow, text

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Dimensions | The number of dimensions that the resulting output embeddings should have. This is only supported in text-embedding-3 and later models. |
| Embeddings Model | The model to use for embeddings. |
| Embeddings Record Path | The path to the field in the record where the embeddings are to be written. |
| Max Batch Size | The maximum number of records to include in each batch sent to OpenAI. |
| OpenAI API Key | The API Key for authenticating to OpenAI. |
| OpenAI Organization | The organization to use for OpenAI. |
| Record Reader | The record reader to use for reading record-oriented data. If the incoming data is to be treated as plaintext, this property should be left unset. |
| Record Writer | The Record Writer to use for writing the output. |
| Text Record Path | The path to the field in the record that contains the text to be embedded. If the incoming data is to be treated as plaintext, this property should be left unset. |
| User | An identifier for the remote user on whose behalf the request is being made; OpenAI uses this to detect and prevent abuse. |
| Web Client Service | The Web Client Service to use for communicating with OpenAI. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The original FlowFile will be routed to this relationship if the embeddings could not be created |
| success | The embeddings will be routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| record.count | The number of records written to the output |
| mime.type | The MIME type of the output data, based on the chosen Record Writer |
## Use cases

| Create embeddings for text using OpenAI's Embeddings |
| ---------------------------------------------------- |

## See also

- [com.snowflake.openflow.runtime.processors.openai.CreateAzureOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createazureopenaiembeddings)
- [com.snowflake.openflow.runtime.processors.openai.PromptOpenAI](/user-guide/data-integration/openflow/processors/promptopenai)

---
title: CreateSnowflakeEmbeddings 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createsnowflakeembeddings.md
section: Loading & Unloading Data
---

# CreateSnowflakeEmbeddings 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-snowflake-processors-nar

## Description

Create vector embeddings using Snowflake Cortex Large Language Model functions

## Tags

chatbot, embeddings, gen ai, generative ai, llm, nlp, openflow, snowflake, text

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Dimensions | The number of dimensions that the resulting output embeddings should have. |
| Embeddings Model | The model to use for embeddings. |
| Record Writer | The Record Writer to use for writing the output. |
| Snowflake Connection Service | Database Connection Service for accessing Snowflake. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The original FlowFile will be routed to this relationship if the embeddings could not be created |
| success | The embeddings will be routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| record.count | The number of records written to the output |
| mime.type | The MIME type of the output data, based on the chosen Record Writer |
---
title: CreateVertexAIEmbeddings 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/createvertexaiembeddings.md
section: Loading & Unloading Data
---

# CreateVertexAIEmbeddings 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-vertexai-nar

## Description

Uses VertexAI to create embeddings for text. The input text can be provided as a single FlowFile or as a record-oriented FlowFile.

## Tags

chatbot, cloud, embeddings, gcp, gen ai, generative ai, google, llm, nlp, openflow, text, vertex

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Auto Truncate | If set to false, text that exceeds the token limit causes the request to fail. |
| Embeddings Model | The model to use for embeddings. Available models are listed at [https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#models](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#models). |
| Embeddings Record Path | The path to the field in the record where the embeddings are to be written. |
| GCP Credentials Service | The Controller Service used to obtain Google Cloud Platform credentials. |
| GCP Location | The location to configure the Vertex client with. |
| GCP Project ID | The project ID to configure the Vertex client with. |
| Max Batch Size | The maximum number of records to include in each batch sent to VertexAI. |
| Model Publisher | The publisher of the model. |
| Output Dimensionality | Used to specify output embedding size. If set, output embeddings will be truncated to the size specified. |
| Record Reader | The record reader to use for reading record-oriented data. If the incoming data is to be treated as plaintext, this property should be left unset. |
| Record Writer | The Record Writer to use for writing the output. |
| Task Type | Used to convey intended downstream application of embeddings to help the model tune embeddings for a specific purpose. |
| Text Record Path | The path to the field in the record that contains the text to be embedded. If the incoming data is to be treated as plaintext, this property should be left unset. |
| User | An identifier for the remote user on whose behalf the request is being made. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The original FlowFile will be routed to this relationship if the embeddings could not be created |
| success | The embeddings will be routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| record.count | The number of records written to the output |
| mime.type | The MIME type of the output data, based on the chosen Record Writer |
## Use cases

| Create embeddings for text using VertexAI's Embedding model |
| ----------------------------------------------------------- |

---
title: CryptographicHashContent 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/cryptographichashcontent.md
section: Loading & Unloading Data
---

# CryptographicHashContent 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Calculates a cryptographic hash value for the FlowFile content using the given algorithm and writes it to an output attribute. Refer to [https://csrc.nist.gov/Projects/Hash-Functions/NIST-Policy-on-Hash-Functions](https://csrc.nist.gov/Projects/Hash-Functions/NIST-Policy-on-Hash-Functions) for help deciding which algorithm to use.

## Tags

blake2, content, cryptography, hash, md5, sha

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| fail_when_empty | Route to failure if the content is empty. While hashing an empty value is valid, some flows may want to detect empty input. |
| hash_algorithm | The hash algorithm to use. Note that not all of the algorithms available are recommended for use (some are provided for legacy compatibility). There are many things to consider when picking an algorithm; it is recommended to use the most secure algorithm possible. |
## Relationships
| Name | Description |
| --- | --- |
| failure | Used for FlowFiles that have no content if the 'fail on empty' setting is enabled |
| success | Used for FlowFiles that have a hash value added |

## Writes attributes

| Name | Description |
| --- | --- |
| `content_<algorithm>` | This processor adds an attribute whose value is the result of hashing the FlowFile content. The name of this attribute is specified by the value of the algorithm, e.g. 'content_SHA-256'. |
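To make the behavior described above concrete, here is a minimal sketch of hashing content and naming the resulting attribute after the algorithm. It uses Python's standard `hashlib` rather than the processor itself; the `hash_content` function, its return shape, the hex encoding of the digest, and the mapping from "SHA-256" to a `hashlib` name are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Tuple


def hash_content(
    content: bytes,
    algorithm: str = "SHA-256",
    fail_when_empty: bool = False,
) -> Tuple[str, Dict[str, str]]:
    """Return (relationship, attributes) for the given content.

    Hypothetical helper that mimics the documented behavior:
    empty content goes to "failure" only when fail_when_empty is set;
    otherwise the digest is stored under "content_<algorithm>".
    """
    if fail_when_empty and not content:
        return "failure", {}
    # Assumption: "SHA-256" maps to hashlib's "sha256" by dropping the hyphen.
    hashlib_name = algorithm.replace("-", "").lower()
    digest = hashlib.new(hashlib_name, content).hexdigest()
    return "success", {f"content_{algorithm}": digest}


if __name__ == "__main__":
    relationship, attributes = hash_content(b"abc")
    print(relationship, attributes)
    print(hash_content(b"", fail_when_empty=True))
```

Hashing an empty payload is valid and yields the algorithm's well-known empty-input digest, which is why the empty check is opt-in.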
---
title: CSVReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/csvreader.md
section: Loading & Unloading Data
---

# CSVReader

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Parses CSV-formatted data, returning each row in the CSV file as a separate record. This reader allows for inferring a schema based on the first line of the CSV, if a 'header line' is present, or providing an explicit schema for interpreting the values. See Controller Service's Usage for further documentation.

## Tags

comma, csv, delimited, parse, reader, record, row, separated, values

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Allow Duplicate Header Names | Allow Duplicate Header Names | true | true, false | Whether duplicate header names are allowed. Header names are case-sensitive, for example "name" and "Name" are treated as separate fields. Handling of duplicate header names is CSV Parser specific (where applicable). Apache Commons CSV: duplicate headers will result in column data "shifting" right with new fields created for "unknown_field_index_X" where "X" is the CSV column index number. Jackson CSV: duplicate headers will be de-duplicated with the field value being that of the right-most duplicate CSV column. FastCSV: duplicate headers will be de-duplicated with the field value being that of the left-most duplicate CSV column. |
| CSV Format * | CSV Format | custom | Custom Format, RFC 4180, Microsoft Excel, Tab-Delimited, MySQL Format, Informix Unload, Informix Unload Escape Disabled | Specifies which "format" the CSV data is in, or specifies if custom formatting should be used. |
| Character Set * | Character Set | UTF-8 | | The Character Encoding that is used to encode/decode the CSV file. |
| Comment Marker | Comment Marker | | | The character that is used to denote the start of a comment. Any line that begins with this comment will be ignored. |
| Date Format | Date Format | | | Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017). |
| Escape Character * | Escape Character | | | The character that is used to escape characters that would otherwise have a specific meaning to the CSV Parser. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Escape Character at runtime, then it will be skipped and the default Escape Character will be used. Setting it to an empty string means no escape character should be used. |
| Ignore CSV Header Column Names | Ignore CSV Header Column Names | false | true, false | If the first line of a CSV is a header, and the configured schema does not match the fields named in the header line, this controls how the Reader will interpret the fields. If this property is true, then the field names mapped to each column are driven only by the configured schema and any fields not in the schema will be ignored. If this property is false, then the field names found in the CSV Header will be used as the names of the fields. |
| Null String | Null String | | | Specifies a String that, if present as a value in the CSV, should be considered a null field instead of using the literal value. |
| Quote Character * | Quote Character | `"` | | The character that is used to quote values so that escape characters do not have to be used. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Quote Character at runtime, then it will be skipped and the default Quote Character will be used. |
| Record Separator * | Record Separator | `\n` | | Specifies the characters to use in order to separate CSV Records. |
| Schema Access Strategy * | Schema Access Strategy | infer-schema | Use 'Schema Name' Property, Use 'Schema Text' Property, Schema Reference Reader, Use String Fields From Header, Infer Schema | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name | Schema Name | `${schema.name}` | | Specifies the name of the schema to look up in the Schema Registry property. |
| Schema Reference Reader * | Schema Reference Reader | | | Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier. |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry. |
| Schema Text | Schema Text | `${avro.schema}` | | The text of an Avro-formatted Schema. |
| Schema Version | Schema Version | | | Specifies the version of the schema to look up in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| Time Format | Time Format | | | Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15). |
| Timestamp Format | Timestamp Format | | | Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15). |
| Treat First Line as Header * | Treat First Line as Header | false | true, false | Specifies whether or not the first line of CSV should be considered a Header or should be considered a record. If the Schema Access Strategy indicates that the columns must be defined in the header, then this property will be ignored, since the header must always be present and won't be processed as a Record. Otherwise, if 'true', then the first line of CSV data will not be processed as a record, and if 'false', then the first line will be interpreted as a record. |
| Trim Fields * | Trim Fields | true | true, false | Whether or not white space should be removed from the beginning and end of fields. |
| Trim double quote * | Trim double quote | true | true, false | Whether or not to trim starting and ending double quotes. For example: with trim, the string '"test"' would be parsed to 'test'; without trim, it would be parsed to '"test"'. If set to 'false' it means full compliance with RFC-4180. Default value is true, with trim. |
| Value Separator * | Value Separator | `,` | | The character that is used to separate values/fields in a CSV Record. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Value Separator at runtime, then it will be skipped and the default Value Separator will be used. |
| CSV Parser * | csv-reader-csv-parser | commons-csv | Apache Commons CSV, Jackson CSV, FastCSV | Specifies which parser to use to read CSV records. NOTE: Different parsers may support different subsets of functionality and may also exhibit different levels of performance. |
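To give a feel for how a few of these settings interact (Treat First Line as Header, Trim Fields, Null String, Value Separator, and Quote Character), the sketch below reads CSV text into records with Python's standard `csv` module. It is only an analogy, not the CSVReader implementation: the `read_records` function, its parameter names, and the generated `column_N` field names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional


def read_records(
    text: str,
    first_line_is_header: bool = True,
    trim_fields: bool = True,
    null_string: Optional[str] = None,
    value_separator: str = ",",
    quote_character: str = '"',
) -> List[Dict[str, Optional[str]]]:
    """Parse CSV text into a list of records (one dict per row)."""
    reader = csv.reader(
        io.StringIO(text), delimiter=value_separator, quotechar=quote_character
    )
    rows = [row for row in reader if row]  # skip completely empty lines
    if not rows:
        return []
    if first_line_is_header:
        header, data = rows[0], rows[1:]
    else:
        # Hypothetical naming scheme used when there is no header line.
        header = [f"column_{i}" for i in range(len(rows[0]))]
        data = rows
    if trim_fields:
        header = [name.strip() for name in header]
    records = []
    for row in data:
        values = [value.strip() if trim_fields else value for value in row]
        # Replace the configured null marker with a real null (None).
        values = [
            None if null_string is not None and value == null_string else value
            for value in values
        ]
        records.append(dict(zip(header, values)))
    return records


if __name__ == "__main__":
    sample = "name, city\nAda, London\nBob,NULL\n"
    for record in read_records(sample, null_string="NULL"):
        print(record)
```

With `first_line_is_header=False`, the same input produces three records, because the first line is then interpreted as data rather than as field names.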
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: CSVRecordLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/csvrecordlookupservice.md section: Loading & Unloading Data --- # CSVRecordLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A reloadable CSV file-based lookup service. When the lookup key is found in the CSV file, the columns are returned as a Record. All returned fields will be strings. The first line of the CSV file is treated as the header. ## Tags cache, csv, enrich, join, key, lookup, record, reloadable, value ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| CSV Format * | CSV Format | default | Custom Format, RFC 4180, Microsoft Excel, Tab-Delimited, MySQL Format, Informix Unload, Informix Unload Escape Disabled, Default Format, RFC4180 | Specifies which "format" the CSV data is in, or specifies if custom formatting should be used. |
| Character Set * | Character Set | UTF-8 | | The Character Encoding that is used to decode the CSV file. |
| Comment Marker | Comment Marker | | | The character that is used to denote the start of a comment. Any line that begins with this comment will be ignored. |
| Escape Character * | Escape Character | | | The character that is used to escape characters that would otherwise have a specific meaning to the CSV Parser. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Escape Character at runtime, then it will be skipped and the default Escape Character will be used. Setting it to an empty string means no escape character should be used. |
| Quote Character * | Quote Character | `"` | | The character that is used to quote values so that escape characters do not have to be used. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Quote Character at runtime, then it will be skipped and the default Quote Character will be used. |
| Quote Mode * | Quote Mode | MINIMAL | Quote All Values, Quote Minimal, Quote Non-Numeric Values, Do Not Quote Values | Specifies how fields should be quoted when they are written |
| Trim Fields * | Trim Fields | true | true, false | Whether or not white space should be removed from the beginning and end of fields |
| Value Separator * | Value Separator | `,` | | The character that is used to separate values/fields in a CSV Record. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Value Separator at runtime, then it will be skipped and the default Value Separator will be used. |
| CSV File * | csv-file | | | Path to a CSV File in which the key value pairs can be looked up. |
| Ignore Duplicates * | ignore-duplicates | true | true, false | Ignore duplicate keys for records in the CSV file. |
| Lookup Key Column * | lookup-key-column | | | The field in the CSV file that will serve as the lookup key. This is the field that will be matched against the property specified in the lookup processor. |
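The lookup behavior described for this service (header on the first line, one key column, every returned field a string) can be sketched in Python with the standard `csv` module. This is only an illustration, not the service's implementation: the function name `build_lookup` is hypothetical, and two behaviors are assumptions, namely that the first row seen for a duplicate key is the one kept, and that duplicates raise an error when they are not ignored.

```python
import csv
import io


def build_lookup(csv_text, key_column, ignore_duplicates=True):
    """Index CSV rows by `key_column`; the first line is read as the header."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[key_column]
        if key in table:
            if ignore_duplicates:
                continue  # assumption: keep the first row seen for a key
            raise ValueError(f"duplicate lookup key: {key}")  # assumption
        table[key] = dict(row)  # every value stays a string
    return table


SAMPLE = "id,name,age\n1,Ada,36\n2,Grace,45\n1,Duplicate,99\n"
lookup = build_lookup(SAMPLE, key_column="id")
print(lookup.get("1"))  # {'id': '1', 'name': 'Ada', 'age': '36'}
print(lookup.get("3"))  # None
```

Note that `age` comes back as the string `'36'`, which matches the statement that all returned fields are strings.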
## State management This component does not store state. ## Restricted ## Restrictions
| Required Permission | Explanation |
| --- | --- |
| read filesystem | Provides operator the ability to read from any file that NiFi has access to. |
## System Resource Considerations This component does not specify system resource considerations. --- title: CSVRecordSetWriter source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/csvrecordsetwriter.md section: Loading & Unloading Data --- # CSVRecordSetWriter This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Writes the contents of a RecordSet as CSV data. The first line written will be the column names (unless the 'Include Header Line' property is false). All subsequent lines will be the values corresponding to the record fields. ## Tags csv, delimited, record, recordset, result, row, separated, serializer, set, tab, tsv, writer ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| CSV Format * | CSV Format | custom | Custom Format, RFC 4180, Microsoft Excel, Tab-Delimited, MySQL Format, Informix Unload, Informix Unload Escape Disabled | Specifies which "format" the CSV data is in, or specifies if custom formatting should be used. |
| Character Set * | Character Set | UTF-8 | | The Character Encoding that is used to encode/decode the CSV file |
| Comment Marker | Comment Marker | | | The character that is used to denote the start of a comment. Any line that begins with this comment will be ignored. |
| Date Format | Date Format | | | Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017). |
| Escape Character * | Escape Character | | | The character that is used to escape characters that would otherwise have a specific meaning to the CSV Parser. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Escape Character at runtime, then it will be skipped and the default Escape Character will be used. Setting it to an empty string means no escape character should be used. |
| Include Header Line * | Include Header Line | true | true, false | Specifies whether or not the CSV column names should be written out as the first line. |
| Include Trailing Delimiter * | Include Trailing Delimiter | false | true, false | If true, a trailing delimiter will be added to each CSV Record that is written. If false, the trailing delimiter will be omitted. |
| Null String | Null String | | | Specifies a String that, if present as a value in the CSV, should be considered a null field instead of using the literal value. |
| Quote Character * | Quote Character | `"` | | The character that is used to quote values so that escape characters do not have to be used. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Quote Character at runtime, then it will be skipped and the default Quote Character will be used. |
| Quote Mode * | Quote Mode | MINIMAL | Quote All Values, Quote Minimal, Quote Non-Numeric Values, Do Not Quote Values | Specifies how fields should be quoted when they are written |
| Record Separator * | Record Separator | `\n` | | Specifies the characters to use in order to separate CSV Records |
| Schema Access Strategy * | Schema Access Strategy | inherit-record-schema | Inherit Record Schema, Use 'Schema Name' Property, Use 'Schema Text' Property | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Cache | Schema Cache | | | Specifies a Schema Cache to add the Record Schema to so that Record Readers can quickly lookup the schema. |
| Schema Name | Schema Name | `${schema.name}` | | Specifies the name of the schema to lookup in the Schema Registry property |
| Schema Reference Reader * | Schema Reference Reader | | | Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier |
| Schema Reference Writer * | Schema Reference Writer | | | Service implementation responsible for writing FlowFile attributes or content header with Schema reference information |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry |
| Schema Text | Schema Text | `${avro.schema}` | | The text of an Avro-formatted Schema |
| Schema Version | Schema Version | | | Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| Schema Write Strategy * | Schema Write Strategy | no-schema | Do Not Write Schema, Set 'schema.name' Attribute, Set 'avro.schema' Attribute, Schema Reference Writer | Specifies how the schema for a Record should be added to the data. |
| Time Format | Time Format | | | Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15). |
| Timestamp Format | Timestamp Format | | | Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15). |
| Trim Fields * | Trim Fields | true | true, false | Whether or not white space should be removed from the beginning and end of fields |
| Value Separator * | Value Separator | `,` | | The character that is used to separate values/fields in a CSV Record. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Value Separator at runtime, then it will be skipped and the default Value Separator will be used. |
| CSV Writer * | csv-writer | commons-csv | Apache Commons CSV, FastCSV | Specifies which writer implementation to use to write CSV records. NOTE: Different writers may support different subsets of functionality and may also exhibit different levels of performance. |
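To make the effect of Quote Mode and Include Header Line more concrete, here is a small Python sketch using the standard `csv` module. It is an analogy rather than the writer's actual behavior: `csv.QUOTE_MINIMAL` and `csv.QUOTE_ALL` are assumed to correspond roughly to "Quote Minimal" and "Quote All Values", and the helper name `write_csv` is hypothetical.

```python
import csv
import io


def write_csv(records, quoting, include_header=True):
    """Serialize a list of dicts as CSV text, using "\n" as the record separator."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        quoting=quoting,
        lineterminator="\n",
    )
    if include_header:
        writer.writeheader()  # column names become the first line
    writer.writerows(records)
    return buffer.getvalue()


records = [{"id": 1, "note": "plain"}, {"id": 2, "note": "has, comma"}]

# Minimal quoting: only the value containing the separator is quoted.
print(write_csv(records, csv.QUOTE_MINIMAL))

# Quote everything and omit the header line.
print(write_csv(records, csv.QUOTE_ALL, include_header=False))
```

With minimal quoting, only `"has, comma"` is wrapped in quotes because it contains the value separator; with full quoting, every field is wrapped.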
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: DatabaseLookup source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/databaselookup.md section: Loading & Unloading Data --- # DatabaseLookup This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A Lookup Service that allows for enrichment with a database using a user-specified SQL statement. The SQL statement may reference any value from the FlowFile's Record that is provided by the calling Processor. ## Tags database, enrich, join, lookup, openflow, rdbms, record, sql ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Connection Pooling Service * | Connection Pooling Service | | | The Connection Pooling Service that is used to obtain a connection to the database |
| Max Array Size * | Max Array Size | 1000 | | The maximum number of records to include in the array. This is a mechanism to ensure that the returned results do not cause memory issues. If the result set contains more records than this value, the lookup will fail. If the desire is instead to limit the number of rows returned, a LIMIT clause should be added to the SQL. |
| Multiple Result Field Name * | Multiple Result Field Name | results | | If multiple results are returned, they will be combined into an array. This property dictates the name of the field in the returned record. |
| Multiple Result Strategy * | Multiple Result Strategy | Fail | Use Array, Use First Only, Fail | Specifies how to handle the situation where the lookup results in multiple records. |
| SQL * | SQL | | | The SQL statement to execute against the database in order to lookup the value. The statement may reference any attributes or values from the incoming Record that are provided by the calling Processor via Expression Language. The processor will extract any Expression Language expressions and replace them with parameterized values so that the SQL can be safely executed, avoiding SQL Injection attacks. |
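The idea behind the SQL property (references to record values are replaced with bound parameters) and the Multiple Result Strategy options can be sketched with Python's `sqlite3` module. Everything here is illustrative: the `${field}` regular expression is a simplified stand-in for NiFi Expression Language, the `city` table is invented sample data, and the function names are hypothetical.

```python
import re
import sqlite3

# Simplified stand-in for Expression Language references such as ${country}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def parameterize(sql, record):
    """Swap each ${field} reference for a bound parameter."""
    names = PLACEHOLDER.findall(sql)
    return PLACEHOLDER.sub("?", sql), [record[name] for name in names]


def lookup(conn, sql, record, strategy="Fail", max_array_size=1000):
    """Run the lookup and apply a multiple-result strategy."""
    statement, params = parameterize(sql, record)
    rows = conn.execute(statement, params).fetchall()
    if len(rows) <= 1:
        return rows[0] if rows else None
    if strategy == "Use First Only":
        return rows[0]
    if strategy == "Use Array" and len(rows) <= max_array_size:
        return {"results": rows}  # field name mirrors the documented default
    raise LookupError(f"lookup returned {len(rows)} rows")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("Oslo", "NO"), ("Bergen", "NO"), ("Lyon", "FR")],
)

query = "SELECT name FROM city WHERE country = ${country} ORDER BY name"
print(lookup(conn, query, {"country": "FR"}))
print(lookup(conn, query, {"country": "NO"}, strategy="Use Array"))
```

Because the record value is bound as a parameter rather than concatenated into the statement, a value such as `x' OR '1'='1` is compared literally and matches no rows.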
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: DatabaseRecordLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/databaserecordlookupservice.md section: Loading & Unloading Data --- # DatabaseRecordLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A relational-database-based lookup service. When the lookup key is found in the database, the specified columns (or all if Lookup Value Columns are not specified) are returned as a Record. Only one row will be returned for each lookup; duplicate database entries are ignored. ## Tags cache, database, enrich, join, key, lookup, rdbms, record, reloadable, value ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Cache Expiration | Cache Expiration | | | Time interval to clear all cache entries. If the Cache Size is zero then this property is ignored. |
| Default Decimal Precision * | Default Decimal Precision | 10 | | When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting number of available digits is required. Generally, precision is defined by the column data type definition or the database engine's default. However, undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers. |
| Default Decimal Scale * | Default Decimal Scale | 0 | | When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting number of available decimal digits is required. Generally, scale is defined by the column data type definition or the database engine's default. However, when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimals than the specified scale, then the value will be rounded, e.g. 1.53 becomes 2 with scale 0, and 1.5 with scale 1. |
| Cache Size * | dbrecord-lookup-cache-size | 0 | | Specifies how many lookup values/records should be cached. The cache is shared for all tables and keeps a map of lookup values to records. Setting this property to zero means no caching will be done and the table will be queried for each lookup value in each record. If the lookup table changes often or the most recent data must be retrieved, do not use the cache. |
| Clear Cache on Enabled * | dbrecord-lookup-clear-cache-on-enabled | true | true, false | Whether to clear the cache when this service is enabled. If the Cache Size is zero then this property is ignored. Clearing the cache when the service is enabled ensures that the service will first go to the database to get the most recent data. |
| Database Connection Pooling Service * | dbrecord-lookup-dbcp-service | | | The Controller Service that is used to obtain connection to database |
| Lookup Key Column * | dbrecord-lookup-key-column | | | The column in the table that will serve as the lookup key. This is the column that will be matched against the property specified in the lookup processor. Note that this may be case-sensitive depending on the database. |
| Table Name * | dbrecord-lookup-table-name | | | The name of the database table to be queried. Note that this may be case-sensitive depending on the database. |
| Lookup Value Columns | dbrecord-lookup-value-columns | | | A comma-delimited list of columns in the table that will be returned when the lookup key matches. Note that this may be case-sensitive depending on the database. |
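The Default Decimal Scale example (1.53 becomes 2 with scale 0, and 1.5 with scale 1) can be reproduced with Python's `decimal` module. The rounding mode is an assumption: half-up rounding is used here because it is consistent with both documented results, but the source does not state which mode the service applies. The function name `apply_scale` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_scale(value: str, scale: int) -> Decimal:
    """Round `value` to `scale` decimal digits (assumed half-up rounding)."""
    quantum = Decimal(1).scaleb(-scale)  # scale 0 -> 1, scale 1 -> 0.1
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


print(apply_scale("1.53", 0))  # 2
print(apply_scale("1.53", 1))  # 1.5
```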
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: DatabaseRecordSink source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/databaserecordsink.md section: Loading & Unloading Data --- # DatabaseRecordSink This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a service to write records using a configured database connection. ## Tags connection, database, db, jdbc, record ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Catalog Name | db-record-sink-catalog-name | | | The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty. |
| Database Connection Pooling Service * | db-record-sink-dcbp-service | | | The Controller Service that is used to obtain a connection to the database for sending records. |
| Max Wait Time * | db-record-sink-query-timeout | 0 seconds | | The maximum amount of time allowed for a running SQL statement; zero means there is no limit. A max time of less than 1 second will be equal to zero. |
| Quote Column Identifiers | db-record-sink-quoted-identifiers | false | true, false | Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables. |
| Quote Table Identifiers | db-record-sink-quoted-table-identifiers | false | true, false | Enabling this option will cause the table name to be quoted to support the use of special characters in the table name. |
| Schema Name | db-record-sink-schema-name | | | The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty. |
| Table Name * | db-record-sink-table-name | | | The name of the table that the statement should affect. |
| Translate Field Names | db-record-sink-translate-field-names | true | true, false | If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. If false, the field names must match the column names exactly, or the column will not be updated. |
| Unmatched Column Behavior | db-record-sink-unmatched-column-behavior | Fail on Unmatched Columns | Ignore Unmatched Columns, Warn on Unmatched Columns, Fail on Unmatched Columns | If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation. |
| Unmatched Field Behavior | db-record-sink-unmatched-field-behavior | Ignore Unmatched Fields | Ignore Unmatched Fields, Fail on Unmatched Fields | If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation. |
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: DBCPConnectionPool source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/dbcpconnectionpool.md section: Loading & Unloading Data --- # DBCPConnectionPool This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a Database Connection Pooling Service. Connections can be requested from the pool and returned after use. ## Tags connection, database, dbcp, jdbc, pooling, store ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Database Connection URL * | Database Connection URL | | | A database connection URL used to connect to a database. May contain database system name, host, port, database name and some parameters. The exact syntax of a database connection URL is specified by your DBMS. |
| Database Driver Class Name * | Database Driver Class Name | | | Database driver class name |
| Database Driver Location(s) | Database Driver Location(s) | | | Comma-separated list of files/folders and/or URLs containing the driver JAR and its dependencies (if any). For example '/var/tmp/mariadb-java-client-1.1.7.jar' |
| Database User | Database User | | | Database user name |
| Kerberos User Service | Kerberos User Service | | | Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos |
| Max Total Connections * | Max Total Connections | 8 | | The maximum number of active connections that can be allocated from this pool at the same time, or negative for no limit. |
| Max Wait Time * | Max Wait Time | 500 millis | | The maximum amount of time that the pool will wait (when there are no available connections) for a connection to be returned before failing, or -1 to wait indefinitely. |
| Maximum Connection Lifetime | Maximum Connection Lifetime | -1 | | The maximum lifetime of a connection. After this time is exceeded the connection will fail the next activation, passivation or validation test. A value of zero or less means the connection has an infinite lifetime. |
| Maximum Idle Connections | Maximum Idle Connections | 8 | | The maximum number of connections that can remain idle in the pool without extra ones being released. Set to any negative value to allow unlimited idle connections. |
| Minimum Evictable Idle Time | Minimum Evictable Idle Time | 30 mins | | The minimum amount of time a connection may sit idle in the pool before it is eligible for eviction. |
| Minimum Idle Connections | Minimum Idle Connections | 0 | | The minimum number of connections that can remain idle in the pool without extra ones being created. Set to zero to allow no idle connections. |
| Password | Password | | | The password for the database user |
| Soft Minimum Evictable Idle Time | Soft Minimum Evictable Idle Time | -1 | | The minimum amount of time a connection may sit idle in the pool before it is eligible for eviction by the idle connection evictor, with the extra condition that at least a minimum number of idle connections remain in the pool. When the not-soft version of this option is set to a positive value, it is examined first by the idle connection evictor: when idle connections are visited by the evictor, idle time is first compared against it (without considering the number of idle connections in the pool) and then against this soft option, including the minimum idle connections constraint. |
| Time Between Eviction Runs | Time Between Eviction Runs | -1 | | The time period to sleep between runs of the idle connection evictor thread. When non-positive, no idle connection evictor thread will be run. |
| Validation Query | Validation Query | | | Validation query used to validate connections before returning them. When a connection is invalid, it is dropped and a new valid connection is returned. Note: using validation might have some performance penalty. |
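The interaction between Max Total Connections and Max Wait Time can be illustrated with a deliberately small Python sketch built on `queue.Queue`. This is not how the pooling service is implemented: the class `TinyPool` and its method names are hypothetical, connections are created eagerly for simplicity, and eviction, validation, and lifetime limits are left out.

```python
import queue


class TinyPool:
    """Toy pool: at most `max_total` connections are ever handed out."""

    def __init__(self, factory, max_total=8):
        self._idle = queue.Queue()
        for _ in range(max_total):
            self._idle.put(factory())  # assumption: eager creation

    def borrow(self, max_wait_seconds=0.5):
        """Wait up to `max_wait_seconds` for a free connection, then fail."""
        try:
            return self._idle.get(timeout=max_wait_seconds)
        except queue.Empty:
            raise TimeoutError("no connection became available in time") from None

    def give_back(self, connection):
        """Return a borrowed connection so that another caller can use it."""
        self._idle.put(connection)


pool = TinyPool(factory=object, max_total=2)
first = pool.borrow()
second = pool.borrow()
try:
    pool.borrow(max_wait_seconds=0.05)  # both connections are in use
except TimeoutError as error:
    print(error)
pool.give_back(first)
print(pool.borrow() is first)  # True
```

The third `borrow` call fails after the wait time elapses because both connections are checked out; once one is returned, the next caller receives it.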
## State management This component does not store state. ## Restricted ## Restrictions
| Required Permission | Explanation |
| --- | --- |
| reference remote resources | Database Driver Location can reference resources over HTTP |
## System Resource Considerations This component does not specify system resource considerations. --- title: DBCPConnectionPoolLookup source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/dbcpconnectionpoollookup.md section: Loading & Unloading Data --- # DBCPConnectionPoolLookup This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a DBCPService that can be used to dynamically select another DBCPService. This service requires an attribute named 'database.name' to be passed in when asking for a connection, and will throw an exception if the attribute is missing. The value of 'database.name' will be used to select the DBCPService that has been registered with that name. This will allow multiple DBCPServices to be defined and registered, and then selected dynamically at runtime by tagging flow files with the appropriate 'database.name' attribute. ## Tags connection, database, dbcp, jdbc, pooling, store ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: DebugFlow 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/debugflow.md section: Loading & Unloading Data --- # DebugFlow 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description The DebugFlow processor aids testing and debugging the FlowFile framework by allowing various responses to be explicitly triggered in response to the receipt of a FlowFile or a timer event without a FlowFile if using timer or cron based scheduling. It can force responses needed to exercise or test various failure modes that can occur when a processor runs. ## Tags FlowFile, debug, flow, processor, test, utility ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| @OnScheduled Pause Time | Specifies how long the processor should sleep in the @OnScheduled method, so that the processor can be forced to take a long time to start up |
| @OnStopped Pause Time | Specifies how long the processor should sleep in the @OnStopped method, so that the processor can be forced to take a long time to shutdown |
| @OnUnscheduled Pause Time | Specifies how long the processor should sleep in the @OnUnscheduled method, so that the processor can be forced to take a long time to respond when user clicks stop |
| Content Size | The number of bytes to write each time that the FlowFile is written to |
| CustomValidate Pause Time | Specifies how long the processor should sleep in the customValidate() method |
| Fail When @OnScheduled called | Specifies whether or not the Processor should throw an Exception when the methods annotated with @OnScheduled are called |
| Fail When @OnStopped called | Specifies whether or not the Processor should throw an Exception when the methods annotated with @OnStopped are called |
| Fail When @OnUnscheduled called | Specifies whether or not the Processor should throw an Exception when the methods annotated with @OnUnscheduled are called |
| FlowFile Exception Class | Exception class to be thrown (must extend java.lang.RuntimeException). |
| FlowFile Exception Iterations | Number of FlowFiles to throw exception. |
| FlowFile Failure Iterations | Number of FlowFiles to forward to failure relationship. |
| FlowFile Rollback Iterations | Number of FlowFiles to roll back (without penalty). |
| FlowFile Rollback Penalty Iterations | Number of FlowFiles to roll back with penalty. |
| FlowFile Rollback Yield Iterations | Number of FlowFiles to roll back and yield. |
| FlowFile Success Iterations | Number of FlowFiles to forward to success relationship. |
| Ignore Interrupts When Paused | If the Processor's thread(s) are sleeping (due to one of the "Pause Time" properties above), and the thread is interrupted, this indicates whether the Processor should ignore the interrupt and continue sleeping or if it should allow itself to be interrupted. |
| No FlowFile Exception Class | Exception class to be thrown if no FlowFile (must extend java.lang.RuntimeException). |
| No FlowFile Exception Iterations | Number of times to throw NPE exception if no FlowFile. |
| No FlowFile Skip Iterations | Number of times to skip onTrigger if no FlowFile. |
| No FlowFile Yield Iterations | Number of times to yield if no FlowFile. |
| OnTrigger Pause Time | Specifies how long the processor should sleep in the onTrigger() method, so that the processor can be forced to take a long time to perform its task |
| Write Iterations | Number of times to write to the FlowFile |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that failed to process. |
| success | FlowFiles processed successfully. |

---
title: DecryptContentAge 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/decryptcontentage.md
section: Loading & Unloading Data
---

# DecryptContentAge 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-cipher-nar

## Description

Decrypt content using the age-encryption.org/v1 specification. Detects binary or ASCII armored content encoding using the initial file header bytes. The age standard uses ChaCha20-Poly1305 for authenticated encryption of the payload. The age-keygen command supports generating X25519 key pairs for encryption and decryption operations.

## Tags

ChaCha20-Poly1305, X25519, age, age-encryption.org, encryption

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Private Key Identities | One or more X25519 Private Key Identities, separated with newlines, encoded according to the age specification, starting with AGE-SECRET-KEY-1 |
| Private Key Identity Resources | One or more files or URLs containing X25519 Private Key Identities, separated with newlines, encoded according to the age specification, starting with AGE-SECRET-KEY-1 |
| Private Key Source | Source of information determines the loading strategy for X25519 Private Key Identities |

## Relationships

| Name | Description |
| --- | --- |
| failure | Decryption Failed |
| success | Decryption Completed |
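
The description above says the processor detects binary or ASCII armored encoding from the initial header bytes. The following is a minimal sketch of that idea, not the processor's actual implementation. The function name `detect_age_encoding` and the returned labels are hypothetical; the two header strings are the ones the age v1 format places at the start of binary and armored files.

```python
# Header that opens a binary age file (first line of the age v1 header).
AGE_BINARY_HEADER = b"age-encryption.org/v1"
# Marker that opens an ASCII armored age file.
AGE_ARMOR_HEADER = b"-----BEGIN AGE ENCRYPTED FILE-----"


def detect_age_encoding(content: bytes) -> str:
    """Classify content by its leading bytes (hypothetical helper)."""
    if content.startswith(AGE_ARMOR_HEADER):
        return "ascii-armor"
    if content.startswith(AGE_BINARY_HEADER):
        return "binary"
    # Anything else is not recognizable as age content from its header alone.
    return "unknown"


binary_sample = b"age-encryption.org/v1\n-> X25519 ...\n"
armored_sample = b"-----BEGIN AGE ENCRYPTED FILE-----\n...\n"

binary_result = detect_age_encoding(binary_sample)
armored_result = detect_age_encoding(armored_sample)
other_result = detect_age_encoding(b"plain text")
```

Checking only the leading bytes keeps detection cheap, because the payload does not have to be read before a decoding path is chosen.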
## See also

- [org.apache.nifi.processors.cipher.EncryptContentAge](/user-guide/data-integration/openflow/processors/encryptcontentage)

---
title: DecryptContentPGP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/decryptcontentpgp.md
section: Loading & Unloading Data
---

# DecryptContentPGP 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-pgp-nar

## Description

Decrypt contents of OpenPGP messages. Using the Packaged Decryption Strategy preserves OpenPGP encoding to support subsequent signature verification.

## Tags

Encryption, GPG, OpenPGP, PGP, RFC 4880

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| decryption-strategy | Strategy for writing files to success after decryption |
| passphrase | Passphrase used for decrypting data encrypted with Password-Based Encryption |
| private-key-service | PGP Private Key Service for decrypting data encrypted with Public Key Encryption |

## Relationships

| Name | Description |
| --- | --- |
| failure | Decryption Failed |
| success | Decryption Succeeded |

## Writes attributes

| Name | Description |
| --- | --- |
| pgp.literal.data.filename | Filename from decrypted Literal Data |
| pgp.literal.data.modified | Modified Date from decrypted Literal Data |
| pgp.symmetric.key.algorithm.block.cipher | Symmetric-Key Algorithm Block Cipher |
| pgp.symmetric.key.algorithm.id | Symmetric-Key Algorithm Identifier |
## See also

- [org.apache.nifi.processors.pgp.EncryptContentPGP](/user-guide/data-integration/openflow/processors/encryptcontentpgp)
- [org.apache.nifi.processors.pgp.SignContentPGP](/user-guide/data-integration/openflow/processors/signcontentpgp)
- [org.apache.nifi.processors.pgp.VerifyContentPGP](/user-guide/data-integration/openflow/processors/verifycontentpgp)

---
title: DeduplicateRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deduplicaterecord.md
section: Loading & Unloading Data
---

# DeduplicateRecord 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

This processor de-duplicates individual records within a record set. It can operate on a per-file basis using an in-memory hashset or bloom filter. When configured with a distributed map cache, it de-duplicates records across multiple files.

## Tags

change, dedupe, distinct, dupe, duplicate, filter, hash, modify, record, replace, text, unique, update

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| bloom-filter-certainty | The desired false positive probability when using the BloomFilter type. For example, a value of .05 targets a five-percent probability that a result is a false positive. The closer this value is to 0, the more precise the result, at the expense of more storage space. |
| cache-identifier | An optional expression language field that overrides the record's computed cache key. This field has an additional attribute available: `${record.hash.value}`, which contains the cache key derived from dynamic properties (if set) or record fields. |
| deduplication-strategy | The strategy to use for detecting and routing duplicate records. The option for detecting duplicates across a single FlowFile operates in-memory, whereas detection spanning multiple FlowFiles utilises a distributed map cache. |
| distributed-map-cache | This property is required when the deduplication strategy is set to 'multiple files.' For each record, the map cache atomically checks whether the cache key exists and, if not, sets it. |
| filter-capacity-hint | An estimate of the total number of unique records to be processed. A more accurate estimate leads to fewer false positives with a BloomFilter. |
| filter-type | The filter used to determine whether a record has been seen before based on the matching RecordPath criteria. If hash set is selected, a Java HashSet object will be used to deduplicate all encountered records. If the bloom filter option is selected, a bloom filter will be used. The bloom filter option is less memory intensive, but has a chance of having false positives. |
| include-zero-record-flowfiles | If a FlowFile sent to either the duplicate or non-duplicate relationships contains no records, a value of _false_ in this property causes the FlowFile to be dropped. Otherwise, the empty FlowFile is emitted. |
| put-cache-identifier | For each record, check whether the cache identifier exists in the distributed map cache. If it doesn't exist and this property is true, put the identifier to the cache. |
| record-hashing-algorithm | The algorithm used to hash the cache key. |
| record-reader | Specifies the Controller Service to use for reading incoming data |
| record-writer | Specifies the Controller Service to use for writing out the records |

## Relationships

| Name | Description |
| --- | --- |
| duplicate | Records detected as duplicates are routed to this relationship. |
| failure | If unable to communicate with the cache, the FlowFile will be penalized and routed to this relationship |
| non-duplicate | Records not found in the cache are routed to this relationship. |
| original | The original input FlowFile is sent to this relationship unless a fatal error occurs. |

## Writes attributes

| Name | Description |
| --- | --- |
| record.count | Number of records written to the destination FlowFile. |
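
To illustrate the per-file, hash set strategy described above, here is a minimal sketch that splits a list of records into non-duplicate and duplicate groups. It is an illustration under stated assumptions, not the processor's code: the helper names `record_key` and `split_duplicates` are hypothetical, records are modeled as plain dictionaries, and the cache key is assumed to be a SHA-256 digest of a canonical JSON rendering (the real key depends on `record-hashing-algorithm` and any dynamic properties).

```python
import hashlib
import json


def record_key(record, fields=None):
    """Derive a cache key from selected fields, or the whole record (assumed scheme)."""
    selected = record if fields is None else {name: record.get(name) for name in fields}
    canonical = json.dumps(selected, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_duplicates(records, fields=None):
    """Route each record to 'non-duplicate' or 'duplicate' using an in-memory set."""
    seen = set()
    non_duplicate, duplicate = [], []
    for record in records:
        key = record_key(record, fields)
        if key in seen:
            duplicate.append(record)
        else:
            seen.add(key)
            non_duplicate.append(record)
    return non_duplicate, duplicate


records = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 2, "name": "c"},
]

# Whole-record comparison: only the exact repeat is a duplicate.
unique_all, dupes_all = split_duplicates(records)
# Key on "id" only: the second record with id 2 is also a duplicate.
unique_by_id, dupes_by_id = split_duplicates(records, fields=["id"])
```

A hash set gives exact answers at the cost of memory proportional to the number of unique keys. A bloom filter trades that memory for a small chance of flagging a new record as a duplicate.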
## See also

- [org.apache.nifi.processors.standard.DetectDuplicate](/user-guide/data-integration/openflow/processors/detectduplicate)

---
title: DeleteAzureBlobStorage_v12 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deleteazureblobstorage_v12.md
section: Loading & Unloading Data
---

# DeleteAzureBlobStorage_v12 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-azure-nar

## Description

Deletes the specified blob from Azure Blob Storage. The processor uses Azure Blob Storage client library v12.

## Tags

azure, blob, cloud, microsoft, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Blob Name | The full name of the blob |
| Container Name | Name of the Azure storage container. In case of PutAzureBlobStorage processor, container can be created if it does not exist. |
| Delete Snapshots Option | Specifies the snapshot deletion options to be used when deleting a blob. |
| Storage Credentials | Controller Service used to obtain Azure Blob Storage Credentials. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |

## Relationships

| Name | Description |
| --- | --- |
| failure | Unsuccessful operations will be transferred to the failure relationship. |
| success | All successfully processed FlowFiles are routed to this relationship |
## See also

- [org.apache.nifi.processors.azure.storage.CopyAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/copyazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.FetchAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/fetchazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.ListAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/listazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.PutAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/putazureblobstorage_v12)

---
title: DeleteAzureDataLakeStorage 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deleteazuredatalakestorage.md
section: Loading & Unloading Data
---

# DeleteAzureDataLakeStorage 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-azure-nar

## Description

Deletes the provided file from Azure Data Lake Storage

## Tags

adlsgen2, azure, cloud, datalake, microsoft, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| ADLS Credentials | Controller Service used to obtain Azure Credentials. |
| Directory Name | Name of the Azure Storage Directory. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. In case of the PutAzureDataLakeStorage processor, the directory will be created if not already existing. |
| File Name | The filename |
| Filesystem Name | Name of the Azure Storage File System (also called Container). It is assumed to be already existing. |
| Filesystem Object Type | The type of the file system object to be deleted. It can be either folder or file. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |

## Relationships

| Name | Description |
| --- | --- |
| failure | Files that could not be written to Azure storage for some reason are transferred to this relationship |
| success | Files that have been successfully written to Azure storage are transferred to this relationship |
## See also

- [org.apache.nifi.processors.azure.storage.FetchAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/fetchazuredatalakestorage)
- [org.apache.nifi.processors.azure.storage.ListAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/listazuredatalakestorage)
- [org.apache.nifi.processors.azure.storage.PutAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/putazuredatalakestorage)

---
title: DeleteBoxFileMetadataInstance 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deleteboxfilemetadatainstance.md
section: Loading & Unloading Data
---

# DeleteBoxFileMetadataInstance 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Deletes a metadata instance from a Box file using the specified template key

## Tags

box, delete, metadata, storage, templates

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| File ID | The ID of the file from which to delete metadata. |
| Template Key | The key of the metadata template instance to delete. |

## Relationships

| Name | Description |
| --- | --- |
| failure | A FlowFile is routed to this relationship if an error occurs during metadata deletion. |
| file not found | FlowFiles for which the specified Box file was not found will be routed to this relationship. |
| success | A FlowFile is routed to this relationship after metadata has been successfully deleted. |
| template not found | FlowFiles for which the specified metadata template was not found will be routed to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| box.id | The ID of the file from which metadata was deleted |
| box.template.key | The template key used for metadata deletion |
| error.code | The error code returned by Box |
| error.message | The error message returned by Box |
## See also

- [org.apache.nifi.processors.box.CreateBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/createboxfilemetadatainstance)
- [org.apache.nifi.processors.box.FetchBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/fetchboxfilemetadatainstance)
- [org.apache.nifi.processors.box.UpdateBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/updateboxfilemetadatainstance)

---
title: DeleteByQueryElasticsearch 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletebyqueryelasticsearch.md
section: Loading & Unloading Data
---

# DeleteByQueryElasticsearch 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-elasticsearch-restapi-nar

## Description

Delete from an Elasticsearch index using a query. The query can be loaded from a flowfile body or from the Query parameter.

## Tags

delete, elastic, elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, query

## Input Requirement

ALLOWED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Client Service | An Elasticsearch client service to use for running queries. |
| Index | The name of the index to use. |
| Max JSON Field String Length | The maximum allowed length of a string value when parsing a JSON document or attribute. |
| Query | A query in JSON syntax, not Lucene syntax. Ex: `{"query":{"match":{"somefield":"somevalue"}}}`. If this parameter is not set, the query will be read from the flowfile content. If the query (property and flowfile content) is empty, a default empty JSON Object will be used, which will result in a "match_all" query in Elasticsearch. |
| Query Attribute | If set, the executed query will be set on each result flowfile in the specified attribute. |
| Query Clause | A "query" clause in JSON syntax, not Lucene syntax. Ex: `{"match":{"somefield":"somevalue"}}`. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch. |
| Query Definition Style | How the JSON Query will be defined for use by the processor. |
| Type | The type of this document (used by Elasticsearch for indexing and searching). |
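
The Query and Query Clause examples in the table differ only in whether the outer `"query"` key is present. The sketch below shows that relationship using Python's standard `json` module. The helper `build_query` is hypothetical, and wrapping the clause this way is an assumption about how the two values relate, not a description of the processor's internals.

```python
import json


def build_query(query_clause=None):
    """Wrap a query clause in a full query body (hypothetical helper).

    An empty clause yields an empty JSON object, which the table above
    describes as resulting in a "match_all" query in Elasticsearch.
    """
    if not query_clause:
        return {}
    return {"query": query_clause}


clause = {"match": {"somefield": "somevalue"}}

# Compact separators reproduce the formatting used in the property examples.
clause_json = json.dumps(clause, separators=(",", ":"))
full_query_json = json.dumps(build_query(clause), separators=(",", ":"))
empty_query_json = json.dumps(build_query())

print(clause_json)       # value shaped like the Query Clause example
print(full_query_json)   # value shaped like the Query example
print(empty_query_json)  # empty JSON object
```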
## Relationships
| Name | Description |
| --- | --- |
| failure | If the "by query" operation fails, and a flowfile was read, it will be sent to this relationship. |
| retry | All flowfiles that fail due to server/cluster availability go to this relationship. |
| success | If the "by query" operation succeeds, and a flowfile was read, it will be sent to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| elasticsearch.delete.took | The amount of time that it took to complete the delete operation in ms. |
| elasticsearch.delete.error | The error message provided by Elasticsearch if there is an error running the delete. |

---
title: DeleteDBFSResource 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletedbfsresource.md
section: Loading & Unloading Data
---

# DeleteDBFSResource 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-databricks-processors-nar

## Description

Deletes DBFS files and directories.

## Tags

databricks, dbfs, openflow

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| DBFS File Path | DBFS file path, e.g. /directory/file.txt |
| Databricks Client | Databricks Client Service. |

## Relationships

| Name | Description |
| --- | --- |
| failure | Databricks failure relationship |
| success | Databricks success relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| error.code | The error code for the SQL statement if an error occurred. |
| error.message | The error message for the SQL statement if an error occurred. |

---
title: DeleteDynamoDB 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletedynamodb.md
section: Loading & Unloading Data
---

# DeleteDynamoDB 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Deletes a document from DynamoDB based on hash and range key. The key can be string or number. The request requires all the primary keys for the operation (hash or hash and range key).

## Tags

AWS, Amazon, Delete, DynamoDB, Remove

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider |
| Batch items for each request (between 1 and 50) | The items to be retrieved in one batch |
| Communications Timeout | |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Hash Key Name | The hash key name of the item |
| Hash Key Value | The hash key value of the item |
| Hash Key Value Type | The hash key value type of the item |
| Range Key Name | The range key name of the item |
| Range Key Value | |
| Range Key Value Type | The range key value type of the item |
| Region | |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Table Name | The DynamoDB table name |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
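
The hash and range key properties together identify the item to delete. As a rough illustration, the sketch below assembles a key map in DynamoDB's low-level attribute-value form, where strings use `S` and numbers use `N` with the number written as a string. The helper names and the `"string"` / `"number"` type labels are hypothetical, and the sketch does not call any AWS API.

```python
def attribute_value(value, value_type):
    """Convert a key value to a DynamoDB attribute value (type labels are assumed)."""
    if value_type == "string":
        return {"S": str(value)}
    if value_type == "number":
        # DynamoDB transmits numbers as strings inside the "N" field.
        return {"N": str(value)}
    raise ValueError(f"unsupported key value type: {value_type}")


def build_delete_key(hash_name, hash_value, hash_type,
                     range_name=None, range_value=None, range_type="string"):
    """Build the primary key map: hash key only, or hash key plus range key."""
    key = {hash_name: attribute_value(hash_value, hash_type)}
    if range_name:
        key[range_name] = attribute_value(range_value, range_type)
    return key


# Table keyed by hash key only.
simple_key = build_delete_key("customer_id", "c-100", "string")

# Table keyed by hash and range key; both must be supplied.
composite_key = build_delete_key(
    "customer_id", "c-100", "string",
    range_name="order_number", range_value=42, range_type="number",
)
```

If the table defines a range key and the request omits it, the key is incomplete, which is why the description requires all primary keys for the operation.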
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to failure relationship |
| success | FlowFiles are routed to success relationship |
| unprocessed | FlowFiles are routed to unprocessed relationship when DynamoDB is not able to process all the items in the request. Typical reasons are insufficient table throughput capacity and exceeding the maximum bytes per request. Unprocessed FlowFiles can be retried with a new request. |

## Writes attributes

| Name | Description |
| --- | --- |
| dynamodb.key.error.unprocessed | DynamoDB unprocessed keys |
| dynmodb.range.key.value.error | DynamoDB range key error |
| dynamodb.key.error.not.found | DynamoDB key not found |
| dynamodb.error.exception.message | DynamoDB exception message |
| dynamodb.error.code | DynamoDB error code |
| dynamodb.error.message | DynamoDB error message |
| dynamodb.error.service | DynamoDB error service |
| dynamodb.error.retryable | DynamoDB error is retryable |
| dynamodb.error.request.id | DynamoDB error request id |
| dynamodb.error.status.code | DynamoDB status code |
## See also

- [org.apache.nifi.processors.aws.dynamodb.GetDynamoDB](/user-guide/data-integration/openflow/processors/getdynamodb)
- [org.apache.nifi.processors.aws.dynamodb.PutDynamoDB](/user-guide/data-integration/openflow/processors/putdynamodb)
- [org.apache.nifi.processors.aws.dynamodb.PutDynamoDBRecord](/user-guide/data-integration/openflow/processors/putdynamodbrecord)

---
title: DeleteFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletefile.md
section: Loading & Unloading Data
---

# DeleteFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Deletes a file from the filesystem.

## Tags

delete, file, files, filesystem, local, remove

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Directory Path | The path to the directory the file to delete is located in. |
| Filename | The name of the file to delete. |

## Restrictions

| Required Permission | Explanation |
| --- | --- |
| read filesystem | Provides operator the ability to read from any file that NiFi has access to. |
| write filesystem | Provides operator the ability to delete any file that NiFi has access to. |

## Relationships

| Name | Description |
| --- | --- |
| failure | All FlowFiles, for which an existing file could not be deleted, are routed to this relationship |
| not found | All FlowFiles, for which the file to delete did not exist, are routed to this relationship |
| success | All FlowFiles, for which an existing file has been deleted, are routed to this relationship |
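
The three relationships map onto the three possible outcomes of a delete attempt. The following sketch mirrors that routing with Python's standard library. It is an illustration only: the function `delete_file` is hypothetical, and it returns the relationship name as a string rather than routing a FlowFile.

```python
import tempfile
from pathlib import Path


def delete_file(directory_path, filename):
    """Try to delete a file and return the matching relationship name."""
    target = Path(directory_path) / filename
    if not target.exists():
        return "not found"
    try:
        target.unlink()
    except OSError:
        # The file exists but could not be removed (permissions, directory, etc.).
        return "failure"
    return "success"


with tempfile.TemporaryDirectory() as workdir:
    (Path(workdir) / "input.csv").write_text("id,name\n1,a\n")

    first_attempt = delete_file(workdir, "input.csv")   # file exists and is removed
    second_attempt = delete_file(workdir, "input.csv")  # file is already gone
```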
## Use cases

| Delete source file only after its processing completed |
| ------------------------------------------------------ |

---
title: DeleteGCSObject 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletegcsobject.md
section: Loading & Unloading Data
---

# DeleteGCSObject 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-gcp-nar

## Description

Deletes objects from a Google Cloud Bucket. If attempting to delete a file that does not exist, FlowFile is routed to success.

## Tags

delete, gcs, google, google cloud, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| GCP Credentials Provider Service | The Controller Service used to obtain Google Cloud Platform credentials. |
| gcp-project-id | Google Cloud Project ID |
| gcp-retry-count | How many retry attempts should be made before routing to the failure relationship. |
| gcs-bucket | Bucket of the object. |
| gcs-generation | The generation of the object to be deleted. If null, will use latest version of the object. |
| gcs-key | Name of the object. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
| storage-api-url | Overrides the default storage URL. Configuring an alternative Storage API URL also overrides the HTTP Host header on requests as described in the Google documentation for Private Service Connections. |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to this relationship if the Google Cloud Storage operation fails. |
| success | FlowFiles are routed to this relationship after a successful Google Cloud Storage operation. |
## See also

- [org.apache.nifi.processors.gcp.storage.FetchGCSObject](/user-guide/data-integration/openflow/processors/fetchgcsobject)
- [org.apache.nifi.processors.gcp.storage.ListGCSBucket](/user-guide/data-integration/openflow/processors/listgcsbucket)
- [org.apache.nifi.processors.gcp.storage.PutGCSObject](/user-guide/data-integration/openflow/processors/putgcsobject)

---
title: DeleteGridFS 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletegridfs.md
section: Loading & Unloading Data
---

# DeleteGridFS 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-mongodb-nar

## Description

Deletes a file from GridFS using a file name or a query.

## Tags

delete, gridfs, mongodb

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| delete-gridfs-query | A valid MongoDB query to use to find and delete one or more files from GridFS. |
| gridfs-bucket-name | The GridFS bucket where the files will be stored. If left blank, it will use the default value 'fs' that the MongoDB client driver uses. |
| gridfs-client-service | The MongoDB client service to use for database connections. |
| gridfs-database-name | The name of the database to use |
| gridfs-file-name | The name of the file in the bucket that is the target of this processor. GridFS file names do not include path information because GridFS does not sort files into folders within a bucket. |
| mongo-query-attribute | If set, the query will be written to a specified attribute on the output flowfiles. |

## Relationships

| Name | Description |
| --- | --- |
| failure | When there is a failure processing the flowfile, it goes to this relationship. |
| success | When the operation succeeds, the flowfile is sent to this relationship. |

---
title: DeleteMilvus 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletemilvus.md
section: Loading & Unloading Data
---

# DeleteMilvus 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-milvus-processors-nar

## Description

Deletes vectors from a collection in a Milvus database by ID. Unmatched IDs are ignored by Milvus and not deleted.

## Tags

chatbot, delete, embeddings, gen ai, genai, generative ai, llm, metadata, milvus, openflow, text, vector

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Collection Name | The name of the Milvus collection to use |
| Delete Filter | The filter to use in the delete request. Example: `id like "prefix%"` |
| Delete Strategy | The strategy to use for deleting vectors in Milvus |
| ID Record Path | The path to the ID field in the record |
| Milvus Connection Service | Connection Service for accessing Milvus Database |
| Partition | Partition of the vector database that you want to perform operations in. If the database has only one partition, leave empty. |
| Record Reader | The Record Reader to use for reading the FlowFile |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles that cannot be sent to Milvus, and for which a retry is not expected to be successful, are routed to this relationship |
| retry | FlowFiles that fail to be sent to Milvus, but for which a retry may help, are routed to this relationship |
| success | FlowFiles that are successfully sent to Milvus are routed to this relationship |
## See also

- [com.snowflake.openflow.runtime.processors.milvus.UpsertMilvus](/user-guide/data-integration/openflow/processors/upsertmilvus)

---
title: DeleteMongo 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletemongo.md
section: Loading & Unloading Data
---

# DeleteMongo 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-mongodb-nar ## Description Executes a delete query against a MongoDB collection. The query is provided in the body of the flowfile and the user can select whether it will delete one or many documents that match it. ## Tags delete, mongo, mongodb ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Mongo Collection Name | The name of the collection to use |
| Mongo Database Name | The name of the database to use |
| delete-mongo-delete-mode | Choose between deleting one document by query or many documents by query. |
| delete-mongo-fail-on-no-delete | Determines whether to send the FlowFile to the success or failure relationship if nothing is successfully deleted. |
| mongo-client-service | If configured, this property will use the assigned client service for connection pooling. |
## Relationships
| Name | Description |
| --- | --- |
| failure | All FlowFiles that cannot be written to MongoDB are routed to this relationship |
| success | All FlowFiles that are written to MongoDB are routed to this relationship |
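
The interaction between `delete-mongo-delete-mode` and `delete-mongo-fail-on-no-delete` can be illustrated with a small Python sketch. It is an in-memory approximation only, not the processor's implementation: the function name, the `"one"`/`"many"` mode values, and the equality-only query matching are assumptions made for the example, and a plain list stands in for the MongoDB collection.

```python
def delete_documents(documents, query, delete_mode="one", fail_on_no_delete=True):
    """Delete matching documents from `documents` (a list of dicts, modified in place).

    Returns (relationship, deleted_count). All names here are hypothetical.
    """
    # Simplification: a document matches when every query field is equal.
    matches = [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in query.items())
    ]
    # "one" removes only the first match; "many" removes every match.
    to_delete = matches if delete_mode == "many" else matches[:1]
    doomed = {id(doc) for doc in to_delete}
    documents[:] = [doc for doc in documents if id(doc) not in doomed]

    deleted_count = len(to_delete)
    # Nothing deleted: the fail-on-no-delete setting decides the relationship.
    if deleted_count == 0 and fail_on_no_delete:
        return "failure", 0
    return "success", deleted_count
```

For example, calling `delete_documents(docs, {"status": "stale"}, delete_mode="many")` on a list with no stale documents returns `("failure", 0)` unless `fail_on_no_delete` is set to `False`.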
---
title: DeletePinecone 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletepinecone.md
section: Loading & Unloading Data
---

# DeletePinecone 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-pinecone-nar

## Description

Deletes vectors from a Pinecone index.

## Tags

delete, embeddings, genai, generative ai, openflow, pinecone, rag, retrieval augmented generation, vector store

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| ID Prefix | The Pinecone vector ID prefix. If specified, only the vectors whose IDs start with the given value will be deleted. |
| Pinecone API Key | The API key for the Pinecone service |
| Pinecone Index | The name of the Pinecone index to use |
| Pinecone Namespace | The name of the Pinecone namespace to use |
| Web Client Service | The Web Client Service to use for communicating with Pinecone |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that cannot be sent to Pinecone, and for which a retry is not expected to be successful, are routed to this relationship |
| retry | FlowFiles that fail to be sent to Pinecone, but for which a retry may help, are routed to this relationship |
| success | FlowFiles that are successfully sent to Pinecone are routed to this relationship |
## Use cases

| Delete all vectors from a Pinecone index. |
| ------------------------------------------------------------------------- |
| Delete a namespace, along with all of its vectors, from a Pinecone index. |
| Delete all vectors for a particular document from a Pinecone index. |

---
title: DeleteQueryJob 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletequeryjob.md
section: Loading & Unloading Data
---

# DeleteQueryJob 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-salesforce-processors-nar

## Description

Deletes a Query Job in Salesforce using the Bulk API 2.0.

## Tags

bulk, delete, job, preview, query, salesforce

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Job ID | The ID of the Query Job to delete. |
| Salesforce Client | Salesforce Client to interact with the APIs |
## Relationships
| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed to this relationship if the Query Job status could not be retrieved but the operation might be retried |
| failure | A FlowFile is routed to this relationship if the Query Job status could not be retrieved |
| success | If the Query Job has been successfully deleted, the FlowFile is routed to this relationship |
## See also

- [com.snowflake.openflow.runtime.processors.salesforce.AbortQueryJob](/user-guide/data-integration/openflow/processors/abortqueryjob)
- [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult)
- [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobStatus](/user-guide/data-integration/openflow/processors/getqueryjobstatus)
- [com.snowflake.openflow.runtime.processors.salesforce.SubmitQueryJob](/user-guide/data-integration/openflow/processors/submitqueryjob)

---
title: DeleteS3Object 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletes3object.md
section: Loading & Unloading Data
---

# DeleteS3Object 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Deletes a file from an Amazon S3 Bucket. If attempting to delete a file that does not exist, the FlowFile is routed to success.

## Tags

AWS, Amazon, Archive, Delete, S3

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider |
| Bucket | The S3 Bucket to interact with |
| Communications Timeout | The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out. |
| Custom Signer Class Name | Fully qualified class name of the custom signer class. The signer must implement the `com.amazonaws.auth.Signer` interface. |
| Custom Signer Module Location | Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any). |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| FullControl User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have Full Control for an object |
| Object Key | The S3 Object Key to use. This is analogous to a filename for traditional file systems. |
| Owner | The Amazon ID to use for the object's owner |
| Read ACL User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have permissions to read the Access Control List for an object |
| Read Permission User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have Read Access for an object |
| Region | The AWS Region to connect to. |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Signer Override | The AWS S3 library uses Signature Version 4 by default, but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation. |
| Version | The Version of the Object to delete |
| Write ACL User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have permissions to change the Access Control List for an object |
| Write Permission User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have Write Access for an object |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| failure | If the Processor is unable to process a given FlowFile, it will be routed to this Relationship. |
| success | FlowFiles are routed to this Relationship after they have been successfully processed. |
## Writes attributes
| Name | Description |
| --- | --- |
| s3.exception | The class name of the exception thrown during processor execution |
| s3.additionalDetails | The S3 supplied detail from the failed operation |
| s3.statusCode | The HTTP error code (if available) from the failed operation |
| s3.errorCode | The S3 moniker of the failed operation |
| s3.errorMessage | The S3 exception message from the failed operation |
## See also

- [org.apache.nifi.processors.aws.s3.CopyS3Object](/user-guide/data-integration/openflow/processors/copys3object)
- [org.apache.nifi.processors.aws.s3.FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags)
- [org.apache.nifi.processors.aws.s3.ListS3](/user-guide/data-integration/openflow/processors/lists3)
- [org.apache.nifi.processors.aws.s3.PutS3Object](/user-guide/data-integration/openflow/processors/puts3object)
- [org.apache.nifi.processors.aws.s3.TagS3Object](/user-guide/data-integration/openflow/processors/tags3object)

---
title: DeleteSFTP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletesftp.md
section: Loading & Unloading Data
---

# DeleteSFTP 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Deletes a file residing on an SFTP server.

## Tags

delete, remote, remove, sftp

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Algorithm Negotiation | Configuration strategy for SSH algorithm negotiation |
| Batch Size | The maximum number of FlowFiles to send in a single connection |
| Ciphers Allowed | A comma-separated list of Ciphers allowed for SFTP connections. Leave unset to allow all. Available options are: 3des-cbc, aes128-cbc, aes128-ctr, aes128-gcm@openssh.com, aes192-cbc, aes192-ctr, aes256-cbc, aes256-ctr, aes256-gcm@openssh.com, arcfour128, arcfour256, blowfish-cbc, chacha20-poly1305@openssh.com, none |
| Connection Timeout | Amount of time to wait before timing out while creating a connection |
| Data Timeout | When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems |
| Directory Path | The path to the directory the file to delete is located in. |
| Filename | The name of the file to delete. |
| Host Key File | If supplied, the given file will be used as the Host Key. Otherwise, if the 'Strict Host Key Checking' property is applied (set to true), the 'known_hosts' and 'known_hosts2' files from the ~/.ssh directory are used; else no host key file will be used |
| Hostname | The fully qualified hostname or IP address of the remote system |
| Key Algorithms Allowed | A comma-separated list of Key Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: ecdsa-sha2-nistp256, ecdsa-sha2-nistp256-cert-v01@openssh.com, ecdsa-sha2-nistp384, ecdsa-sha2-nistp384-cert-v01@openssh.com, ecdsa-sha2-nistp521, ecdsa-sha2-nistp521-cert-v01@openssh.com, rsa-sha2-256, rsa-sha2-256-cert-v01@openssh.com, rsa-sha2-512, rsa-sha2-512-cert-v01@openssh.com, sk-ecdsa-sha2-nistp256@openssh.com, sk-ssh-ed25519@openssh.com, ssh-dss, ssh-dss-cert-v01@openssh.com, ssh-ed25519, ssh-ed25519-cert-v01@openssh.com, ssh-rsa, ssh-rsa-cert-v01@openssh.com |
| Key Exchange Algorithms Allowed | A comma-separated list of Key Exchange Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: curve25519-sha256, curve25519-sha256@libssh.org, curve448-sha512, diffie-hellman-group-exchange-sha1, diffie-hellman-group-exchange-sha256, diffie-hellman-group1-sha1, diffie-hellman-group14-sha1, diffie-hellman-group14-sha256, diffie-hellman-group15-sha512, diffie-hellman-group16-sha512, diffie-hellman-group17-sha512, diffie-hellman-group18-sha512, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521, mlkem1024nistp384-sha384, mlkem768nistp256-sha256, mlkem768x25519-sha256, sntrup761x25519-sha512, sntrup761x25519-sha512@openssh.com |
| Message Authentication Codes Allowed | A comma-separated list of Message Authentication Codes allowed for SFTP connections. Leave unset to allow all. Available options are: hmac-md5, hmac-md5-96, hmac-sha1, hmac-sha1-96, hmac-sha1-etm@openssh.com, hmac-sha2-256, hmac-sha2-256-etm@openssh.com, hmac-sha2-512, hmac-sha2-512-etm@openssh.com |
| Password | Password for the user account |
| Port | The port that the remote system is listening on for file transfers |
| Private Key Passphrase | Password for the private key |
| Private Key Path | The fully qualified path to the Private Key file |
| Send Keep Alive On Timeout | Send a Keep Alive message every 5 seconds up to 5 times for an overall timeout of 25 seconds. |
| Strict Host Key Checking | Indicates whether or not strict enforcement of hosts keys should be applied |
| Use Compression | Indicates whether or not ZLIB compression should be used when transferring files |
| Username | Username |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| failure | All FlowFiles, for which an existing file could not be deleted, are routed to this relationship |
| not found | All FlowFiles, for which the file to delete did not exist, are routed to this relationship |
| success | All FlowFiles, for which an existing file has been deleted, are routed to this relationship |
## Use cases

| Delete source file only after its processing completed |
| ------------------------------------------------------ |

---
title: DeleteSQS 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deletesqs.md
section: Loading & Unloading Data
---

# DeleteSQS 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Deletes a message from an Amazon Simple Queue Service (SQS) queue.

## Tags

AWS, Amazon, Delete, Queue, SQS

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider |
| Communications Timeout | |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Queue URL | The URL of the queue to delete from |
| Receipt Handle | The identifier that specifies the receipt of the message |
| Region | |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to failure relationship |
| success | FlowFiles are routed to success relationship |
## See also

- [org.apache.nifi.processors.aws.sqs.GetSQS](/user-guide/data-integration/openflow/processors/getsqs)
- [org.apache.nifi.processors.aws.sqs.PutSQS](/user-guide/data-integration/openflow/processors/putsqs)

---
title: DeleteUnityCatalogResource 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/deleteunitycatalogresource.md
section: Loading & Unloading Data
---

# DeleteUnityCatalogResource 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-databricks-processors-nar

## Description

Deletes a Unity Catalog file or directory.

## Tags

databricks, openflow, unity catalog

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Databricks Client | Databricks Client Service. |
| Missing Resource Policy | What action to take if the resource is not found. |
| Unity Catalog Resource Path | Unity Catalog resource path, e.g. `/Volumes/catalog/schema/volume_name/path` |
## Relationships
| Name | Description |
| --- | --- |
| failure | Databricks failure relationship |
| success | Databricks success relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| error.code | The error code for the SQL statement if an error occurred. |
| error.message | The error message for the SQL statement if an error occurred. |
---
title: DescribeDataShare 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/describedatashare.md
section: Loading & Unloading Data
---

# DescribeDataShare 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-salesforce-processors-nar

## Description

Describes the specified data share metadata in Salesforce Data Cloud.

## Tags

daas, data cloud, describe, object, preview, salesforce, sfdc

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Data Share Name | The name of the Data Share to describe. |
| Salesforce Data Cloud Client | Salesforce Data Cloud Client to interact with the APIs |
## Relationships
| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed to this relationship if the data share metadata could not be retrieved but the operation might be retried |
| failure | A FlowFile is routed to this relationship if the data share metadata could not be retrieved |
| success | FlowFile containing the data share metadata will be routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| explicitDataLakeObjects | Comma-separated list of the names of the explicit data lake objects. |
| implicitDataLakeObjects | Comma-separated list of the names of the implicit data lake objects. |
| dataModelObjects | Comma-separated list of the names of the data model objects. |
| calculatedInsightObjects | Comma-separated list of the names of the calculated insights objects. |
## See also

- [com.snowflake.openflow.runtime.processors.salesforce.ListSFDCDataShares](/user-guide/data-integration/openflow/processors/listsfdcdatashares)

---
title: DescribeSFDCObject 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/describesfdcobject.md
section: Loading & Unloading Data
---

# DescribeSFDCObject 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-salesforce-processors-nar

## Description

Describes the specified object metadata in Salesforce.

## Tags

describe, object, preview, salesforce, sfdc

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Object Fields Filter JSON | JSON representation describing which fields to include or exclude for Salesforce objects. |
| Object Name | The name of the object to describe. |
| Salesforce Client | Salesforce Client to interact with the APIs |
## Relationships
| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed to this relationship if the object metadata could not be retrieved but the operation might be retried |
| failure | A FlowFile is routed to this relationship if the object metadata could not be retrieved |
| success | FlowFile containing the object metadata will be routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| sObjectFields | Comma-separated list of the fields of the object (without non-queryable fields). |
| sObjectExcludedFields | Comma-separated list of the non-queryable fields of the object. |
| sObjectSchema | The schema associated with the object based on its fields (without non-queryable fields). |
## See also

- [com.snowflake.openflow.runtime.processors.salesforce.AbortQueryJob](/user-guide/data-integration/openflow/processors/abortqueryjob)
- [com.snowflake.openflow.runtime.processors.salesforce.DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob)
- [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult)
- [com.snowflake.openflow.runtime.processors.salesforce.ListSFDCObjects](/user-guide/data-integration/openflow/processors/listsfdcobjects)

---
title: DetectDuplicate 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/detectduplicate.md
section: Loading & Unloading Data
---

# DetectDuplicate 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Caches a value, computed from FlowFile attributes, for each incoming FlowFile and determines if the cached value has already been seen. If so, routes the FlowFile to 'duplicate' with an attribute named 'original.flowfile.description' that specifies the original FlowFile's "description", which is specified in the <FlowFile Description> property. If the FlowFile is not determined to be a duplicate, the Processor routes the FlowFile to 'non-duplicate'.

## Tags

dedupe, dupe, duplicate, hash

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Age Off Duration | Time interval to age off cached FlowFiles |
| Cache Entry Identifier | A FlowFile attribute, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the value used to identify duplicates; it is this value that is cached |
| Cache The Entry Identifier | When true, the processor checks for duplicates and caches the Entry Identifier. When false, the processor only checks for duplicates and does not cache the Entry Identifier, requiring another processor to add identifiers to the distributed cache. |
| Distributed Cache Service | The Controller Service that is used to cache unique identifiers, used to determine duplicates |
| FlowFile Description | When a FlowFile is added to the cache, this value is stored along with it so that if a duplicate is found, this description of the original FlowFile will be added to the duplicate's "original.flowfile.description" attribute |
## Relationships
| Name | Description |
| --- | --- |
| duplicate | If a FlowFile has been detected to be a duplicate, it will be routed to this relationship |
| failure | If unable to communicate with the cache, the FlowFile will be penalized and routed to this relationship |
| non-duplicate | If a FlowFile's Cache Entry Identifier was not found in the cache, it will be routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| original.flowfile.description | All FlowFiles routed to the duplicate relationship will have an attribute added named original.flowfile.description. The value of this attribute is determined by the attributes of the original copy of the data and by the FlowFile Description property. |
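
The caching behavior described for this processor can be approximated in a few lines of Python. This is a simplified, in-memory illustration rather than the processor's implementation: the `DuplicateDetector` class, its parameters, and the exact age-off comparison are assumptions made for the example, a plain dictionary stands in for the Distributed Cache Service, and the `failure` relationship (cache unreachable) is not modeled.

```python
import time


class DuplicateDetector:
    """Hypothetical in-memory stand-in for DetectDuplicate and its cache."""

    def __init__(self, age_off_seconds=None, cache_identifier=True, clock=time.monotonic):
        self._entries = {}  # identifier -> (description, time stored)
        self._age_off_seconds = age_off_seconds    # mirrors "Age Off Duration"
        self._cache_identifier = cache_identifier  # mirrors "Cache The Entry Identifier"
        self._clock = clock

    def check(self, identifier, description):
        """Return (relationship, attributes to add) for one FlowFile."""
        now = self._clock()
        entry = self._entries.get(identifier)
        if entry is not None:
            original_description, stored_at = entry
            aged_off = (
                self._age_off_seconds is not None
                and now - stored_at >= self._age_off_seconds
            )
            if not aged_off:
                # Seen before: report the description stored with the original.
                return "duplicate", {"original.flowfile.description": original_description}
        # Not seen before (or the earlier entry aged off): optionally remember it.
        if self._cache_identifier:
            self._entries[identifier] = (description, now)
        return "non-duplicate", {}
```

A first call such as `detector.check("order-42", "orders.csv")` returns `non-duplicate`; a second call with the same identifier returns `duplicate` together with the description stored by the first call.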
## See also

---
title: DeveloperBoxClientService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/developerboxclientservice.md
section: Loading & Unloading Data
---

# DeveloperBoxClientService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides Box client objects through which Box API calls can be made. This service uses a developer token and is for testing only.

## Tags

box, client, provider

## Properties

In the list below, required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Developer Token * | Developer Token | | | The Developer Token to use to interact with the Box API. This is for testing only and should not be used in production. |
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: DistributedMapCacheLookupService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/distributedmapcachelookupservice.md
section: Loading & Unloading Data
---

# DistributedMapCacheLookupService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Allows you to choose a distributed map cache client to retrieve the value associated with a key. The coordinates that are passed to the lookup must contain the key 'key'.

## Tags

cache, distributed, enrich, key, lookup, map, value

## Properties

In the list below, required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Character Encoding * | character-encoding | UTF-8 | ISO-8859-1, UTF-8, UTF-16, UTF-16LE, UTF-16BE, US-ASCII | Specifies a character encoding to use. |
| Distributed Cache Service * | distributed-map-cache-service | | | The Controller Service that is used to get the cached values. |
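
The lookup contract (coordinates must contain the key 'key', and values are decoded with the configured character encoding) can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the `lookup` function is hypothetical, and a dictionary of bytes stands in for the distributed map cache client.

```python
def lookup(cache, coordinates, encoding="utf-8"):
    """Return the cached value for coordinates['key'], or None if absent.

    `cache` is a hypothetical stand-in: a dict mapping encoded keys to
    encoded values, playing the role of the distributed map cache.
    """
    if "key" not in coordinates:
        # The coordinates must carry the lookup key under the name 'key'.
        raise KeyError("coordinates must contain the key 'key'")
    raw = cache.get(str(coordinates["key"]).encode(encoding))
    # Decode with the same character encoding used for the key.
    return None if raw is None else raw.decode(encoding)
```

With `cache = {b"user-1": b"alice"}`, calling `lookup(cache, {"key": "user-1"})` yields the decoded value, while coordinates that use any other name for the key are rejected.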
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: DistributeLoad 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/distributeload.md
section: Loading & Unloading Data
---

# DistributeLoad 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Distributes FlowFiles to downstream processors based on a Distribution Strategy. If using the Round Robin strategy, the default is to assign each destination a weighting of 1 (evenly distributed). However, optional properties can be added to change this; adding a property with the name '5' and value '10' means that the relationship with name '5' will receive 10 FlowFiles in each iteration instead of 1.

## Tags

distribute, load balance, round robin, route, weighted

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Distribution Strategy | Determines how the load will be distributed. Relationship weight is in numeric order where '1' has the greatest weight. |
| Number of Relationships | Determines the number of Relationships to which the load should be distributed |
## Relationships
| Name | Description |
| --- | --- |
| 1 | Where to route FlowFiles for this relationship index |
## Writes attributes
| Name | Description |
| --- | --- |
| distribute.load.relationship | The name of the specific relationship the FlowFile has been routed through |
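
The weighted Round Robin behavior described for this processor can be sketched in Python. This is a minimal illustration, assuming each iteration visits relationships in numeric order and sends a relationship as many consecutive FlowFiles as its weight; the function name and the `weights` mapping are hypothetical, and the processor's actual scheduling may differ in detail.

```python
from itertools import cycle


def weighted_round_robin(number_of_relationships, weights=None):
    """Return an endless iterator of relationship names ('1', '2', ...).

    `weights` maps a relationship name to its weighting, mirroring the
    optional properties; relationships not listed default to a weight of 1.
    """
    weights = weights or {}
    schedule = []
    for index in range(1, number_of_relationships + 1):
        name = str(index)
        # A relationship with weight N appears N times per iteration.
        schedule.extend([name] * int(weights.get(name, 1)))
    return cycle(schedule)
```

For instance, with three relationships and `weights={"2": 2}`, each iteration sends one FlowFile to relationship '1', two to '2', and one to '3'; the chosen name corresponds to the value written to `distribute.load.relationship`.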
---
title: DuplicateFlowFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/duplicateflowfile.md
section: Loading & Unloading Data
---

# DuplicateFlowFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Intended for load testing, this processor creates the configured number of copies of each incoming FlowFile. The original FlowFile as well as all generated copies are sent to the 'success' relationship. In addition, each FlowFile gets an attribute 'copy.index' set to the copy number, where the original FlowFile gets a value of zero, and all copies receive incremented integer values. ## Tags duplicate, load, test ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Number of Copies | Specifies how many copies of each incoming FlowFile will be made |
## Relationships
| Name | Description |
| --- | --- |
| success | The original FlowFile and all copies will be sent to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| copy.index | A zero-based incrementing integer value based on which copy the FlowFile is. |
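
As a rough illustration of the 'copy.index' numbering, the following Python sketch models a FlowFile as a plain dictionary of attributes. The function name `duplicate_flowfile` and the dictionary model are assumptions made for the example; they are not part of the processor.

```python
from typing import Dict, List


def duplicate_flowfile(attributes: Dict[str, str], number_of_copies: int) -> List[Dict[str, str]]:
    """Return the original plus `number_of_copies` copies, each tagged with copy.index."""
    outputs: List[Dict[str, str]] = []
    # Index 0 is the original FlowFile; indexes 1..N are the generated copies.
    for index in range(number_of_copies + 1):
        flowfile = dict(attributes)
        flowfile["copy.index"] = str(index)
        outputs.append(flowfile)
    return outputs


# Everything in this list would be sent to the 'success' relationship.
results = duplicate_flowfile({"filename": "orders.csv"}, number_of_copies=3)
```

Setting Number of Copies to 3 therefore produces four FlowFiles in total.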
--- title: ElasticSearchClientServiceImpl source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/elasticsearchclientserviceimpl.md section: Loading & Unloading Data --- # ElasticSearchClientServiceImpl This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A controller service for accessing an Elasticsearch client, using the Elasticsearch (low-level) REST Client. ## Tags client, elasticsearch, elasticsearch6, elasticsearch7, elasticsearch8 ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| API Key * | API Key | | | Encoded API key. |
| API Key ID * | API Key ID | | | Unique identifier of the API key. |
| Authorization Scheme * | Authorization Scheme | BASIC | None, PKI, Basic, API Key, JWT | Authorization Scheme used for optional authentication to Elasticsearch. |
| Character Set * | Character Set | UTF-8 | | The charset to use for interpreting the response from Elasticsearch. |
| Connect timeout * | Connect timeout | 5000 | | Controls the amount of time, in milliseconds, before a timeout occurs when trying to connect. |
| Enable Compression * | Enable Compression | false | true, false | Whether the REST client should compress requests using gzip content encoding and add the "Accept-Encoding: gzip" header to receive compressed responses |
| HTTP Hosts * | HTTP Hosts | | | A comma-separated list of HTTP hosts that host Elasticsearch query nodes. The HTTP Hosts should be valid URIs including protocol, domain and port for each entry. For example "[https://elasticsearch1:9200](https://elasticsearch1:9200), [https://elasticsearch2:9200](https://elasticsearch2:9200)". Note that the Host is included in requests as a header (typically including domain and port, e.g. elasticsearch:9200). |
| JWT Shared Secret * | JWT Shared Secret | | | JWT realm Shared Secret. |
| Node Selector * | Node Selector | ANY | Any, Skip Dedicated Masters | Selects Elasticsearch nodes that can receive requests. Used to keep requests away from dedicated Elasticsearch master nodes |
| OAuth2 Access Token Provider * | OAuth2 Access Token Provider | | | The OAuth2 Access Token Provider used to provide JWTs for Bearer Token Authorization with Elasticsearch. |
| Password * | Password | | | The password to use with XPack security. |
| Path Prefix | Path Prefix | | | Sets the path's prefix for every request used by the http client. For example, if this is set to "/my/path", then any client request will become "/my/path/" + endpoint. In essence, every request's endpoint is prefixed by this pathPrefix. The path prefix is useful for when Elasticsearch is behind a proxy that provides a base path or a proxy that requires all paths to start with '/'; it is not intended for other purposes and it should not be supplied in other scenarios |
| Read Timeout * | Read Timeout | 60000 | | Controls the amount of time, in milliseconds, before a timeout occurs when waiting for a response. |
| Run As User | Run As User | | | The username to impersonate within Elasticsearch. |
| SSL Context Service | SSL Context Service | | | The SSL Context Service used to provide client certificate information for TLS/SSL connections. This service only applies if the Elasticsearch endpoint(s) have been secured with TLS/SSL. |
| Send Meta Header * | Send Meta Header | true | true, false | Whether to send a "X-Elastic-Client-Meta" header that describes the runtime environment. It contains information that is similar to what could be found in User-Agent. Using a separate header allows applications to use User-Agent for their own needs, e.g. to identify application version or other environment information |
| Sniff Cluster Nodes * | Sniff Cluster Nodes | false | true, false | Periodically sniff for nodes within the Elasticsearch cluster via the Elasticsearch Node Info API. If Elasticsearch security features are enabled (default to "true" for 8.x+), the Elasticsearch user must have the "monitor" or "manage" cluster privilege to use this API. Note that all HTTP Hosts (and those that may be discovered within the cluster using the Sniffer) must use the same protocol, e.g. http or https, and be contactable using the same client settings. Finally, the Elasticsearch "network.publish_host" must match one of the "network.bind_host" list entries; see [https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html) for more information |
| Sniff on Failure * | Sniff on Failure | false | true, false | Enable sniffing on failure, meaning that after each failure the Elasticsearch nodes list gets updated straight away rather than at the following ordinary sniffing round |
| Sniffer Failure Delay * | Sniffer Failure Delay | 1 min | | Delay between an Elasticsearch request failure and updating available Cluster nodes using the Sniffer |
| Sniffer Interval * | Sniffer Interval | 5 mins | | Interval between Cluster sniffer operations |
| Sniffer Request Timeout * | Sniffer Request Timeout | 1 sec | | Cluster sniffer timeout for node info requests |
| Strict Deprecation * | Strict Deprecation | false | true, false | Whether the REST client should return any response containing at least one warning header as a failure |
| Suppress Null and Empty Values * | Suppress Null and Empty Values | always-suppress | Never Suppress, Always Suppress | Specifies how the writer should handle null and empty fields (including objects and arrays) |
| Username * | Username | | | The username to use with XPack security. |
| Proxy Configuration Service | proxy-configuration-service | | | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ElasticSearchLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/elasticsearchlookupservice.md section: Loading & Unloading Data --- # ElasticSearchLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Looks up a record from the Elasticsearch server associated with the specified document ID. The coordinates that are passed to the lookup must contain the key 'id'. ## Tags elasticsearch, enrich, lookup, record ## Properties In the list below, required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Client Service * | Client Service | | | An ElasticSearch client service to use for running queries. |
| Index * | Index | | | The name of the index to read from |
| Schema Access Strategy * | Schema Access Strategy | infer | Use 'Schema Name' Property, Use 'Schema Text' Property, Infer from Result | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name | Schema Name | ${schema.name} | | Specifies the name of the schema to lookup in the Schema Registry property |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry |
| Schema Text | Schema Text | ${avro.schema} | | The text of an Avro-formatted Schema |
| Schema Version | Schema Version | | | Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| Type | Type | | | The type of this document (used by Elasticsearch for indexing and searching) |
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ElasticSearchStringLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/elasticsearchstringlookupservice.md section: Loading & Unloading Data --- # ElasticSearchStringLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Looks up a string value from the Elasticsearch server associated with the specified document ID. The coordinates that are passed to the lookup must contain the key 'id'. ## Tags elasticsearch, enrich, key, lookup, value ## Properties In the list below, required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Client Service * | Client Service | | | An ElasticSearch client service to use for running queries. |
| Index * | Index | | | The name of the index to read from |
| Type | Type | | | The type of this document (used by Elasticsearch for indexing and searching) |
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: EmailRecordSink source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/emailrecordsink.md section: Loading & Unloading Data --- # EmailRecordSink This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a RecordSinkService that can be used to send records in email using the specified writer for formatting. ## Tags email, record, send, sink, smtp, write ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| BCC | bcc | | | The recipients to include in the BCC-Line of the email. Comma separated sequence of addresses following RFC822 syntax. |
| CC | cc | | | The recipients to include in the CC-Line of the email. Comma separated sequence of addresses following RFC822 syntax. |
| From * | from | | | Specifies the Email address to use as the sender. Comma separated sequence of addresses following RFC822 syntax. |
| Record Writer * | record-sink-record-writer | | | Specifies the Controller Service to use for writing out the records. |
| SMTP Auth * | smtp-auth | true | | Flag indicating whether authentication should be used |
| SMTP Hostname * | smtp-hostname | | | The hostname of the SMTP Server that is used to send Email Notifications |
| SMTP Password | smtp-password | | | Password for the SMTP account |
| SMTP Port * | smtp-port | 25 | | The Port used for SMTP communications |
| SMTP SSL * | smtp-ssl | false | | Flag indicating whether SSL should be enabled |
| SMTP STARTTLS * | smtp-starttls | false | | Flag indicating whether STARTTLS should be enabled. If the server does not support STARTTLS, the connection continues without the use of TLS |
| SMTP Username | smtp-username | | | Username for the SMTP account |
| SMTP X-Mailer Header * | smtp-xmailer-header | NiFi | | X-Mailer used in the header of the outgoing email |
| Subject * | subject | Message from NiFi | | The email subject |
| To | to | | | The recipients to include in the To-Line of the email. Comma separated sequence of addresses following RFC822 syntax. |
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: EmbeddedHazelcastCacheManager source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/embeddedhazelcastcachemanager.md section: Loading & Unloading Data --- # EmbeddedHazelcastCacheManager This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A service that runs an embedded Hazelcast instance and provides cache instances backed by it. The server does not ask for authentication, so it is recommended to run it within a secured network. ## Tags cache, hazelcast ## Properties In the list below, required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Hazelcast Cluster Name * | hazelcast-cluster-name | nifi | | Name of the Hazelcast cluster. |
| Hazelcast Clustering Strategy * | hazelcast-clustering-strategy | none | None, All Nodes, Explicit | Specifies with what strategy the Hazelcast cluster should be created. |
| Hazelcast Instances | hazelcast-instances | | | Only used with "Explicit" Clustering Strategy! List of NiFi instance host names which should be part of the Hazelcast cluster. Host names are separated by comma. The port specified in the "Hazelcast Port" property will be used as server port. The list must contain every instance that will be part of the cluster. Other instances will join the Hazelcast cluster as clients. |
| Hazelcast Port * | hazelcast-port | 5701 | | Port for the Hazelcast instance to use. |
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: EncodeContent 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/encodecontent.md section: Loading & Unloading Data --- # EncodeContent 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Encode or decode the contents of a FlowFile using Base64, Base32, or hex encoding schemes ## Tags base32, base64, decode, encode, hex ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Encoded Line Length | Each line of encoded data will contain up to the configured number of characters, rounded down to the nearest multiple of 4. |
| Encoding | Specifies the type of encoding used. |
| Line Output Mode | Controls the line formatting for encoded content based on selected property values. |
| Mode | Specifies whether the content should be encoded or decoded. |
## Relationships
| Name | Description |
| --- | --- |
| failure | Any FlowFile that cannot be encoded or decoded will be routed to failure |
| success | Any FlowFile that is successfully encoded or decoded will be routed to success |
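
To make the Encoded Line Length rule concrete, here is a minimal Python sketch of encoding and decoding with the three schemes named above, using only the standard library. The function names, the 76-character default, the newline separator, and the choice to wrap all three schemes the same way are assumptions for illustration; the processor's actual line handling is governed by its Line Output Mode property.

```python
import base64
import binascii

ENCODERS = {"base64": base64.b64encode, "base32": base64.b32encode, "hex": binascii.hexlify}
DECODERS = {"base64": base64.b64decode, "base32": base64.b32decode, "hex": binascii.unhexlify}


def encode(content: bytes, encoding: str = "base64", line_length: int = 76) -> bytes:
    """Encode content and wrap it into lines of at most `line_length` characters."""
    encoded = ENCODERS[encoding](content)
    # Round the configured length down to the nearest multiple of 4.
    width = (line_length // 4) * 4
    if width <= 0:
        return encoded
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return b"\n".join(lines)


def decode(content: bytes, encoding: str = "base64") -> bytes:
    """Decode content, ignoring any line breaks or other whitespace."""
    compact = b"".join(content.split())
    return DECODERS[encoding](compact)


# A configured length of 10 is rounded down to 8 characters per line.
wrapped = encode(b"hello world", "base64", line_length=10)
```

In a flow, a decoding error such as `binascii.Error` would correspond to routing the FlowFile to 'failure'.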
--- title: EncryptContentAge 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/encryptcontentage.md section: Loading & Unloading Data --- # EncryptContentAge 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-cipher-nar ## Description Encrypt content using the age-encryption.org/v1 specification. Supports binary or ASCII armored content encoding using configurable properties. The age standard uses ChaCha20-Poly1305 for authenticated encryption of the payload. The age-keygen command supports generating X25519 key pairs for encryption and decryption operations. ## Tags ChaCha20-Poly1305, X25519, age, age-encryption.org, encryption ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| File Encoding | Output encoding for encrypted files. Binary encoding provides optimal processing performance. |
| Public Key Recipient Resources | One or more files or URLs containing X25519 Public Key Recipients, separated with newlines, encoded according to the age specification, starting with age1 |
| Public Key Recipients | One or more X25519 Public Key Recipients, separated with newlines, encoded according to the age specification, starting with age1 |
| Public Key Source | Source of information determines the loading strategy for X25519 Public Key Recipients |
## Relationships
| Name | Description |
| --- | --- |
| failure | Encryption Failed |
| success | Encryption Completed |
## See also - [org.apache.nifi.processors.cipher.DecryptContentAge](/user-guide/data-integration/openflow/processors/decryptcontentage) --- title: EncryptContentPGP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/encryptcontentpgp.md section: Loading & Unloading Data --- # EncryptContentPGP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-pgp-nar ## Description Encrypt contents using OpenPGP. The processor reads input and detects OpenPGP messages to avoid unnecessary additional wrapping in Literal Data packets. ## Tags Encryption, GPG, OpenPGP, PGP, RFC 4880 ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| file-encoding | File Encoding for encryption |
| passphrase | Passphrase used for encrypting data with Password-Based Encryption |
| public-key-search | PGP Public Key Search will be used to match against the User ID or Key ID when formatted as uppercase hexadecimal string of 16 characters |
| public-key-service | PGP Public Key Service for encrypting data with Public Key Encryption |
| symmetric-key-algorithm | Symmetric-Key Algorithm for encryption |
## Relationships
| Name | Description |
| --- | --- |
| failure | Encryption Failed |
| success | Encryption Succeeded |
## Writes attributes
| Name | Description |
| --- | --- |
| pgp.symmetric.key.algorithm | Symmetric-Key Algorithm |
| pgp.symmetric.key.algorithm.block.cipher | Symmetric-Key Algorithm Block Cipher |
| pgp.symmetric.key.algorithm.key.size | Symmetric-Key Algorithm Key Size |
| pgp.symmetric.key.algorithm.id | Symmetric-Key Algorithm Identifier |
| pgp.file.encoding | File Encoding |
| pgp.compression.algorithm | Compression Algorithm |
| pgp.compression.algorithm.id | Compression Algorithm Identifier |
## See also - [org.apache.nifi.processors.pgp.DecryptContentPGP](/user-guide/data-integration/openflow/processors/decryptcontentpgp) - [org.apache.nifi.processors.pgp.SignContentPGP](/user-guide/data-integration/openflow/processors/signcontentpgp) - [org.apache.nifi.processors.pgp.VerifyContentPGP](/user-guide/data-integration/openflow/processors/verifycontentpgp) --- title: EnforceOrder 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/enforceorder.md section: Loading & Unloading Data --- # EnforceOrder 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Enforces expected ordering of FlowFiles that belong to the same data group within a single node. Although PriorityAttributePrioritizer can be used on a connection to ensure that FlowFiles going through that connection are in priority order, depending on error-handling, branching, and other flow designs, it is possible for FlowFiles to get out of order. EnforceOrder can be used to enforce the original ordering for those FlowFiles. [IMPORTANT] For EnforceOrder to take effect, FirstInFirstOutPrioritizer should be used on EVERY downstream relationship UNTIL the order of FlowFiles is physically FIXED by an operation such as MergeContent or by being stored to the final destination. ## Tags order, sort ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| batch-count | The maximum number of FlowFiles that EnforceOrder can process in a single execution. |
| group-id | EnforceOrder supports multiple ordering groups. 'Group Identifier' is used to determine which group a FlowFile belongs to. This property is evaluated for each incoming FlowFile. If the evaluated result is empty, the FlowFile is routed to failure. |
| inactive-timeout | Indicates the duration after which state for an inactive group is cleared from managed state. A group is considered inactive if no new incoming FlowFile has been seen for the group for the specified duration. Inactive Timeout must be longer than Wait Timeout. If a FlowFile arrives after its group has already been cleared, it is treated as belonging to a brand new group, but it will never match the order because the expected preceding FlowFiles are already gone. The FlowFile eventually times out while waiting and is routed to 'overtook'. To avoid this, group states should be kept long enough; however, a shorter duration helps when the same group identifier needs to be reused. |
| initial-order | When the first FlowFile of a group arrives, the initial target order is computed and stored in the managed state. After that, the target order is tracked by EnforceOrder and stored in the state management store. If Expression Language is used but the evaluated result is not an integer, the FlowFile is routed to failure, and the initial order is left unknown until a subsequent FlowFile provides a valid initial order. |
| maximum-order | If specified, any FlowFile that has a larger order is routed to failure. This property is computed only once for a given group. After a maximum order is computed, it is persisted in the state management store and used for other FlowFiles belonging to the same group. If Expression Language is used but the evaluated result is not an integer, the FlowFile is routed to failure, and the maximum order is left unknown until a subsequent FlowFile provides a valid maximum order. |
| order-attribute | The name of the FlowFile attribute whose value is used to enforce the order of FlowFiles within a group. If a FlowFile does not have this attribute, or its value is not an integer, the FlowFile is routed to failure. |
| wait-timeout | Indicates the duration after which waiting FlowFiles are routed to the 'overtook' relationship. |
## State management
| Scopes | Description |
| --- | --- |
| LOCAL | EnforceOrder uses the following states per ordering group: '<groupId>.target' is the order number that is expected to arrive next. When a FlowFile with a matching order arrives, or a FlowFile overtakes the FlowFile being waited for because of the wait timeout, the target order is updated to (FlowFile.order + 1). '<groupId>.max' is the maximum order number for a group. '<groupId>.updatedAt' is a timestamp of when the order of a group was last updated. These managed states are removed automatically once a group is determined to be inactive; see 'Inactive Timeout' for details. |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile that does not have the required attributes, or for which they cannot be computed, is routed to this relationship |
| overtook | A FlowFile that waited for preceding FlowFiles longer than Wait Timeout and overtook those FlowFiles is routed to this relationship. |
| skipped | A FlowFile whose order is lower than the current target order, meaning it arrived too late and was skipped, is routed to this relationship. |
| success | A FlowFile with a matching order number is routed to this relationship. |
| wait | A FlowFile with a non-matching order is routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| EnforceOrder.startedAt | All FlowFiles going through this processor will have this attribute. This value is used to determine wait timeout. |
| EnforceOrder.result | All FlowFiles going through this processor will have this attribute denoting which relationship it was routed to. |
| EnforceOrder.detail | FlowFiles routed to 'failure' or 'skipped' relationship will have this attribute describing details. |
| EnforceOrder.expectedOrder | FlowFiles routed to 'wait' or 'skipped' relationship will have this attribute denoting expected order when the FlowFile was processed. |
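
The routing rules and per-group state described above can be approximated with a small Python model. This is a simplified sketch, not the processor's implementation: the class and method names are hypothetical, timestamps are plain numbers, the default timeout is arbitrary, and inactive-group cleanup is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GroupState:
    target: int                      # mirrors '<groupId>.target'
    updated_at: float                # mirrors '<groupId>.updatedAt'
    max_order: Optional[int] = None  # mirrors '<groupId>.max'


@dataclass
class OrderEnforcer:
    initial_order: int = 0
    wait_timeout: float = 600.0      # arbitrary value chosen for the sketch
    maximum_order: Optional[int] = None
    groups: Dict[str, GroupState] = field(default_factory=dict)

    def route(self, group_id: str, order_value: Optional[str],
              started_at: float, now: float) -> str:
        """Return the relationship name for one FlowFile."""
        if not group_id:
            return "failure"
        try:
            order = int(order_value)
        except (TypeError, ValueError):
            return "failure"         # missing or non-integer order attribute
        state = self.groups.get(group_id)
        if state is None:
            # First FlowFile of a group: compute and store the initial target.
            state = GroupState(self.initial_order, now, self.maximum_order)
            self.groups[group_id] = state
        if state.max_order is not None and order > state.max_order:
            return "failure"
        if order == state.target:
            state.target, state.updated_at = order + 1, now
            return "success"
        if order < state.target:
            return "skipped"         # arrived too late
        if now - started_at > self.wait_timeout:
            # Waited too long: overtake the missing FlowFiles.
            state.target, state.updated_at = order + 1, now
            return "overtook"
        return "wait"


enforcer = OrderEnforcer(initial_order=1, wait_timeout=10.0)
first = enforcer.route("group-a", "1", started_at=0.0, now=0.0)
early = enforcer.route("group-a", "3", started_at=0.0, now=1.0)
```

Here `started_at` plays the role of the 'EnforceOrder.startedAt' attribute, and a FlowFile routed to 'wait' would be fed back into the processor until it matches the target or times out.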
--- title: EnrichAttributes 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/enrichattributes.md section: Loading & Unloading Data --- # EnrichAttributes 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-enrichment-nar ## Description Looks up a value using the configured Lookup Service and adds the results to the FlowFile as one or more attributes. Frequently, this is used in conjunction with the DatabaseLookup Service in order to enrich a FlowFile by querying a database and adding the results as attributes. ## Tags attributes, database, enrichment, json, lookup, openflow ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Attribute Name | The name of the attribute to add, whose contents will be the JSON representation of the Record returned from the Lookup Service. |
| Attribute Prefix | A prefix to apply to all attribute names that are added. |
| Flattening Strategy | When a Record is returned from the Lookup Service, this property specifies how the Record should be flattened into the FlowFile's attributes |
| Lookup Service | The Lookup Service to use for enrichment |
## Relationships
| Name | Description |
| --- | --- |
| failure | If unable to enrich a given FlowFile for any reason, the FlowFile will be routed to this relationship. |
| matched | FlowFiles that are successfully enriched with the Record from the Lookup Service are routed to this relationship. |
| unmatched | FlowFiles for which the Lookup Service did not find a match are routed to this relationship. |
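
The following Python sketch shows one way a looked-up Record could be turned into FlowFile attributes, either as a single JSON attribute or as flattened, prefixed attributes. The function names, the dotted-key flattening, and the way the prefix is applied are all assumptions for illustration; the processor's actual Flattening Strategy options are not listed here.

```python
import json
from typing import Any, Dict, Optional, Tuple


def flatten(record: Dict[str, Any], parent: str = "") -> Dict[str, str]:
    """Flatten nested dictionaries into dotted keys (an assumed strategy)."""
    flat: Dict[str, str] = {}
    for key, value in record.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = str(value)
    return flat


def enrich(attributes: Dict[str, str], record: Optional[Dict[str, Any]],
           prefix: str = "", attribute_name: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return (relationship, attributes) for one FlowFile."""
    if record is None:
        return "unmatched", attributes       # the lookup found nothing
    enriched = dict(attributes)
    if attribute_name is not None:
        # Store the whole Record as one JSON-valued attribute.
        enriched[prefix + attribute_name] = json.dumps(record, sort_keys=True)
    else:
        for key, value in flatten(record).items():
            enriched[prefix + key] = value
    return "matched", enriched


# A hypothetical Record returned by a lookup for customer 7.
relationship, attrs = enrich({"id": "7"},
                             {"name": "Ada", "address": {"city": "Paris"}},
                             prefix="db.")
```

A lookup that raises an error, rather than returning no Record, would correspond to the 'failure' relationship.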
## Use cases
| Query a database to retrieve information based on the attributes of a FlowFile |
| ------------------------------------------------------------------------------ |
--- title: EnrichCdcStream 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/enrichcdcstream.md section: Loading & Unloading Data --- # EnrichCdcStream 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar ## Description Enriches incoming FlowFiles that come from CaptureChangePostgreSQL, etc. with information pertaining to which Journal Table to write to and relevant schema information. This Processor manages the schema versions for each table being processed in order to ensure that the correct Journal Table is used for each FlowFile. ## Tags ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| CDC Schema Registry | Specifies the CDC Schema Registry to use for managing the schemas of the CDC data |
| Record Reader | Specifies the Record Reader to use for reading the incoming data |
| Record Writer | Specifies the Record Writer to use for writing the outgoing data |
| Table State Service | Holds the state of replicated tables |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Tracks the current journal table version for each table being processed. |
## Relationships
| Name | Description |
| --- | --- |
| failure | If any FlowFile is unable to be read, it will be routed to this Relationship. |
| schema update | If any schema update is required in order to handle incoming Records, a FlowFile is routed to this relationship. The FlowFile will include the schema information to indicate what changes are required. |
| skipped ddl event | This Relationship will be used for any DDL / Schema Change events that do not result in a change to the destination table's schema. |
| success | Rows to be inserted into the Snowflake table will be routed to this Relationship. |
| table not in state | Used when a FlowFile references a table that does not exist in the state of replicated tables, probably after it was removed from replication. |
## Writes attributes
Name Description table.schema.generation The index of the journal table for incremental processing. table.schema.initial Marks the initial generation of a journal table. destination.table.schema The updated schema for the destination table. This attribute is only written for DDL events.
## See also

- [com.snowflake.openflow.runtime.processors.database.CaptureChangePostgreSQL](/user-guide/data-integration/openflow/processors/capturechangepostgresql)

---
title: EvaluateJsonPath 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/evaluatejsonpath.md
section: Loading & Unloading Data
---

# EvaluateJsonPath 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Evaluates one or more JsonPath expressions against the content of a FlowFile. The results of those expressions are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. JsonPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid JsonPath expression.

A Return Type of 'auto-detect' will make a determination based on the configured destination. When 'Destination' is set to 'flowfile-attribute', a return type of 'scalar' will be used. When 'Destination' is set to 'flowfile-content', a return type of 'JSON' will be used. If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to 'scalar', the FlowFile will be unmodified and will be routed to failure. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value and will be routed as a match.

If Destination is 'flowfile-content' and the JsonPath does not evaluate to a defined path, the FlowFile will be routed to 'unmatched' without having its contents modified. If Destination is 'flowfile-attribute' and the expression matches nothing, attributes will be created with empty strings as the value unless 'Path Not Found Behavior' is set to 'skip', and the FlowFile will always be routed to 'matched'.

## Tags

JSON, JsonPath, evaluate

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Destination | Indicates whether the results of the JsonPath evaluation are written to the FlowFile content or a FlowFile attribute; if using attribute, must specify the Attribute Name property. If set to flowfile-content, only one JsonPath may be specified, and the property name is ignored. |
| Max String Length | The maximum allowed length of a string value when parsing the JSON document |
| Null Value Representation | Indicates the desired representation of JSON Path expressions resulting in a null value. |
| Path Not Found Behavior | Indicates how to handle missing JSON path expressions when destination is set to 'flowfile-attribute'. Selecting 'warn' will generate a warning when a JSON path expression is not found. Selecting 'skip' will omit attributes for any unmatched JSON path expressions. |
| Return Type | Indicates the desired return type of the JSON Path expressions. Selecting 'auto-detect' will set the return type to 'json' for a Destination of 'flowfile-content', and 'scalar' for a Destination of 'flowfile-attribute'. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to this relationship when the JsonPath cannot be evaluated against the content of the FlowFile; for instance, if the FlowFile is not valid JSON |
| matched | FlowFiles are routed to this relationship when the JsonPath is successfully evaluated and the FlowFile is modified as a result |
| unmatched | FlowFiles are routed to this relationship when the JsonPath does not match the content of the FlowFile and the Destination is set to flowfile-content |
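
The routing rules described for EvaluateJsonPath can be sketched in a few lines of Python. This is an illustration only, not the processor's implementation: the `resolve` and `evaluate` helpers are hypothetical, the path syntax is restricted to `$.key1.key2` (object keys only), the Return Type is assumed to be 'auto-detect', and the 'Path Not Found Behavior' and 'Null Value Representation' properties are not modeled.

```python
import json

_MISSING = object()  # sentinel: the path did not resolve to anything


def resolve(document, json_path):
    """Resolve a simplified JsonPath of the form $.key1.key2 (object keys only)."""
    if not json_path.startswith("$"):
        raise ValueError(f"unsupported path: {json_path}")
    node = document
    for key in filter(None, json_path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def evaluate(content, paths, destination="flowfile-attribute"):
    """Return (relationship, attributes, content) for one FlowFile."""
    try:
        document = json.loads(content)
    except json.JSONDecodeError:
        return "failure", {}, content  # content is not valid JSON

    if destination == "flowfile-content":
        # Only one JsonPath may be specified; the property name is ignored.
        (json_path,) = paths.values()
        result = resolve(document, json_path)
        if result is _MISSING:
            return "unmatched", {}, content  # content left unmodified
        return "matched", {}, json.dumps(result)  # auto-detect -> JSON

    attributes = {}
    for name, json_path in paths.items():
        result = resolve(document, json_path)
        if result is _MISSING:
            attributes[name] = ""  # no match -> empty string attribute
        elif isinstance(result, (dict, list)):
            return "failure", {}, content  # scalar return type, non-scalar result
        else:
            attributes[name] = result if isinstance(result, str) else json.dumps(result)
    return "matched", attributes, content


flowfile = '{"order": {"id": 42, "customer": {"name": "Ada"}}}'
print(evaluate(flowfile, {"order.id": "$.order.id", "missing": "$.order.total"}))
```

With Destination set to 'flowfile-attribute', a path that matches nothing still yields the 'matched' relationship, which is why downstream logic usually checks for empty attribute values rather than relying on routing.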
---
title: EvaluateRagAnswerCorrectness 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/evaluateraganswercorrectness.md
section: Loading & Unloading Data
---

# EvaluateRagAnswerCorrectness 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-rag-evaluation-processors-nar

## Description

Evaluates the correctness of generated answers in a Retrieval-Augmented Generation (RAG) context by computing metrics such as F1 score, cosine similarity, and answer correctness. The processor uses an LLM (e.g., OpenAI's GPT) to assess the generated answer against the ground truth.

## Tags

ai, answer correctness, evaluation, llm, nlp, openai, openflow, rag

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Cosine Similarity Weight | The weight to apply to the cosine similarity when calculating answer correctness (between 0.0 and 1.0) |
| Evaluation Results Record Path | The RecordPath to write the results of the evaluation to. |
| F1 Score Weight | The weight to apply to the F1 score when calculating answer correctness (between 0.0 and 1.0) |
| Generated Answer Record Path | The path to the answer field in the record |
| Generated Answer Vector Record Path | The path to the answer vector field in the record. |
| Ground Truth Record Path | The RecordPath to the ground truth field in the record. |
| Ground Truth Vector Record Path | The path to the ground truth vector field in the record. |
| LLM Provider Service | The provider service for sending evaluation prompts to LLM |
| Question Record Path | The RecordPath to the question field in the record. |
| Record Reader | The Record Reader to use for reading the FlowFile. |
| Record Writer | The Record Writer to use for writing the results. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that cannot be processed are routed to this relationship |
| success | FlowFiles that are successfully processed are routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| average.f1Score | The average F1 score computed over all records. |
| average.cosineSim | The average cosine similarity between the ground truth and answer embeddings. |
| average.answerCorrectness | The average answer correctness score computed over all records. |
| json.parse.failures | Number of JSON parse failures encountered. |
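
To make the role of the two weight properties concrete, here is a minimal Python sketch of a weighted answer-correctness score. It rests on assumptions that this page does not confirm: the F1 score is approximated with plain token overlap (the processor relies on an LLM for its assessment), and the final score is assumed to be a simple weighted sum of F1 and cosine similarity. All function names are hypothetical.

```python
import math
from collections import Counter


def token_f1(answer: str, ground_truth: str) -> float:
    """Token-overlap F1 (assumed stand-in for the processor's F1 score)."""
    answer_tokens = answer.lower().split()
    truth_tokens = ground_truth.lower().split()
    overlap = sum((Counter(answer_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(answer_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_correctness(f1, cosine, f1_weight=0.5, cosine_weight=0.5) -> float:
    """Weighted sum of the two metrics (assumed combination rule)."""
    for weight in (f1_weight, cosine_weight):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weights must be between 0.0 and 1.0")
    return f1_weight * f1 + cosine_weight * cosine


f1 = token_f1("Paris is the capital", "The capital is Paris")
cos = cosine_similarity([1.0, 0.0], [1.0, 1.0])
print(answer_correctness(f1, cos, f1_weight=0.75, cosine_weight=0.25))
```

Raising the F1 Score Weight favors lexical agreement with the ground truth, while raising the Cosine Similarity Weight favors semantic closeness of the embeddings.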
## Use cases

| Use this processor to assess the quality of answers generated by an LLM in comparison to ground truth answers, providing metrics that can be used for monitoring and improving the performance of RAG systems. |
| --- |

---
title: EvaluateRagFaithfulness 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/evaluateragfaithfulness.md
section: Loading & Unloading Data
---

# EvaluateRagFaithfulness 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-rag-evaluation-processors-nar

## Description

Evaluates the faithfulness of generated answers in a Retrieval-Augmented Generation (RAG) system by analyzing responses using an LLM (e.g., OpenAI's GPT). The processor enriches each FlowFile record with faithfulness metrics and detailed analysis.

## Tags

ai, evaluation, faithfulness, llm, nlp, openai, openflow, rag

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Context Identifier Record Path | The RecordPath to the array of context IDs in the record. |
| Context Record Path | The RecordPath to the array of contexts in the record. |
| Evaluation Results Record Path | The RecordPath to write the results of the evaluation to. |
| Generated Answer Record Path | The path to the answer field in the record |
| LLM Provider Service | The provider service for sending evaluation prompts to LLM |
| Question Record Path | The RecordPath to the question field in the record. |
| Record Reader | The Record Reader to use for reading the FlowFile. |
| Record Writer | The Record Writer to use for writing the results. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that cannot be processed are routed to this relationship |
| success | FlowFiles that are successfully processed are routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| average.answer.faithfulness | The average faithfulness score computed over all records. |
| json.parse.failures | Number of JSON parse failures encountered. |
## Use cases

| Use this processor to assess the faithfulness of answers generated by an LLM compared to the provided context. It provides metrics that can be used for monitoring and improving the performance of RAG systems. |
| --- |

---
title: EvaluateRagRetrieval 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/evaluateragretrieval.md
section: Loading & Unloading Data
---

# EvaluateRagRetrieval 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-rag-evaluation-processors-nar

## Description

Calculates retrieval metrics (Precision@N, Recall@N, FScore@N, MAP@N, MRR) for a RAG system using an LLM as a judge. For each record, it uses both Precision and Recall prompts to evaluate the response, and adds the metrics as attributes to the FlowFile.

## Tags

evaluation, fscore, llm, metrics, mrr, openai, openflow, precision, rag, recall, retrieval

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Context Identifier Record Path | The RecordPath to the array of context IDs in the record. |
| Context Record Path | The RecordPath to the array of contexts in the record. |
| Evaluation Results Record Path | The RecordPath to write the results of the evaluation to. |
| Ground Truth Record Path | The RecordPath to the ground truth field in the record. |
| LLM Provider Service | The provider service for sending evaluation prompts to LLM |
| Question Record Path | The RecordPath to the question field in the record. |
| Record Reader | The Record Reader to use for reading the FlowFile. |
| Record Writer | The Record Writer to use for writing the results. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that cannot be processed are routed to this relationship |
| success | FlowFiles that are successfully processed are routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| n | The average number of retrieved documents per query. |
| precision.at.n | The average precision at N over all queries. |
| recall.at.n | The average recall at N over all queries. |
| fscore.at.n | The average F-Score at N over all queries. |
| mrr | The Mean Reciprocal Rank. |
| retrieval.eval.failures | Number of records where the eval could not be calculated. |
| json.parse.failures | Number of JSON parse failures encountered. |
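
For readers unfamiliar with the retrieval metrics named above, the following Python sketch shows their textbook definitions. It is a sketch under stated assumptions, not the processor's code: relevance judgments are taken as given booleans (the processor obtains its judgments from an LLM, in a way this page does not detail), the averaged F-Score is assumed to be the mean of per-query F-Scores, and all function names are hypothetical.

```python
def precision_at_n(relevant_flags):
    """relevant_flags[i] is True when the i-th retrieved context was judged relevant."""
    return sum(relevant_flags) / len(relevant_flags) if relevant_flags else 0.0


def recall_at_n(relevant_flags, total_relevant):
    """Share of all relevant contexts that appear among the retrieved ones."""
    return sum(relevant_flags) / total_relevant if total_relevant else 0.0


def fscore(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def reciprocal_rank(relevant_flags):
    """1 / rank of the first relevant context, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision(relevant_flags):
    """Mean of precision values taken at each rank that holds a relevant context."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def summarize(judgments):
    """judgments: list of (relevant_flags, total_relevant) pairs, one per query."""
    count = len(judgments)
    precisions = [precision_at_n(flags) for flags, _ in judgments]
    recalls = [recall_at_n(flags, total) for flags, total in judgments]
    return {
        "precision.at.n": sum(precisions) / count,
        "recall.at.n": sum(recalls) / count,
        "fscore.at.n": sum(fscore(p, r) for p, r in zip(precisions, recalls)) / count,
        "mrr": sum(reciprocal_rank(flags) for flags, _ in judgments) / count,
    }


print(summarize([([False, True, True, False], 4), ([True, False], 1)]))
```

The keys in the returned dictionary mirror the attribute names in the table above so the sketch can be compared against a processed FlowFile.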
---
title: EvaluateXPath 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/evaluatexpath.md
section: Loading & Unloading Data
---

# EvaluateXPath 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Evaluates one or more XPaths against the content of a FlowFile. The results of those XPaths are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid XPath expression.

If the XPath evaluates to more than one node and the Return Type is set to 'nodeset' (either directly, or via 'auto-detect' with a Destination of 'flowfile-content'), the FlowFile will be unmodified and will be routed to failure. If the XPath does not evaluate to a Node, the FlowFile will be routed to 'unmatched' without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to 'matched'.

## Tags

XML, XPath, evaluate

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Destination | Indicates whether the results of the XPath evaluation are written to the FlowFile content or a FlowFile attribute; if using attribute, must specify the Attribute Name property. If set to flowfile-content, only one XPath may be specified, and the property name is ignored. |
| Return Type | Indicates the desired return type of the XPath expressions. Selecting 'auto-detect' will set the return type to 'nodeset' for a Destination of 'flowfile-content', and 'string' for a Destination of 'flowfile-attribute'. |
| Validate DTD | Allow embedded Document Type Declaration in XML. This feature should be disabled to avoid XML entity expansion vulnerabilities. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to this relationship when the XPath cannot be evaluated against the content of the FlowFile; for instance, if the FlowFile is not valid XML, or if the Return Type is 'nodeset' and the XPath evaluates to multiple nodes |
| matched | FlowFiles are routed to this relationship when the XPath is successfully evaluated and the FlowFile is modified as a result |
| unmatched | FlowFiles are routed to this relationship when the XPath does not match the content of the FlowFile and the Destination is set to flowfile-content |

## Writes attributes

| Name | Description |
| --- | --- |
| user-defined | This processor adds user-defined attributes if the Destination property is set to flowfile-attribute. |
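
The flowfile-attribute behavior described for EvaluateXPath can be approximated with Python's standard library. This is a rough sketch with several assumptions: `xml.etree.ElementTree` supports only a subset of XPath and requires paths relative to the root element, the `evaluate_xpaths` helper is hypothetical, and only the 'string' return type is modeled.

```python
import xml.etree.ElementTree as ET


def evaluate_xpaths(content, xpaths):
    """Return (relationship, attributes) for Destination = flowfile-attribute.

    xpaths maps an attribute name to an ElementTree-style path such as
    "./customer/name" (a limited XPath subset, relative to the root element).
    """
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return "failure", {}  # content is not valid XML

    attributes = {}
    for name, xpath in xpaths.items():
        node = root.find(xpath)  # first matching element, or None
        # No match still creates the attribute, with an empty string value.
        attributes[name] = (node.text or "") if node is not None else ""
    return "matched", attributes


document = "<order><id>42</id><customer><name>Ada</name></customer></order>"
print(evaluate_xpaths(document, {"order.id": "./id", "missing": "./total"}))
```

As with the processor, an expression that matches nothing does not change the routing in this mode; it only produces an empty attribute value.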
---
title: EvaluateXQuery 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/evaluatexquery.md
section: Loading & Unloading Data
---

# EvaluateXQuery 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Evaluates one or more XQueries against the content of a FlowFile. The results of those XQueries are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XQueries are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is 'flowfile-attribute'; otherwise, the property name is ignored). The value of the property must be a valid XQuery.

If the XQuery returns more than one result, new attributes or FlowFiles (for Destinations of 'flowfile-attribute' or 'flowfile-content' respectively) will be created for each result (attributes will have a '.n' one-up number appended to the specified attribute name). If any provided XQuery returns a result, the FlowFile(s) will be routed to 'matched'. If no provided XQuery returns a result, the FlowFile will be routed to 'unmatched'. If the Destination is 'flowfile-attribute' and the XQueries match nothing, no attributes will be applied to the FlowFile.

## Tags

XML, XPath, XQuery, evaluate

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Destination | Indicates whether the results of the XQuery evaluation are written to the FlowFile content or a FlowFile attribute. If set to 'flowfile-content', only one XQuery may be specified and the property name is ignored. If set to 'flowfile-attribute' and the XQuery returns more than one result, multiple attributes will be added to the FlowFile, each named with a '.n' one-up number appended to the specified attribute name |
| Output: Indent | Specifies whether the processor may add additional whitespace when outputting a result tree. |
| Output: Method | Identifies the overall method that should be used for outputting a result tree. |
| Output: Omit XML Declaration | Specifies whether the processor should output an XML declaration when transforming a result tree. |
| Validate DTD | Allow embedded Document Type Declaration in XML. This feature should be disabled to avoid XML entity expansion vulnerabilities. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to this relationship when the XQuery cannot be evaluated against the content of the FlowFile. |
| matched | FlowFiles are routed to this relationship when the XQuery is successfully evaluated and the FlowFile is modified as a result |
| unmatched | FlowFiles are routed to this relationship when the XQuery does not match the content of the FlowFile and the Destination is set to flowfile-content |

## Writes attributes

| Name | Description |
| --- | --- |
| user-defined | This processor adds user-defined attributes if the Destination property is set to flowfile-attribute. |
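
The '.n' naming convention for multiple XQuery results can be pictured with a small Python sketch. Two details are assumptions rather than documented behavior: that numbering starts at 1, and that a single result is stored under the bare attribute name. The `name_result_attributes` helper is hypothetical.

```python
def name_result_attributes(attribute_name, results):
    """Map XQuery results to FlowFile attribute names.

    Assumptions: numbering starts at 1, and a single result keeps the
    bare attribute name without a suffix.
    """
    if len(results) == 1:
        return {attribute_name: results[0]}
    return {f"{attribute_name}.{index}": value
            for index, value in enumerate(results, start=1)}


print(name_result_attributes("item", ["apple", "pear", "plum"]))
```

An empty result list yields no attributes at all, which matches the statement that nothing is applied to the FlowFile when the XQueries match nothing.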
---
title: ExcelReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/excelreader.md
section: Loading & Unloading Data
---

# ExcelReader

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description

Parses a Microsoft Excel document, returning each row in each sheet as a separate record. This reader allows for inferring a schema from all the required sheets or providing an explicit schema for interpreting the values. See the Controller Service's Usage for further documentation. This reader is capable of processing both password-protected and non-password-protected .xlsx (XSSF 2007 OOXML file format) and older .xls (HSSF '97(-2007) file format) Excel documents.

## Tags

cell, excel, parse, reader, record, row, spreadsheet, values, xls, xlsx

## Properties

In the list below, required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Date Format | Date Format | | | Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017). |
| Input File Type * | Input File Type | XLSX | XLS, XLSX | Specifies type of Excel input file. |
| Password * | Password | | | The password for a password-protected Excel spreadsheet |
| Protection Type * | Protection Type | UNPROTECTED | Unprotected, Password Protected | Specifies whether an Excel spreadsheet is protected by a password or not. |
| Required Sheets | Required Sheets | | | Comma-separated list of Excel document sheet names whose rows should be extracted from the Excel document. If this property is left blank then all the rows from all the sheets will be extracted from the Excel document. The list of names is case sensitive. Any sheets not specified in this value will be ignored. An exception will be thrown if any specified sheet is not found. |
| Row Evaluation Strategy * | Row Evaluation Strategy | STANDARD | Standard, All Rows | A strategy to select how many rows after the starting row to use for determining the schema. |
| Schema Access Strategy * | Schema Access Strategy | Use Starting Row | Use 'Schema Name' Property, Use 'Schema Text' Property, Schema Reference Reader, Use Starting Row, Infer Schema | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name | Schema Name | `${schema.name}` | | Specifies the name of the schema to lookup in the Schema Registry property |
| Schema Reference Reader * | Schema Reference Reader | | | Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry |
| Schema Text | Schema Text | `${avro.schema}` | | The text of an Avro-formatted Schema |
| Schema Version | Schema Version | | | Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| Starting Row * | Starting Row | 1 | | The row number of the first row to start processing (One based). Use this to skip over rows of data at the top of a worksheet that are not part of the dataset. When using the 'Use Starting Row' strategy this should be the column header row. |
| Time Format | Time Format | | | Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15). |
| Timestamp Format | Timestamp Format | | | Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15). |
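
The two interpretations of a Timestamp field (formatted text versus milliseconds since epoch) can be illustrated in Python. The reader itself uses Java's `DateTimeFormatter`; this sketch is only an analogue, with hand-translated `strptime` equivalents of the example patterns, a hypothetical `read_timestamp` helper, and the assumption that formatted values are in UTC.

```python
from datetime import datetime, timedelta, timezone

# Hand-translated strptime equivalents of the Java patterns quoted above
# (assumption: only these example patterns are needed for the illustration).
JAVA_TO_STRPTIME = {
    "MM/dd/yyyy": "%m/%d/%Y",
    "HH:mm:ss": "%H:%M:%S",
    "MM/dd/yyyy HH:mm:ss": "%m/%d/%Y %H:%M:%S",
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_timestamp(raw, java_pattern=None):
    """Interpret a cell value as a timestamp.

    Without a pattern the value is treated as milliseconds since epoch;
    with a pattern it is parsed as formatted text (assumed to be UTC).
    """
    if java_pattern is None:
        return EPOCH + timedelta(milliseconds=int(raw))
    parsed = datetime.strptime(raw, JAVA_TO_STRPTIME[java_pattern])
    return parsed.replace(tzinfo=timezone.utc)


print(read_timestamp("01/01/2017 18:04:15", "MM/dd/yyyy HH:mm:ss"))
print(read_timestamp("1483293855000"))
```

Both calls describe the same instant, which shows why leaving the format unset only works when the sheet stores raw epoch milliseconds.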
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: ExecuteGroovyScript 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/executegroovyscript.md
section: Loading & Unloading Data
---

# ExecuteGroovyScript 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-groovyx-nar

## Description

Experimental Extended Groovy script processor. The script is responsible for handling the incoming flow file (for example, transferring it to SUCCESS or removing it) as well as any flow files created by the script. If the handling is incomplete or incorrect, the session will be rolled back.

## Tags

groovy, groovyx, script

## Input Requirement

ALLOWED

## Supports Sensitive Dynamic Properties

true

## Properties

| Property | Description |
| --- | --- |
| groovyx-additional-classpath | Classpath list separated by semicolon or comma. You can use masks like `*` and `*.jar` in the file name. |
| groovyx-failure-strategy | What to do with unhandled exceptions. If you want to manage exceptions in code, keep the default value _rollback_. If _transfer to failure_ is selected and an unhandled exception occurs, all FlowFiles received from incoming queues in this session will be transferred to the _failure_ relationship with additional attributes set: ERROR_MESSAGE and ERROR_STACKTRACE. If _rollback_ is selected and an unhandled exception occurs, all FlowFiles received from incoming queues will be penalized and returned. If the processor has no incoming connections, this parameter has no effect. |
| groovyx-script-body | Body of script to execute. Only one of Script File or Script Body may be used |
| groovyx-script-file | Path to script file to execute. Only one of Script File or Script Body may be used |
## State management
| Scopes | Description |
| --- | --- |
| LOCAL | Scripts can store and retrieve state using the State Management APIs. Consult the State Manager section of the Developer's Guide for more details. |
| CLUSTER | Scripts can store and retrieve state using the State Management APIs. Consult the State Manager section of the Developer's Guide for more details. |

## Restrictions

| Required Permission | Explanation |
| --- | --- |
| execute code | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles that failed to be processed |
| success | FlowFiles that were successfully processed |
## See also

- [org.apache.nifi.processors.script.ExecuteScript](/user-guide/data-integration/openflow/processors/executescript)

---
title: ExecuteProcess 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/executeprocess.md
section: Loading & Unloading Data
---

# ExecuteProcess 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Runs an operating system command specified by the user and writes the output of that command to a FlowFile. If the command is expected to be long-running, the Processor can output the partial data on a specified interval. When this option is used, the output is expected to be in textual format, as it typically does not make sense to split binary data on arbitrary time-based intervals.

## Tags

command, external, invoke, process, script, source

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Argument Delimiter | Delimiter to use to separate arguments for a command [default: space]. Must be a single character. |
| Batch Duration | If the process is expected to be long-running and produce textual output, a batch duration can be specified so that the output will be captured for this amount of time and a FlowFile will then be sent out with the results and a new FlowFile will be started, rather than waiting for the process to finish before sending out the results |
| Command | Specifies the command to be executed; if just the name of an executable is provided, it must be in the user's environment PATH. |
| Command Arguments | The arguments to supply to the executable delimited by white space. White space can be escaped by enclosing it in double-quotes. |
| Output MIME type | Specifies the value to set for the "mime.type" attribute. This property is ignored if 'Batch Duration' is set. |
| Redirect Error Stream | If true will redirect any error stream output of the process to the output stream. This is particularly helpful for processes which write extensively to the error stream or for troubleshooting. |
| Working Directory | The directory to use as the current working directory when executing the command |
## Restrictions
| Required Permission | Explanation |
| --- | --- |
| execute code | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |

## Relationships

| Name | Description |
| --- | --- |
| success | All created FlowFiles are routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| command | Executed command |
| command.arguments | Arguments of the command |
| mime.type | Sets the MIME type of the output if the 'Output MIME Type' property is set and 'Batch Duration' is not set |
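
How Command, Command Arguments, and Argument Delimiter might combine into an argument vector can be sketched in Python. This is a hedged illustration, not the processor's parsing code: the `split_arguments` and `build_argv` helpers are hypothetical, and the handling of double quotes together with a non-space delimiter is an assumption.

```python
import csv


def split_arguments(arguments, delimiter=" "):
    """Split an argument string on a single-character delimiter.

    Double-quoted segments keep their embedded delimiters, mirroring the
    note that white space can be escaped by enclosing it in double quotes.
    """
    if len(delimiter) != 1:
        raise ValueError("Argument Delimiter must be a single character")
    rows = csv.reader([arguments], delimiter=delimiter, quotechar='"')
    fields = next(rows, [])
    return [field for field in fields if field]  # drop empty fields


def build_argv(command, arguments="", delimiter=" "):
    """Combine the Command and Command Arguments properties into an argv list."""
    return [command] + split_arguments(arguments, delimiter)


print(build_argv("ls", '-l "my dir" /tmp'))
print(build_argv("tool", "--name=a b;--verbose", delimiter=";"))
```

Choosing a delimiter such as `;` is convenient when individual arguments contain spaces, because no quoting is needed in that case.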
---
title: ExecuteScript 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/executescript.md
section: Loading & Unloading Data
---

# ExecuteScript 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-scripting-nar

## Description

Experimental: Executes a script given the flow file and a process session. The script is responsible for handling the incoming flow file (for example, transferring it to SUCCESS or removing it) as well as any flow files created by the script. If the handling is incomplete or incorrect, the session will be rolled back. The impact of sustained usage has not yet been verified.

## Tags

clojure, execute, groovy, script

## Input Requirement

ALLOWED

## Supports Sensitive Dynamic Properties

true

## Properties

| Property | Description |
| --- | --- |
| Module Directory | Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Script Body | Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine | Language Engine for executing scripts |
| Script File | Path to script file to execute. Only one of Script File or Script Body may be used |
## State management
| Scopes | Description |
| --- | --- |
| LOCAL | Scripts can store and retrieve state using the State Management APIs. Consult the State Manager section of the Developer's Guide for more details. |
| CLUSTER | Scripts can store and retrieve state using the State Management APIs. Consult the State Manager section of the Developer's Guide for more details. |
## Restrictions
| Required Permission | Explanation |
| --- | --- |
| execute code | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that failed to be processed |
| success | FlowFiles that were successfully processed |
## See also
- [org.apache.nifi.processors.script.InvokeScriptedProcessor](/user-guide/data-integration/openflow/processors/invokescriptedprocessor)

--- title: ExecuteSQL 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/executesql.md section: Loading & Unloading Data ---
# ExecuteSQL 2025.10.9.21
This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Executes the provided SQL select query. The query result is converted to Avro format. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, the attributes of that FlowFile are available when evaluating the select query, and the query may use `?` as a parameter placeholder. In this case, the parameters to use must exist as FlowFile attributes with the naming convention `sql.args.N.type` and `sql.args.N.value`, where N is a positive integer. The `sql.args.N.type` attribute is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. The FlowFile attribute `executesql.row.count` indicates how many rows were selected.
## Tags
database, jdbc, query, select, sql
## Input Requirement
ALLOWED
## Supports Sensitive Dynamic Properties
true
## Properties
| Property | Description |
| --- | --- |
| Content Output Strategy | Specifies the strategy for writing FlowFile content when processing input FlowFiles. The strategy applies when handling queries that do not produce results. |
| Database Connection Pooling Service | The Controller Service that is used to obtain a connection to the database. |
| Default Decimal Precision | When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting the number of available digits is required. Generally, precision is defined by the column data type definition or the database engine's default. However, undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers. |
| Default Decimal Scale | When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting the number of available decimal digits is required. Generally, scale is defined by the column data type definition or the database engine's default. However, when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimals than the specified scale, the value is rounded to that scale, e.g. 1.53 becomes 2 with scale 0, and 1.5 with scale 1. |
| Max Wait Time | The maximum amount of time allowed for a running SQL select query. Zero means there is no limit. A maximum time of less than 1 second is treated as zero. |
| Normalize Table and Column Names | Whether to change non-Avro-compatible characters in column names to Avro-compatible characters. For example, colons and periods will be changed to underscores in order to build a valid Avro record. |
| SQL Query | The SQL query to execute. The query can be empty, a constant value, or built from attributes using Expression Language. If this property is specified, it will be used regardless of the content of incoming FlowFiles. If this property is empty, the content of the incoming FlowFile is expected to contain a valid SQL select query, to be issued by the processor to the database. Note that Expression Language is not evaluated for FlowFile contents. |
| Use Avro Logical Types | Whether to use Avro Logical Types for DECIMAL/NUMBER, DATE, TIME and TIMESTAMP columns. If disabled, values are written as strings. If enabled, logical types are used and written as their underlying types: DECIMAL/NUMBER as logical 'decimal' (written as bytes with additional precision and scale metadata), DATE as logical 'date' (written as int denoting days since the Unix epoch, 1970-01-01), TIME as logical 'time-millis' (written as int denoting milliseconds since midnight), and TIMESTAMP as logical 'timestamp-millis' (written as long denoting milliseconds since the Unix epoch). If a reader of the written Avro records also knows these logical types, the values can be deserialized with more context, depending on the reader implementation. |
| compression-format | Compression type to use when writing Avro files. Default is None. |
| esql-auto-commit | Enables or disables the auto commit functionality of the DB connection. Default value is 'true'. The default value can be used with most JDBC drivers, and this functionality has no impact in most cases because this processor is used to read data. However, for some JDBC drivers, such as the PostgreSQL driver, auto commit must be disabled to limit the number of result rows fetched at a time. When auto commit is enabled, the PostgreSQL driver loads the whole result set into memory at once. This can lead to a large amount of memory usage when executing queries that fetch large data sets. More details of this behaviour in the PostgreSQL driver can be found in [https://jdbc.postgresql.org//documentation/head/query.html](https://jdbc.postgresql.org//documentation/head/query.html). |
| esql-fetch-size | The number of result rows to be fetched from the result set at a time. This is a hint to the database driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored. |
| esql-max-rows | The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile. |
| esql-output-batch-size | The number of output FlowFiles to queue before committing the process session. When set to zero, the session will be committed when all result set rows have been processed and the output FlowFiles are ready for transfer to the downstream relationship. For large result sets, this can cause a large burst of FlowFiles to be transferred at the end of processor execution. If this property is set, the session is committed each time the specified number of FlowFiles are ready for transfer, thus releasing the FlowFiles to the downstream relationship. NOTE: The fragment.count attribute will not be set on FlowFiles when this property is set. |
| sql-post-query | A semicolon-delimited list of queries executed after the main SQL query is executed, for example to set session properties after the main query. It's possible to include semicolons in the statements themselves by escaping them with a backslash (`\;`). Results/outputs from these queries will be suppressed if there are no errors. |
| sql-pre-query | A semicolon-delimited list of queries executed before the main SQL query is executed, for example to set session properties before the main query. It's possible to include semicolons in the statements themselves by escaping them with a backslash (`\;`). Results/outputs from these queries will be suppressed if there are no errors. |
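
To make the semicolon-delimited format of the pre-query and post-query properties concrete, the following Python sketch splits such a value while honoring backslash-escaped semicolons. It only illustrates the documented format and is not the processor's own parsing code; the function name `split_queries` and the sample statements are assumptions.

```python
import re


def split_queries(value: str) -> list[str]:
    """Split on semicolons that are not preceded by a backslash, then unescape."""
    parts = re.split(r"(?<!\\);", value)
    # Turn each escaped "\;" back into a plain semicolon and drop empty entries.
    return [part.replace("\\;", ";").strip() for part in parts if part.strip()]


queries = split_queries(r"SET a = 1; SET b = 'x\;y'")
```

The escaped semicolon inside the second statement stays part of that statement instead of starting a third one.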
## Relationships
| Name | Description |
| --- | --- |
| failure | SQL query execution failed. Incoming FlowFile will be penalized and routed to this relationship. |
| success | Successfully created FlowFile from SQL query result set. |
## Writes attributes
| Name | Description |
| --- | --- |
| executesql.row.count | Contains the number of rows returned by the query. If 'Max Rows Per Flow File' is set, then this number will reflect the number of rows in the Flow File instead of the entire result set. |
| executesql.query.duration | Combined duration of the query execution time and fetch time in milliseconds. If 'Max Rows Per Flow File' is set, then this number will reflect only the fetch time for the rows in the Flow File instead of the entire result set. |
| executesql.query.executiontime | Duration of the query execution time in milliseconds. This number will reflect the query execution time regardless of the 'Max Rows Per Flow File' setting. |
| executesql.query.fetchtime | Duration of the result set fetch time in milliseconds. If 'Max Rows Per Flow File' is set, then this number will reflect only the fetch time for the rows in the Flow File instead of the entire result set. |
| executesql.resultset.index | Assuming multiple result sets are returned, the zero based index of this result set. |
| executesql.error.message | If processing an incoming flow file causes an Exception, the Flow File is routed to failure and this attribute is set to the exception message. |
| fragment.identifier | If 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results. |
| fragment.count | If 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. If Output Batch Size is set, then this attribute will not be populated. |
| fragment.index | If 'Max Rows Per Flow File' is set then the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set FlowFile. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order FlowFiles were produced. |
| input.flowfile.uuid | If the processor has an incoming connection, outgoing FlowFiles will have this attribute set to the value of the input FlowFile's UUID. If there is no incoming connection, the attribute will not be added. |
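
As a rough illustration of how the fragment attributes can be used downstream, the Python sketch below groups FlowFile attribute maps by `fragment.identifier`, orders them by `fragment.index`, and checks completeness against `fragment.count` when that attribute is present. The function name and the sample attribute values are made up for the example; attribute values are treated as strings.

```python
from collections import defaultdict


def group_fragments(flowfiles):
    """Group attribute maps by fragment.identifier and order them by fragment.index."""
    groups = defaultdict(list)
    for attrs in flowfiles:
        groups[attrs["fragment.identifier"]].append(attrs)

    result = {}
    for identifier, members in groups.items():
        members.sort(key=lambda attrs: int(attrs["fragment.index"]))
        # fragment.count is absent when an output batch size is configured,
        # in which case completeness cannot be determined (None).
        count = members[0].get("fragment.count")
        complete = None if count is None else len(members) == int(count)
        result[identifier] = {"ordered": members, "complete": complete}
    return result


# Hypothetical attribute maps, arriving out of order.
sample = [
    {"fragment.identifier": "abc", "fragment.index": "1", "fragment.count": "2"},
    {"fragment.identifier": "abc", "fragment.index": "0", "fragment.count": "2"},
]
grouped = group_fragments(sample)
```

Returning `None` for completeness, rather than `False`, keeps the "count not populated" case distinct from a result set that is genuinely missing fragments.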
--- title: ExecuteSQLRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/executesqlrecord.md section: Loading & Unloading Data ---
# ExecuteSQLRecord 2025.10.9.21
This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Executes the provided SQL select query. The query result is converted to the format specified by a Record Writer. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, the attributes of that FlowFile are available when evaluating the select query, and the query may use `?` as a parameter placeholder. In this case, the parameters to use must exist as FlowFile attributes with the naming convention `sql.args.N.type` and `sql.args.N.value`, where N is a positive integer. The `sql.args.N.type` attribute is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. The FlowFile attribute `executesql.row.count` indicates how many rows were selected.
## Tags
database, jdbc, query, record, select, sql
## Input Requirement
ALLOWED
## Supports Sensitive Dynamic Properties
true
## Properties
| Property | Description |
| --- | --- |
| Database Connection Pooling Service | The Controller Service that is used to obtain a connection to the database. |
| Default Decimal Precision | When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting the number of available digits is required. Generally, precision is defined by the column data type definition or the database engine's default. However, undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers. |
| Default Decimal Scale | When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting the number of available decimal digits is required. Generally, scale is defined by the column data type definition or the database engine's default. However, when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimals than the specified scale, the value is rounded to that scale, e.g. 1.53 becomes 2 with scale 0, and 1.5 with scale 1. |
| Max Wait Time | The maximum amount of time allowed for a running SQL select query. Zero means there is no limit. A maximum time of less than 1 second is treated as zero. |
| SQL Query | The SQL query to execute. The query can be empty, a constant value, or built from attributes using Expression Language. If this property is specified, it will be used regardless of the content of incoming FlowFiles. If this property is empty, the content of the incoming FlowFile is expected to contain a valid SQL select query, to be issued by the processor to the database. Note that Expression Language is not evaluated for FlowFile contents. |
| Use Avro Logical Types | Whether to use Avro Logical Types for DECIMAL/NUMBER, DATE, TIME and TIMESTAMP columns. If disabled, values are written as strings. If enabled, logical types are used and written as their underlying types: DECIMAL/NUMBER as logical 'decimal' (written as bytes with additional precision and scale metadata), DATE as logical 'date' (written as int denoting days since the Unix epoch, 1970-01-01), TIME as logical 'time-millis' (written as int denoting milliseconds since midnight), and TIMESTAMP as logical 'timestamp-millis' (written as long denoting milliseconds since the Unix epoch). If a reader of the written Avro records also knows these logical types, the values can be deserialized with more context, depending on the reader implementation. |
| esql-auto-commit | Enables or disables the auto commit functionality of the DB connection. Default value is 'true'. The default value can be used with most JDBC drivers, and this functionality has no impact in most cases because this processor is used to read data. However, for some JDBC drivers, such as the PostgreSQL driver, auto commit must be disabled to limit the number of result rows fetched at a time. When auto commit is enabled, the PostgreSQL driver loads the whole result set into memory at once. This can lead to a large amount of memory usage when executing queries that fetch large data sets. More details of this behaviour in the PostgreSQL driver can be found in [https://jdbc.postgresql.org//documentation/head/query.html](https://jdbc.postgresql.org//documentation/head/query.html). |
| esql-fetch-size | The number of result rows to be fetched from the result set at a time. This is a hint to the database driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored. |
| esql-max-rows | The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile. |
| esql-output-batch-size | The number of output FlowFiles to queue before committing the process session. When set to zero, the session will be committed when all result set rows have been processed and the output FlowFiles are ready for transfer to the downstream relationship. For large result sets, this can cause a large burst of FlowFiles to be transferred at the end of processor execution. If this property is set, the session is committed each time the specified number of FlowFiles are ready for transfer, thus releasing the FlowFiles to the downstream relationship. NOTE: The fragment.count attribute will not be set on FlowFiles when this property is set. |
| esqlrecord-normalize | Whether to change characters in column names. For example, colons and periods will be changed to underscores. |
| esqlrecord-record-writer | Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer may use Inherit Schema to emulate the inferred schema behavior, i.e. an explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types. |
| sql-post-query | A semicolon-delimited list of queries executed after the main SQL query is executed, for example to set session properties after the main query. It's possible to include semicolons in the statements themselves by escaping them with a backslash (`\;`). Results/outputs from these queries will be suppressed if there are no errors. |
| sql-pre-query | A semicolon-delimited list of queries executed before the main SQL query is executed, for example to set session properties before the main query. It's possible to include semicolons in the statements themselves by escaping them with a backslash (`\;`). Results/outputs from these queries will be suppressed if there are no errors. |
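
The `sql.args.N.type` / `sql.args.N.value` convention from the description can be illustrated with a small Python helper that builds the attribute map for a parameterized query. The helper name, the query text, and the table and column names are hypothetical; the type codes 4 and 12 are the `java.sql.Types` constants for INTEGER and VARCHAR.

```python
# JDBC type codes as defined in java.sql.Types.
JDBC_INTEGER = 4
JDBC_VARCHAR = 12


def sql_args_attributes(params):
    """Build sql.args.N.type / sql.args.N.value attributes from (jdbc_type, value) pairs."""
    attrs = {}
    # N starts at 1 and follows the order of the placeholders in the query.
    for n, (jdbc_type, value) in enumerate(params, start=1):
        attrs[f"sql.args.{n}.type"] = str(jdbc_type)
        attrs[f"sql.args.{n}.value"] = str(value)
    return attrs


# Hypothetical query with two placeholders and the matching attributes.
query = "SELECT * FROM orders WHERE id = ? AND region = ?"
attributes = sql_args_attributes([(JDBC_INTEGER, 42), (JDBC_VARCHAR, "EMEA")])
```

FlowFile attributes are strings, which is why both the type code and the value are stringified.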
## Relationships
| Name | Description |
| --- | --- |
| failure | SQL query execution failed. Incoming FlowFile will be penalized and routed to this relationship. |
| success | Successfully created FlowFile from SQL query result set. |
## Writes attributes
| Name | Description |
| --- | --- |
| executesql.row.count | Contains the number of rows returned in the select query |
| executesql.query.duration | Combined duration of the query execution time and fetch time in milliseconds |
| executesql.query.executiontime | Duration of the query execution time in milliseconds |
| executesql.query.fetchtime | Duration of the result set fetch time in milliseconds |
| executesql.resultset.index | Assuming multiple result sets are returned, the zero based index of this result set. |
| executesql.error.message | If processing an incoming flow file causes an Exception, the Flow File is routed to failure and this attribute is set to the exception message. |
| fragment.identifier | If 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results. |
| fragment.count | If 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. If Output Batch Size is set, then this attribute will not be populated. |
| fragment.index | If 'Max Rows Per Flow File' is set then the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set FlowFile. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order FlowFiles were produced. |
| input.flowfile.uuid | If the processor has an incoming connection, outgoing FlowFiles will have this attribute set to the value of the input FlowFile's UUID. If there is no incoming connection, the attribute will not be added. |
| mime.type | Sets the mime.type attribute to the MIME Type specified by the Record Writer. |
| record.count | The number of records output by the Record Writer. |

--- title: ExecuteSQLStatement 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/executesqlstatement.md section: Loading & Unloading Data ---
# ExecuteSQLStatement 2025.10.9.21
This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-database-processors-nar
## Description
Executes a SQL DDL or DML Statement against a database. This Processor allows Expression Language to be evaluated against FlowFile attributes in order to parameterize the SQL for each FlowFile.
## Tags
database, delete, insert, jdbc, openflow, sql, update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property | Description |
| --- | --- |
| Connection Pooling Service | The Connection Pooling Service that is used to obtain a connection to the database. |
| Max Batch Size | The maximum number of FlowFiles to process in a single batch. |
| Max Content Reference Size | If the SQL property references `${flowfile_content}`, this property specifies the maximum size of the FlowFile that is allowed to be read into memory. If the FlowFile is larger than this value, the FlowFile will be routed to failure. If the SQL property does not reference `${flowfile_content}`, this value has no effect. |
| SQL | The SQL statement to execute. The SQL may make use of Expression Language to reference attributes. In this case, the Processor will rewrite the query using parameters in order to avoid SQL Injection attacks. When referencing Expression Language, the entire value must be a single Expression. For example, `INSERT INTO TABLE X (name) VALUES ('${name}')` is valid, but `INSERT INTO TABLE X (name) VALUES ('Mr. ${name}')` is not because Expression Language is used within a String value. The SQL may also reference `${flowfile_content}` in order to reference the content of the FlowFile as UTF-8 encoded text. |
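
The "entire value must be a single Expression" rule for the SQL property can be approximated with a short Python check. This is a simplified sketch of the documented rule, not the processor's validation logic; the function name is hypothetical, and the regular expressions do not handle Expressions that themselves contain quotes or braces.

```python
import re

EXPRESSION = re.compile(r"\$\{[^}]*\}")   # a single ${...} reference
QUOTED_VALUE = re.compile(r"'([^']*)'")   # a single-quoted SQL string value


def uses_whole_value_expressions(sql: str) -> bool:
    """Return True when every quoted value that uses Expression Language is one Expression."""
    for match in QUOTED_VALUE.finditer(sql):
        literal = match.group(1).strip()
        # A value is rejected when it mixes an Expression with other text.
        if EXPRESSION.search(literal) and not EXPRESSION.fullmatch(literal):
            return False
    return True


valid = uses_whole_value_expressions("INSERT INTO TABLE X (name) VALUES ('${name}')")
invalid = uses_whole_value_expressions("INSERT INTO TABLE X (name) VALUES ('Mr. ${name}')")
```

A flow that needs the `Mr.` prefix could compute the full value into an attribute upstream and reference that attribute as the whole value.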
## Relationships
| Name | Description |
| --- | --- |
| failure | The SQL statement could not be executed |
| success | The SQL statement was successfully executed |

--- title: ExecuteStreamCommand 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/executestreamcommand.md section: Loading & Unloading Data ---
# ExecuteStreamCommand 2025.10.9.21
This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
The ExecuteStreamCommand processor provides a flexible way to integrate external commands and scripts into NiFi data flows. ExecuteStreamCommand can pass the incoming FlowFile's content to the command that it executes, similar to how piping works.
## Tags
command, command execution, execute, stream
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
true
## Properties
| Property | Description |
| --- | --- |
| Argument Delimiter | Delimiter to use to separate arguments for a command [default: ;]. Must be a single character. |
| Command Arguments | The arguments to supply to the executable, delimited by the ';' character. |
| Command Path | Specifies the command to be executed; if just the name of an executable is provided, it must be in the user's environment PATH. |
| Ignore STDIN | If true, the contents of the incoming FlowFile will not be passed to the executing command. |
| Max Attribute Length | If routing the output of the stream command to an attribute, the number of characters put to the attribute value will be at most this amount. This is important because attributes are held in memory and large attributes will quickly cause out of memory issues. If the output is longer than this value, it will be truncated to fit. Consider making this smaller if able. |
| Output Destination Attribute | If set, the output of the stream command will be put into an attribute of the original FlowFile instead of a separate FlowFile. There will no longer be a relationship for 'output stream' or 'nonzero status'. The value of this property will be the key for the output attribute. |
| Output MIME Type | Specifies the value to set for the "mime.type" attribute. This property is ignored if 'Output Destination Attribute' is set. |
| Working Directory | The directory to use as the current working directory when executing the command. |
| argumentsStrategy | Strategy for configuring arguments to be supplied to the command. |
## Restrictions
| Required Permission | Explanation |
| --- | --- |
| execute code | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## Relationships
| Name | Description |
| --- | --- |
| nonzero status | The destination path for the flow file created from the command's output, if the returned status code is non-zero. All flow files routed to this relationship will be penalized. |
| original | The original FlowFile will be routed. It will have new attributes detailing the result of the script execution. |
| output stream | The destination path for the flow file created from the command's output, if the returned status code is zero. |
## Writes attributes
| Name | Description |
| --- | --- |
| execution.command | The name of the command executed |
| execution.command.args | The semi-colon delimited list of arguments. Sensitive properties will be masked |
| execution.status | The exit status code returned from executing the command |
| execution.error | Any error messages returned from executing the command |
| mime.type | Sets the MIME type of the output if the 'Output MIME Type' property is set and 'Output Destination Attribute' is not set |
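
Because the processor feeds FlowFile content to the command much like a pipe, a command it runs can be an ordinary filter that reads standard input and writes standard output. The Python sketch below shows the core of such a filter under assumed requirements (uppercase every line, treat empty input as an error); the function name `run` and the sample data are hypothetical.

```python
import io
from typing import TextIO


def run(stdin: TextIO, stdout: TextIO, stderr: TextIO) -> int:
    """Uppercase each input line and return a process exit code."""
    lines_seen = 0
    for line in stdin:
        stdout.write(line.upper())
        lines_seen += 1
    if lines_seen == 0:
        # A non-zero exit code would send the output to 'nonzero status'.
        stderr.write("no input received\n")
        return 1
    # Exit code zero corresponds to the 'output stream' relationship.
    return 0


# Local demonstration with in-memory streams instead of real pipes.
demo_out = io.StringIO()
status = run(io.StringIO("id,name\n1,ada\n"), demo_out, io.StringIO())
```

In a real script, the entry point would pass `sys.stdin`, `sys.stdout`, and `sys.stderr` to `run` and exit with its return value, so that the exit status decides which relationship receives the output.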
--- title: Explore Data Products from SAP® BDC Connect for Snowflake source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/explore-data-products.md section: Loading & Unloading Data --- # Explore Data Products from %sapbdc% - [](/user-guide/data-integration/zero-copy/about-sap-snowflake) - [](/user-guide/data-integration/zero-copy/sap-sql/setup) - [](/user-guide/data-integration/zero-copy/sap-sql/security) This topic describes how to use a Zerocopy Connector to list available SAP® data products, create catalog-linked databases, and query the shared data in Snowflake. The connector must be in `CONNECTED` state before performing any of the steps in this topic. ## In SAP® BDC, choose data products to share with Snowflake To search for and share data products with Snowflake, users must use the central SAP Business Data Cloud catalog and have a global role that grants them the following privileges: - BDC Data Packages (read) - To access SAP Business Data Cloud. - Catalog Asset (read) - To access the catalog and view objects in the Assets and Data Products collections. - Cloud Data Product (share) - To share data products to target systems. Users with these privileges can share data products from the SAP Business Data Cloud catalog with the desired SAP Snowflake account to make them available for consumption to specific roles in that account. To share data products with Snowflake: 1. In the central SAP Business Data Cloud catalog, select data products to share with an SAP Snowflake account 2. From **Catalog & Marketplace**, search for (or use filters) to find the data products to be shared 3. From the search results, select **Share** in the data product to be shared (for example, customer) to open the **Manage Share Access** dialog 4. In the **Overview** section, learn more about the data product by reviewing its details and available objects. 5. Under **Target System**: 1. 
Choose the Snowflake account with the enrolled Zerocopy Connector to share with (if there is more than one). 2. Select **Update**. A message confirms the share process has started. After it finishes, a notification shows the result. ## In Snowflake, list shared data products To list the data products that SAP® BDC has shared with your Snowflake account, call the `SYSTEM$ZEROCOPY_CONNECTOR_LIST_SHARES` function: ```sql SELECT SYSTEM$ZEROCOPY_CONNECTOR_LIST_SHARES('my_db.my_schema.my_sap_connector'); ``` The function returns a JSON array. Each element represents one shared data product: ```text [ { "name": "usid:b077d21c-b7a2-479a-a20e-bba1dbe91034:ns:sap.s4pce:r:SalesOrder:v:1", "id": "25c0de58-6e61-4bcc-ba68-c2c15b7a2d4b", "display_name": "Sales Order (BDF730, sap.s4pce:apiResource:SalesOrder:v1)", "comment": "An agreement between a vendor and a customer to provide products on a specific date.", "status": "MOUNTED", "catalog_linked_databases": [ { "name": "SALES_ORDER_CLD" } ], "properties": { "sap.ord.apiResource.ordId": "sap.s4pce:apiResource:SalesOrder:v1", "sap.ord.systemInstance.name": "BDF730", "sap.ord.systemInstance.id": "30f962e7-791c-41d7-9e72-1534823e8b21" } } ] ``` To filter and search more easily, parse the JSON output into a tabular format using `PARSE_JSON` and `LATERAL FLATTEN`: ```sql WITH raw AS ( SELECT PARSE_JSON( SYSTEM$ZEROCOPY_CONNECTOR_LIST_SHARES('my_db.my_schema.my_sap_connector') ) AS json_data ) SELECT f.value:name::STRING AS name, f.value:id::STRING AS id, f.value:display_name::STRING AS display_name, f.value:comment::STRING AS comment, f.value:properties['sap.ord.apiResource.ordId']::STRING AS api_resource_ord_id, f.value:properties['sap.ord.systemInstance.name']::STRING AS system_instance_name, f.value:properties['sap.ord.systemInstance.id']::STRING AS system_instance_id FROM raw, LATERAL FLATTEN(INPUT => json_data) f; ``` ## Create a Catalog-Linked Database To mount a shared SAP® data product in Snowflake, create a catalog-linked 
database using the `LINKED_ZEROCOPY_CONNECTOR` clause. The role requires `CREATE DATABASE` on the account and `USAGE` on the connector. The owner of the catalog-linked database can be different from the owner of the connector. ```sql CREATE DATABASE my_sales_order LINKED_ZEROCOPY_CONNECTOR = ( CONNECTOR_NAME = 'my_db.my_schema.my_sap_connector', SHARE_NAME = 'usid:b077d21c-b7a2-479a-a20e-bba1dbe91034:ns:sap.s4pce:r:SalesOrder:v:1', SYNC_INTERVAL_SECONDS = 86400 ); ``` When a catalog-linked database is created, a read-only schema named `snowflake$` is automatically created within it. This schema contains [Semantic Views](/user-guide/views-semantic/overview) generated from the SAP® Core Schema Notation (CSN). Semantic Views add business meaning to the incoming shared data by defining metrics, entities, and relationships — enabling consistent business definitions and powering AI capabilities such as [Cortex Analyst](/user-guide/snowflake-cortex/cortex-analyst) directly on top of the SAP® data in Snowflake. Use `SYNC_INTERVAL_SECONDS` to control how frequently Snowflake automatically discovers schema and table changes from the shared data product. The value can range from 30 to 86400 seconds (1 day). The default value for SAP® BDC is 86400 seconds. You can create multiple catalog-linked databases from the same connector, one per data product shared from SAP® BDC. To confirm the database was created, use [](/sql-reference/sql/show-databases): ```sql SHOW DATABASES LIKE 'MY_SALES_ORDER%'; ``` ## Explore the Data List the schemas and tables available in the catalog-linked database: ```sql SHOW SCHEMAS IN DATABASE my_sales_order; SHOW TABLES IN DATABASE my_sales_order; ``` Query the data: ```sql SELECT * FROM my_sales_order.salesorder.salesorder LIMIT 100; ``` You can join tables across multiple catalog-linked databases. 
For example, to find the top customers by revenue using data from two shared data products:

```sql
SELECT
    s.salesorder,
    s.soldtoparty,
    c.customername,
    c.country,
    s.totalnetamount
FROM my_sales_order.salesorder.salesorder s
JOIN my_customers.customer.customer c
    ON s.soldtoparty = c.customer
WHERE s.overallsdprocessingstatus != 'C'
ORDER BY s.totalnetamount DESC
LIMIT 10;
```

## Create Table As Select (CTAS)

To persist query results as a native Snowflake table, use CREATE TABLE AS SELECT (CTAS). Create a new database to hold the results:

```sql
CREATE DATABASE IF NOT EXISTS my_ctas_db;
USE DATABASE my_ctas_db;

CREATE OR REPLACE TABLE top_customers_by_revenue AS
SELECT
    c.customer,
    c.customername,
    c.country,
    c.region,
    c.businesstype,
    COUNT(DISTINCT s.salesorder) AS num_orders,
    SUM(s.totalnetamount) AS total_revenue,
    AVG(s.totalnetamount) AS avg_order_amount
FROM my_customers.customer.customer c
JOIN my_sales_order.salesorder.salesorder s
    ON c.customer = s.soldtoparty
WHERE c.deletionindicator = FALSE
GROUP BY 1, 2, 3, 4, 5;

-- Query the result table
SELECT * FROM top_customers_by_revenue LIMIT 10;
```

## Drop a Catalog-Linked Database

All catalog-linked databases must be dropped before you can disconnect or drop the connector. Catalog-linked databases do not support `UNDROP`.

```sql
DROP DATABASE my_sales_order;
```

---
title: ExternalHazelcastCacheManager
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/externalhazelcastcachemanager.md
section: Loading & Unloading Data
---

# ExternalHazelcastCacheManager

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

A service that provides cache instances backed by Hazelcast running outside of NiFi.

## Tags

cache, hazelcast

## Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Hazelcast Cluster Name * | hazelcast-cluster-name | nifi | | Name of the Hazelcast cluster. |
| Hazelcast Connection Timeout * | hazelcast-connection-timeout | 20 secs | | The maximum amount of time the client tries to connect or reconnect before giving up. |
| Hazelcast Initial Backoff * | hazelcast-retry-backoff-initial | 1 secs | | The amount of time the client waits before it tries to reestablish the connection for the first time. |
| Hazelcast Maximum Backoff * | hazelcast-retry-backoff-maximum | 5 secs | | The maximum amount of time the client waits before it tries to reestablish the connection. |
| Hazelcast Backoff Multiplier * | hazelcast-retry-backoff-multiplier | 1.5 | | A multiplier by which the wait time is increased before each attempt to reestablish the connection. |
| Hazelcast Server Address * | hazelcast-server-address | | | Addresses of one or more Hazelcast instances, in `{host:port}` format, separated by commas. |
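To see how the backoff properties interact, the following sketch computes the sequence of wait times produced by the default values in the table. It assumes the wait grows geometrically and is capped at the maximum backoff, and it ignores the time spent on each connection attempt and any jitter the Hazelcast client may apply. The function `backoff_schedule` is a hypothetical helper written for illustration, not part of NiFi or Hazelcast.

```python
def backoff_schedule(initial=1.0, maximum=5.0, multiplier=1.5, timeout=20.0):
    """Return successive wait times (seconds) that fit inside the timeout budget.

    Assumption: wait_n = min(initial * multiplier**n, maximum), and the client
    gives up once the accumulated waiting would exceed the connection timeout.
    """
    waits = []
    wait = initial
    elapsed = 0.0
    while elapsed + wait <= timeout:
        waits.append(wait)
        elapsed += wait
        # Grow the wait for the next attempt, but never beyond the maximum.
        wait = min(wait * multiplier, maximum)
    return waits


schedule = backoff_schedule()
print(schedule)       # wait before each reconnect attempt
print(sum(schedule))  # total time spent waiting
```

With the defaults, the wait reaches the 5-second cap on the fifth attempt, so raising the multiplier mostly changes how quickly the cap is reached rather than the long-run retry rate.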
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: ExtractAvroMetadata 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/extractavrometadata.md
section: Loading & Unloading Data
---

# ExtractAvroMetadata 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-avro-nar

## Description

Extracts metadata from the header of an Avro datafile.

## Tags

avro, metadata, schema

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Count Items | If true, the number of items in the datafile will be counted and stored in a FlowFile attribute 'item.count'. The counting is done by reading blocks and getting the number of items for each block, thus avoiding de-serializing. The items being counted will be the top-level items in the datafile. For example, with a schema of type record the items will be the records, and for a schema of type Array the items will be the arrays (not the number of entries in each array). |
| Fingerprint Algorithm | The algorithm used to generate the schema fingerprint. Available choices are based on the Avro recommended practices for fingerprint generation. |
| Metadata Keys | A comma-separated list of keys indicating key/value pairs to extract from the Avro file header. The key 'avro.schema' can be used to extract the full schema in JSON format, and 'avro.codec' can be used to extract the codec name if one exists. |

## Relationships

| Name | Description |
| --- | --- |
| failure | A FlowFile is routed to this relationship if it cannot be parsed as Avro or metadata cannot be extracted for any reason. |
| success | A FlowFile is routed to this relationship after metadata has been extracted. |

## Writes attributes

| Name | Description |
| --- | --- |
| schema.type | The type of the schema (for example, record or enum). |
| schema.name | Contains the name when the type is a record, enum, or fixed; otherwise contains the name of the primitive type. |
| schema.fingerprint | The result of the Fingerprint Algorithm as a hex string. |
| item.count | The total number of items in the datafile. Only written if Count Items is set to true. |
---
title: ExtractEmailAttachments 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/extractemailattachments.md
section: Loading & Unloading Data
---

# ExtractEmailAttachments 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-email-nar

## Description

Extracts attachments from a MIME-formatted email file, splitting them into individual FlowFiles.

## Tags

email, split

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Relationships

| Name | Description |
| --- | --- |
| attachments | Each individual attachment will be routed to the attachments relationship. |
| failure | FlowFiles that could not be parsed. |
| original | The original file. |

## Writes attributes

| Name | Description |
| --- | --- |
| filename | The filename of the attachment. |
| email.attachment.parent.filename | The filename of the parent FlowFile. |
| email.attachment.parent.uuid | The UUID of the original FlowFile. |
| mime.type | The MIME type of the attachment. |
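The splitting behavior can be approximated outside NiFi with Python's standard `email` package. The sketch below is an analogy, not the processor's implementation: `split_attachments` is a hypothetical helper, the dictionary keys simply reuse the attribute names from the table above, and the parent UUID is omitted because it only exists inside a NiFi flow.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_attachments(raw_email: bytes, parent_filename: str) -> list:
    """Return one dict per attachment, loosely mirroring the written attributes."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    results = []
    for part in msg.iter_attachments():
        results.append({
            "filename": part.get_filename(),
            "email.attachment.parent.filename": parent_filename,
            "mime.type": part.get_content_type(),
            # Decoded attachment bytes: the content of the new FlowFile.
            "content": part.get_payload(decode=True),
        })
    return results


# Build a small MIME message with one attachment to exercise the helper.
demo = EmailMessage()
demo["From"] = "sender@example.com"
demo["To"] = "recipient@example.com"
demo["Subject"] = "Report"
demo.set_content("See attached.")
demo.add_attachment(
    b"a,b\n1,2\n",
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)

attachments = split_attachments(demo.as_bytes(), "message.eml")
for item in attachments:
    print(item["filename"], item["mime.type"], len(item["content"]))
```

In a flow, each returned dictionary would correspond to one FlowFile on the attachments relationship, while the unmodified message goes to original.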
---
title: ExtractEmailHeaders 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/extractemailheaders.md
section: Loading & Unloading Data
---

# ExtractEmailHeaders 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-email-nar

## Description

Using the FlowFile content as the source of data, extracts headers from an RFC-compliant email file and adds the relevant attributes to the FlowFile. This processor does not perform extensive RFC validation but still requires a bare minimum of compliance with RFC 2822.

## Tags

email, split

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Additional Header List | Colon-separated list of additional headers to be extracted from the FlowFile content. Note that the header key is case-insensitive and will be matched as lower-case. Values will respect email contents. |
| Email Address Parsing | If "strict", strict address format parsing rules are applied to mailbox and mailbox list fields, such as "to" and "from" headers, and FlowFiles with poorly formed addresses will be routed to the failure relationship, similar to messages that fail RFC-compliant format validation. If "non-strict", the processor will extract the contents of mailbox list headers as comma-separated values without attempting to parse each value as a well-formed Internet mailbox address. This is optional and defaults to Strict Address Parsing. |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles that could not be parsed as an RFC-2822 compliant message. |
| success | Extraction was successful. |

## Writes attributes

| Name | Description |
| --- | --- |
| email.headers.bcc.* | Each individual BCC recipient (if available). |
| email.headers.cc.* | Each individual CC recipient (if available). |
| email.headers.from.* | Each individual mailbox contained in the From of the email (array as per RFC-2822). |
| email.headers.message-id | The value of the Message-ID header (if available). |
| email.headers.received_date | The Received-Date of the message (if available). |
| email.headers.sent_date | Date the message was sent. |
| email.headers.subject | Subject of the message (if available). |
| email.headers.to.* | Each individual TO recipient (if available). |
| email.attachment_count | Number of attachments of the message. |
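As a rough illustration of how mailbox-list headers fan out into indexed attributes, the sketch below parses a small message with Python's standard `email` package. It is an analogy only: `extract_headers` is a hypothetical helper, and two details are assumptions rather than documented behavior, namely that the `*` suffix is a zero-based index and that only the address part of each mailbox is stored.

```python
from email import message_from_string, policy
from email.utils import getaddresses


def extract_headers(raw_email: str) -> dict:
    """Map selected headers to attribute-style keys (illustrative naming)."""
    msg = message_from_string(raw_email, policy=policy.default)
    attrs = {}
    for header in ("from", "to", "cc", "bcc"):
        values = [str(v) for v in msg.get_all(header, [])]
        # getaddresses splits "a@x, B <b@y>" into (display name, address) pairs.
        for i, (_name, addr) in enumerate(getaddresses(values)):
            attrs[f"email.headers.{header}.{i}"] = addr
    if msg["subject"] is not None:
        attrs["email.headers.subject"] = str(msg["subject"])
    if msg["message-id"] is not None:
        attrs["email.headers.message-id"] = str(msg["message-id"])
    return attrs


RAW = (
    "From: Alice <alice@example.com>\r\n"
    "To: bob@example.com, Carol <carol@example.com>\r\n"
    "Subject: Quarterly report\r\n"
    "Message-ID: <1234@example.com>\r\n"
    "\r\n"
    "Body text.\r\n"
)

attrs = extract_headers(RAW)
for key in sorted(attrs):
    print(key, "=", attrs[key])
```

Headers that are absent from the message (here, CC and BCC) produce no keys, which matches the "if available" qualifier in the table.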
---
title: ExtractGrok 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/extractgrok.md
section: Loading & Unloading Data
---

# ExtractGrok 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Evaluates one or more Grok Expressions against the content of a FlowFile, adding the results as attributes or replacing the content of the FlowFile with a JSON notation of the matched content.

## Tags

delimit, extract, grok, log, parse, text

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Character Set | The Character Set in which the file is encoded. |
| Destination | Controls whether the Grok output is written as new FlowFile attributes or to the FlowFile content. When written as attributes, each Grok identifier that is matched in the FlowFile is added as an attribute prefixed with "grok.". Writing to the FlowFile content overwrites any existing FlowFile content. |
| Grok Expression | Grok expression. If other Grok expressions are referenced in this expression, they must be provided in the Grok Pattern File if set, or exist in the default Grok patterns. |
| Grok Pattern file | Custom Grok pattern definitions. These definitions will be loaded after the default Grok patterns. The Grok Parser will use the default Grok patterns when this property is not configured. |
| Keep Empty Captures | If true, then empty capture values will be included in the returned capture map. |
| Maximum Buffer Size | Specifies the maximum amount of data to buffer (per file) in order to apply the Grok expressions. Files larger than the specified maximum will not be fully evaluated. |
| Named captures only | Only store named captures from Grok. |

## Restrictions

| Required Permission | Explanation |
| --- | --- |
| reference remote resources | Patterns can reference resources over HTTP. |

## Relationships

| Name | Description |
| --- | --- |
| matched | FlowFiles are routed to this relationship when the Grok Expression is successfully evaluated and the FlowFile is modified as a result. |
| unmatched | FlowFiles are routed to this relationship when no provided Grok Expression matches the content of the FlowFile. |

## Writes attributes

| Name | Description |
| --- | --- |
| grok.XXX | When operating in flowfile-attribute mode, each Grok identifier that is matched in the FlowFile will be added as an attribute, prefixed with "grok.". For example, if the Grok identifier "timestamp" is matched, then the value will be added to an attribute named "grok.timestamp". |
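A Grok expression is essentially a regular expression assembled from named, reusable patterns. The sketch below shows that idea under simplifying assumptions: the three-entry `PATTERNS` dictionary is a hypothetical stand-in for the default Grok pattern library, and `grok_to_regex` and `extract` are illustrative helpers, not the parser NiFi uses.

```python
import re

# A tiny, hypothetical pattern library; real Grok ships many more definitions.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
}


def grok_to_regex(expression: str) -> str:
    """Expand %{PATTERN:field} references into named capture groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        # Unnamed references become non-capturing groups.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, expression)


def extract(expression: str, content: str) -> dict:
    """Return 'grok.'-prefixed attributes, or an empty dict when nothing matches."""
    match = re.search(grok_to_regex(expression), content)
    if match is None:
        return {}  # in a flow, this FlowFile would go to 'unmatched'
    return {f"grok.{key}": value for key, value in match.groupdict().items()}


attrs = extract("%{IP:client} %{WORD:method} %{NUMBER:status}", "10.0.0.7 GET 200")
print(attrs)
```

Each identifier after the colon becomes one attribute, which is the flowfile-attribute mode described in the table above.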
---
title: ExtractRecordSchema 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/extractrecordschema.md
section: Loading & Unloading Data
---

# ExtractRecordSchema 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Extracts the record schema from the FlowFile using the supplied Record Reader and writes it to the *avro.schema* attribute.

## Tags

avro, csv, freeform, generic, json, record, schema, text, xml

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| cache-size | Specifies the number of schemas to cache. This value should reflect the expected number of different schemas that may be in the incoming FlowFiles. A suitably sized cache makes schema retrieval more efficient and improves processor performance. |
| record-reader | Specifies the Controller Service to use for reading incoming data. |

## Relationships

| Name | Description |
| --- | --- |
| failure | If a FlowFile's record schema cannot be extracted from the configured input format, the FlowFile will be routed to this relationship. |
| success | FlowFiles whose record schemas are successfully extracted will be routed to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| record.error.message | On failure, this attribute provides the error message encountered by the Reader. |
| avro.schema | This attribute provides the schema extracted from the input FlowFile using the provided RecordReader. |
---
title: ExtractSchemaColumns 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/extractschemacolumns.md
section: Loading & Unloading Data
---

# ExtractSchemaColumns 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-record-schema-nar

## Description

Extracts the record schema columns from the FlowFile using the supplied Record Reader and writes them to the *schema.columns* attribute.

## Tags

avro, csv, freeform, generic, json, record, schema, text, xml

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| End Column Index | Specifies the index of the column in the schema up to which columns should be taken. |
| Record Reader | Specifies the Controller Service to use for reading incoming data. |
| Start Column Index | Specifies the index of the column (numbered from 1) in the schema from which columns should be taken. |

## Relationships

| Name | Description |
| --- | --- |
| failure | If a FlowFile's record schema cannot be extracted from the configured input format, the FlowFile will be routed to this relationship. |
| success | FlowFiles whose record schemas are successfully extracted will be routed to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| record.error.message | On failure, this attribute provides the error message encountered by the Reader. |
| schema.columns | This attribute provides columns extracted from the input FlowFile using the provided RecordReader. |
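The start and end indexes select a contiguous run of columns from the schema. The sketch below shows one plausible reading of that selection; it assumes the end index is inclusive and defaults to the last column when unset, neither of which is stated in the property descriptions. `select_columns` is a hypothetical helper, and the format in which the processor serializes the result into `schema.columns` is not shown because it is not documented here.

```python
def select_columns(columns, start_index=1, end_index=None):
    """Pick columns start_index..end_index using 1-based positions.

    Assumptions: end_index is inclusive, and None means "through the last column".
    """
    if start_index < 1:
        raise ValueError("start_index is numbered from 1")
    end = len(columns) if end_index is None else end_index
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    return columns[start_index - 1:end]


schema_columns = ["id", "name", "email", "created_at"]
print(select_columns(schema_columns, 2, 3))
print(select_columns(schema_columns, 3))
```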
---
title: ExtractStructuredBoxFileMetadata 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/extractstructuredboxfilemetadata.md
section: Loading & Unloading Data
---

# ExtractStructuredBoxFileMetadata 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-box-nar

## Description

Extracts metadata from a Box file using Box AI. The extraction can use either a template or a list of fields. The extracted metadata is written to the FlowFile content as JSON.

## Tags

ai, box, extract, metadata, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| Extraction Method | The method to use for extracting metadata. TEMPLATE uses a Box metadata template for extraction. FIELDS uses a JSON schema of fields (read from FlowFile content) for extraction. |
| File ID | The ID of the file from which to extract metadata. |
| Record Reader | The Record Reader to use for parsing the incoming data. Required when Extraction Method is FIELDS. |
| Template Key | The key of the metadata template to use for extraction. Required when Extraction Method is TEMPLATE. |

## Relationships

| Name | Description |
| --- | --- |
| failure | A FlowFile is routed to this relationship if an error occurs during metadata extraction. |
| file not found | FlowFiles for which the specified Box file was not found will be routed to this relationship. |
| success | A FlowFile is routed to this relationship after metadata has been successfully extracted. |
| template not found | FlowFiles for which the specified metadata template was not found will be routed to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| box.id | The ID of the file from which metadata was extracted. |
| box.ai.template.key | The template key used for extraction (when using the TEMPLATE extraction method). |
| box.ai.extraction.method | The extraction method used (TEMPLATE or FIELDS). |
| box.ai.completion.reason | The completion reason from the AI extraction. |
| mime.type | Set to 'application/json' for the JSON content. |
| error.code | The error code returned by Box. |
| error.message | The error message returned by Box. |
## See also

- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)
- [org.apache.nifi.processors.box.ListBoxFileMetadataTemplates](/user-guide/data-integration/openflow/processors/listboxfilemetadatatemplates)
- [org.apache.nifi.processors.box.UpdateBoxFileMetadataInstance](/user-guide/data-integration/openflow/processors/updateboxfilemetadatainstance)

---
title: ExtractText 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/extracttext.md
section: Loading & Unloading Data
---

# ExtractText 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes. Regular Expressions are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed. The attributes are generated differently depending on whether named capture groups are enabled.

If named capture groups are not enabled: The first capture group, if found, is placed into that attribute name. All capture groups, including the matching string sequence itself, are also provided at that attribute name with an index value appended, with the exception of a capturing group that is optional and does not match. For example, given the attribute name "regex" and expression `abc(def)?(g)`, an attribute "regex.1" with a value of "def" is added if the "def" matched. If the "def" did not match, no attribute named "regex.1" is added, but an attribute named "regex.2" with a value of "g" is added regardless.

If named capture groups are enabled: Each named capture group, if found, is placed into an attribute named with the group name provided. If enabled, the matching string sequence itself is placed into the attribute name. If multiple matches are enabled, an index is applied after the first set of matches. The exception is a capturing group that is optional and does not match. For example, given the attribute name "regex" and expression `abc(?<NAMED>def)?(?<NAMED-TWO>g)`, an attribute "regex.NAMED" with the value of "def" is added if the "def" matched. An attribute "regex.NAMED-TWO" with the value of "g" is added if the "g" matched, regardless.

The value of the property must be a valid Regular Expression with one or more capturing groups. If named capture groups are enabled, all capture groups must be named. If they are not, the processor configuration will fail validation. If the Regular Expression matches more than once, only the first match will be used unless the property enabling repeating capture group is set to true. If any provided Regular Expression matches, the FlowFile(s) will be routed to 'matched'. If no provided Regular Expression matches, the FlowFile will be routed to 'unmatched' and no attributes will be applied to the FlowFile.

## Tags

Regular Expression, Text, evaluate, extract, regex

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Character Set | The Character Set in which the file is encoded. |
| Enable Canonical Equivalence | Indicates that two characters match only when their full canonical decompositions match. |
| Enable Case-insensitive Matching | Indicates that two characters match even if they are in a different case. Can also be specified via the embedded flag (?i). |
| Enable DOTALL Mode | Indicates that the expression '.' should match any character, including a line terminator. Can also be specified via the embedded flag (?s). |
| Enable Literal Parsing of the Pattern | Indicates that metacharacters and escape characters should be given no special meaning. |
| Enable Multiline Mode | Indicates that '^' and '$' should match just after and just before a line terminator or end of sequence, instead of only the beginning or end of the entire input. Can also be specified via the embedded flag (?m). |
| Enable Unicode Predefined Character Classes | Specifies conformance with the Unicode Technical Standard #18: Unicode Regular Expression Annex C: Compatibility Properties. Can also be specified via the embedded flag (?U). |
| Enable Unicode-aware Case Folding | When used with 'Enable Case-insensitive Matching', matches in a manner consistent with the Unicode Standard. Can also be specified via the embedded flag (?u). |
| Enable Unix Lines Mode | Indicates that only the `\n` line terminator is recognized in the behavior of '.', '^', and '$'. Can also be specified via the embedded flag (?d). |
| Enable named group support | If set to true, when named groups are present in the regular expression, the name of the group will be used in the attribute name as opposed to the group index. All capturing groups must be named; if the number of groups (not including capture group 0) does not equal the number of named groups, validation will fail. |
| Enable repeating capture group | If set to true, every string matching the capture groups will be extracted. Otherwise, if the Regular Expression matches more than once, only the first match will be extracted. |
| Include Capture Group 0 | Indicates that Capture Group 0 should be included as an attribute. Capture Group 0 represents the entirety of the regular expression match, is typically not used, and could have considerable length. |
| Maximum Buffer Size | Specifies the maximum amount of data to buffer (per FlowFile) in order to apply the regular expressions. FlowFiles larger than the specified maximum will not be fully evaluated. |
| Maximum Capture Group Length | Specifies the maximum number of characters a given capture group value can have. Any characters beyond the maximum will be truncated. |
| Permit Whitespace and Comments in Pattern | In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Can also be specified via the embedded flag (?x). |

## Relationships

| Name | Description |
| --- | --- |
| matched | FlowFiles are routed to this relationship when the Regular Expression is successfully evaluated and the FlowFile is modified as a result. |
| unmatched | FlowFiles are routed to this relationship when no provided Regular Expression matches the content of the FlowFile. |
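The indexed-attribute rules in the Description (with named capture groups disabled) can be traced with a short sketch. It uses Python's `re` module as a stand-in for the Java regular expression engine that NiFi uses, so flag and named-group syntax differ from what the processor accepts. `extract_text` is a hypothetical helper, and the `regex.0` key for capture group 0 is an assumption based on the Include Capture Group 0 property.

```python
import re


def extract_text(attr_name, pattern, content, include_group_zero=True):
    """Return attribute-style keys for the first match, or None if unmatched."""
    match = re.search(pattern, content)
    if match is None:
        return None  # in a flow, this FlowFile would go to 'unmatched'
    attrs = {}
    if include_group_zero:
        attrs[f"{attr_name}.0"] = match.group(0)
    for index in range(1, match.re.groups + 1):
        value = match.group(index)
        if value is None:
            continue  # optional group that did not participate in the match
        attrs[f"{attr_name}.{index}"] = value
    # The first capture group, if it matched, is also stored under the bare name.
    if match.re.groups >= 1 and match.group(1) is not None:
        attrs[attr_name] = match.group(1)
    return attrs


print(extract_text("regex", r"abc(def)?(g)", "abcdefg"))
print(extract_text("regex", r"abc(def)?(g)", "abcg"))
```

The second call shows the exception described above: the optional group did not match, so no "regex.1" key appears, while "regex.2" is still present.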
---
title: FetchAzureBlobStorage_v12 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchazureblobstorage_v12.md
section: Loading & Unloading Data
---

# FetchAzureBlobStorage_v12 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-azure-nar

## Description

Retrieves the specified blob from Azure Blob Storage and writes its content to the content of the FlowFile. The processor uses Azure Blob Storage client library v12.

## Tags

azure, blob, cloud, microsoft, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Blob Name | The full name of the blob. |
| Client-Side Encryption Key ID | Specifies the ID of the key to use for client-side encryption. |
| Client-Side Encryption Key Type | Specifies the key type to use for client-side encryption. |
| Client-Side Encryption Local Key | When using local client-side encryption, this is the raw key, encoded in hexadecimal. |
| Container Name | Name of the Azure storage container. In case of PutAzureBlobStorage processor, container can be created if it does not exist. |
| Range Length | The number of bytes to download from the blob, starting from the Range Start. An empty value or a value that extends beyond the end of the blob will read to the end of the blob. |
| Range Start | The byte position at which to start reading from the blob. An empty value or a value of zero will start reading at the beginning of the blob. |
| Storage Credentials | Controller Service used to obtain Azure Blob Storage Credentials. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
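The Range Start and Range Length semantics can be checked against an in-memory byte string. The sketch below only models the documented rules (empty start means the beginning, empty or oversized length means read to the end); `read_range` is a hypothetical helper and does not call Azure.

```python
def read_range(blob: bytes, range_start=None, range_length=None) -> bytes:
    """Model the Range Start / Range Length rules on an in-memory blob."""
    start = range_start or 0  # empty value or zero: start at the beginning
    if range_length is None:
        return blob[start:]  # empty length: read to the end of the blob
    # Slicing past the end is clamped, so an oversized length reads to the end.
    return blob[start:start + range_length]


data = b"0123456789"
print(read_range(data, 2, 3))
print(read_range(data, 7, 100))
print(read_range(data, 4))
```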
## Relationships

| Name | Description |
| --- | --- |
| failure | Unsuccessful operations will be transferred to the failure relationship. |
| success | All successfully processed FlowFiles are routed to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| azure.container | The name of the Azure Blob Storage container. |
| azure.blobname | The name of the blob on Azure Blob Storage. |
| azure.primaryUri | Primary location of the blob. |
| azure.etag | ETag of the blob. |
| azure.blobtype | Type of the blob (either BlockBlob, PageBlob, or AppendBlob). |
| mime.type | MIME Type of the content. |
| lang | Language code for the content. |
| azure.timestamp | Timestamp of the blob. |
| azure.length | Length of the blob. |
## Use Cases Involving Other Components

| Retrieve all files in an Azure Blob Storage container |
| ----------------------------------------------------- |

## See also

- [org.apache.nifi.processors.azure.storage.DeleteAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/deleteazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.ListAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/listazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.PutAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/putazureblobstorage_v12)

---
title: FetchAzureDataLakeStorage 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchazuredatalakestorage.md
section: Loading & Unloading Data
---

# FetchAzureDataLakeStorage 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-azure-nar

## Description

Fetches the specified file from Azure Data Lake Storage.

## Tags

adlsgen2, azure, cloud, datalake, microsoft, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| ADLS Credentials | Controller Service used to obtain Azure Credentials. |
| Directory Name | Name of the Azure Storage Directory. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. In case of the PutAzureDataLakeStorage processor, the directory will be created if it does not already exist. |
| File Name | The filename. |
| Filesystem Name | Name of the Azure Storage File System (also called Container). It is assumed to already exist. |
| Number of Retries | The number of automatic retries to perform if the download fails. |
| Range Length | The number of bytes to download from the object, starting from the Range Start. An empty value or a value that extends beyond the end of the object will read to the end of the object. |
| Range Start | The byte position at which to start reading from the object. An empty value or a value of zero will start reading at the beginning of the object. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
## Relationships

| Name | Description |
| --- | --- |
| failure | Files that could not be fetched from Azure storage for some reason are transferred to this relationship. |
| success | Files that have been successfully fetched from Azure storage are transferred to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| azure.datalake.storage.statusCode | The HTTP error code (if available) from the failed operation |
| azure.datalake.storage.errorCode | The Azure Data Lake Storage moniker of the failed operation |
| azure.datalake.storage.errorMessage | The Azure Data Lake Storage error message from the failed operation |
## Use Cases Involving Other Components

| Retrieve all files in an Azure DataLake Storage directory |
| --------------------------------------------------------- |

## See also

- [org.apache.nifi.processors.azure.storage.DeleteAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/deleteazuredatalakestorage)
- [org.apache.nifi.processors.azure.storage.ListAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/listazuredatalakestorage)
- [org.apache.nifi.processors.azure.storage.PutAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/putazuredatalakestorage)

--- title: FetchBoxFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchboxfile.md section: Loading & Unloading Data ---

# FetchBoxFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Fetches files from a Box Folder. Designed to be used in tandem with ListBoxFile.

## Tags

box, fetch, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| File ID | The ID of the File to fetch |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile will be routed here for each File for which fetch was attempted but failed. |
| success | A FlowFile will be routed here for each successfully fetched File. |
## Writes attributes
| Name | Description |
| --- | --- |
| box.id | The id of the file |
| filename | The name of the file |
| path | The folder path where the file is located |
| box.size | The size of the file |
| box.timestamp | The last modified time of the file |
| error.code | The error code returned by Box |
| error.message | The error message returned by Box |
## See also

- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)
- [org.apache.nifi.processors.box.PutBoxFile](/user-guide/data-integration/openflow/processors/putboxfile)

--- title: FetchBoxFileInfo 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchboxfileinfo.md section: Loading & Unloading Data ---

# FetchBoxFileInfo 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Fetches metadata for files from Box and adds it to the FlowFile's attributes.

## Tags

box, fetch, metadata, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| File ID | The ID of the File to fetch metadata for |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile will be routed here if fetching the file metadata fails. |
| not.found | FlowFiles for which the specified Box file was not found. |
| success | A FlowFile will be routed here after successfully fetching the file metadata. |
## Writes attributes
| Name | Description |
| --- | --- |
| box.id | The id of the file |
| filename | The name of the file |
| path | The folder path where the file is located |
| box.path.folder.ids | A comma separated list of file path_collection IDs |
| box.size | The size of the file |
| box.timestamp | The last modified time of the file |
| box.created.at | The creation date of the file |
| box.owner | The name of the file owner |
| box.owner.id | The ID of the file owner |
| box.owner.login | The login of the file owner |
| box.description | The description of the file |
| box.etag | The etag of the file |
| box.sha1 | The SHA-1 hash of the file |
| box.content.created.at | The date the content was created |
| box.content.modified.at | The date the content was modified |
| box.item.status | The status of the file (active, trashed, etc.) |
| box.sequence_id | The sequence ID of the file |
| box.parent.folder.id | The ID of the parent folder |
| box.trashed.at | The date the file was trashed, if applicable |
| box.purged.at | The date the file was purged, if applicable |
| box.shared.link | The shared link of the file, if any |
| error.code | The error code returned by Box |
| error.message | The error message returned by Box |
## See also

- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)
- [org.apache.nifi.processors.box.PutBoxFile](/user-guide/data-integration/openflow/processors/putboxfile)

--- title: FetchBoxFileMetadataInstance 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchboxfilemetadatainstance.md section: Loading & Unloading Data ---

# FetchBoxFileMetadataInstance 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Retrieves a specific metadata instance associated with a Box file, identified by its template key and scope.

## Tags

box, instance, metadata, storage, template

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| File ID | The ID of the file for which to fetch metadata. |
| Template Key | The metadata template key to retrieve. |
| Template Scope | The metadata template scope (e.g., 'enterprise', 'global'). |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile will be routed here if there is an error fetching metadata instance from the file. |
| file not found | FlowFiles for which the specified Box file was not found will be routed to this relationship. |
| success | A FlowFile containing the metadata instance will be routed to this relationship upon successful processing. |
| template not found | FlowFiles for which the specified metadata template was not found will be routed to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| box.id | The ID of the file from which metadata was fetched |
| box.metadata.template.key | The metadata template key |
| box.metadata.template.scope | The metadata template scope |
| mime.type | The MIME Type of the FlowFile content |
| error.code | The error code returned by Box |
| error.message | The error message returned by Box |
## See also

- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.FetchBoxFileInfo](/user-guide/data-integration/openflow/processors/fetchboxfileinfo)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)
- [org.apache.nifi.processors.box.ListBoxFileMetadataInstances](/user-guide/data-integration/openflow/processors/listboxfilemetadatainstances)

--- title: FetchBoxFileRepresentation 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchboxfilerepresentation.md section: Loading & Unloading Data ---

# FetchBoxFileRepresentation 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Fetches a Box file representation using a representation hint and writes it to the FlowFile content.

## Tags

box, cloud, content, download, file, representation, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| File ID | The ID of the Box file to retrieve. |
| Representation Type | The type of representation to fetch. Common values include 'pdf', 'text', 'jpg', 'png', etc. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that encounter errors during processing will be routed to this relationship. |
| file.not.found | FlowFiles for which the specified Box file was not found. |
| representation.not.found | FlowFiles for which the specified Box file's requested representation was not found. |
| success | FlowFiles that are successfully processed will be routed to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| box.id | The ID of the Box file. |
| box.file.name | The name of the Box file. |
| box.file.size | The size of the Box file in bytes. |
| box.file.created.time | The timestamp when the file was created. |
| box.file.modified.time | The timestamp when the file was last modified. |
| box.file.mime.type | The MIME type of the file. |
| box.file.representation.type | The representation type that was fetched. |
| box.error.message | The error message returned by Box if the operation fails. |
| box.error.code | The error code returned by Box if the operation fails. |
## See also

- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)

--- title: FetchDistributedMapCache 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchdistributedmapcache.md section: Loading & Unloading Data ---

# FetchDistributedMapCache 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Computes cache key(s) from FlowFile attributes, for each incoming FlowFile, and fetches the value(s) from the Distributed Map Cache associated with each key. If configured without a destination attribute, the incoming FlowFile's content is replaced with the binary data received from the Distributed Map Cache. If there is no value stored under that key, the FlowFile is routed to 'not-found'. Note that the processor always attempts to read the entire cached value into memory before placing it in its destination. This can be problematic if the cached value is very large.

## Tags

cache, distributed, fetch, map

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Cache Entry Identifier | A comma-delimited list of FlowFile attributes, or the results of Attribute Expression Language statements, which will be evaluated against a FlowFile in order to determine the value(s) used to identify duplicates; it is these values that are cached. NOTE: Only a single Cache Entry Identifier is allowed unless Put Cache Value In Attribute is specified. Multiple cache lookups are only supported when the destination is a set of attributes (see the documentation for 'Put Cache Value In Attribute' for more details, including the naming convention). |
| Character Set | The Character Set in which the cached value is encoded. This will only be used when routing to an attribute. |
| Distributed Cache Service | The Controller Service that is used to get the cached values. |
| Max Length To Put In Attribute | If routing the cache value to an attribute of the FlowFile (by setting the "Put Cache Value In Attribute" property), the number of characters put to the attribute value will be at most this amount. This is important because attributes are held in memory and large attributes will quickly cause out of memory issues. If the output is longer than this value, it will be truncated to fit. Consider making this smaller if able. |
| Put Cache Value In Attribute | If set, the cache value received will be put into an attribute of the FlowFile instead of the content of the FlowFile. The attribute key to put to is determined by evaluating the value of this property. If multiple Cache Entry Identifiers are selected, multiple attributes will be written, using the evaluated value of this property, appended by a period (.) and the name of the cache entry identifier. |
## Relationships
| Name | Description |
| --- | --- |
| failure | If unable to communicate with the cache or if the cache entry is evaluated to be blank, the FlowFile will be penalized and routed to this relationship |
| not-found | If a FlowFile's Cache Entry Identifier was not found in the cache, it will be routed to this relationship |
| success | If communication with the cache succeeds, the FlowFile will be routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| user-defined | If the 'Put Cache Value In Attribute' property is set then whatever it is set to will become the attribute key and the value would be whatever the response was from the Distributed Map Cache. If multiple cache entry identifiers are selected, multiple attributes will be written, using the evaluated value of this property, appended by a period (.) and the name of the cache entry identifier. For example, if the Cache Entry Identifier property is set to 'id,name', and the user-defined property is named 'fetched', then two attributes will be written, fetched.id and fetched.name, containing their respective values. |
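The attribute naming convention and the truncation rule described above can be sketched as follows. Both functions are hypothetical helpers written for illustration, not part of NiFi or Openflow, and the `max_length` default used here is an arbitrary illustrative value.

```python
from typing import List


def attribute_names(cache_entry_identifiers: str, attribute_property: str) -> List[str]:
    """Derive the attribute keys written for a comma-delimited identifier list."""
    identifiers = [part.strip() for part in cache_entry_identifiers.split(",") if part.strip()]
    # A single identifier uses the evaluated property value as the attribute key.
    if len(identifiers) <= 1:
        return [attribute_property]
    # Multiple identifiers: "<property value>.<identifier>" for each one.
    return [f"{attribute_property}.{identifier}" for identifier in identifiers]


def to_attribute_value(cached: bytes, character_set: str = "UTF-8", max_length: int = 256) -> str:
    """Decode a cached value and truncate it to at most max_length characters."""
    return cached.decode(character_set)[:max_length]


if __name__ == "__main__":
    print(attribute_names("id,name", "fetched"))
    print(to_attribute_value(b"a long cached value", max_length=6))
```

With the example from the table, `attribute_names("id,name", "fetched")` yields `fetched.id` and `fetched.name`.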
## See also

- [org.apache.nifi.processors.standard.PutDistributedMapCache](/user-guide/data-integration/openflow/processors/putdistributedmapcache)

--- title: FetchDropbox 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchdropbox.md section: Loading & Unloading Data ---

# FetchDropbox 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-dropbox-processors-nar

## Description

Fetches files from Dropbox. Designed to be used in tandem with ListDropbox.

## Tags

dropbox, fetch, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Dropbox Credential Service | Controller Service used to obtain Dropbox credentials (App Key, App Secret, Access Token, Refresh Token). See controller service's Additional Details for more information. |
| File | The Dropbox identifier or path of the Dropbox file to fetch. The 'File' value should match the following regular expression pattern: `/.*\|id:.*`. When ListDropbox is used for input, either `${dropbox.id}` (identifying files by Dropbox id) or `${path}/${filename}` (identifying files by path) can be used as the 'File' value. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
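As a quick way to check a candidate 'File' value against the documented pattern, the following sketch uses Python's `re` module. `is_valid_file_value` is a hypothetical helper, and treating the pattern as a whole-string match is an assumption about how the processor applies it.

```python
import re

# Pattern quoted in the File property description: an absolute path or an "id:" reference.
FILE_PATTERN = re.compile(r"/.*|id:.*")


def is_valid_file_value(value: str) -> bool:
    """Return True if the whole value matches the documented pattern."""
    # Assumption: the pattern must match the entire value, not just a prefix.
    return FILE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("/reports/q1.csv", "id:abc123", "reports/q1.csv"):
        print(candidate, is_valid_file_value(candidate))
```

A relative path such as `reports/q1.csv` fails the check because it neither starts with `/` nor uses the `id:` prefix.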
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile will be routed here for each File for which fetch was attempted but failed. |
| success | A FlowFile will be routed here for each successfully fetched File. |
## Writes attributes
| Name | Description |
| --- | --- |
| error.message | The error message returned by Dropbox |
| dropbox.id | The Dropbox identifier of the file |
| path | The folder path where the file is located |
| filename | The name of the file |
| dropbox.size | The size of the file |
| dropbox.timestamp | The server modified time of the file |
| dropbox.revision | Revision of the file |
## See also

- [org.apache.nifi.processors.dropbox.ListDropbox](/user-guide/data-integration/openflow/processors/listdropbox)
- [org.apache.nifi.processors.dropbox.PutDropbox](/user-guide/data-integration/openflow/processors/putdropbox)

--- title: FetchFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchfile.md section: Loading & Unloading Data ---

# FetchFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile. Once this is done, the file is optionally moved elsewhere or deleted to help keep the file system organized.

## Tags

fetch, files, filesystem, get, ingest, ingress, input, local, source

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Completion Strategy | Specifies what to do with the original file on the file system once it has been pulled into NiFi |
| File to Fetch | The fully-qualified filename of the file to fetch from the file system |
| Log level when file not found | Log level to use in case the file does not exist when the processor is triggered |
| Log level when permission denied | Log level to use if the current application user does not have sufficient permissions to read the file |
| Move Conflict Strategy | If Completion Strategy is set to Move File and a file already exists in the destination directory with the same name, this property specifies how that naming conflict should be resolved |
| Move Destination Directory | The directory to move the original file to once it has been fetched from the file system. This property is ignored unless the Completion Strategy is set to "Move File". If the directory does not exist, it will be created. |
## Restrictions
| Required Permission | Explanation |
| --- | --- |
| read filesystem | Provides operator the ability to read from any file that NiFi has access to. |
| write filesystem | Provides operator the ability to delete any file that NiFi has access to. |
## Relationships
| Name | Description |
| --- | --- |
| failure | Any FlowFile that could not be fetched from the file system for any reason other than insufficient permissions or the file not existing will be transferred to this Relationship. |
| not.found | Any FlowFile that could not be fetched from the file system because the file could not be found will be transferred to this Relationship. |
| permission.denied | Any FlowFile that could not be fetched from the file system due to the user running NiFi not having sufficient permissions will be transferred to this Relationship. |
| success | Any FlowFile that is successfully fetched from the file system will be transferred to this Relationship. |
## Use Cases Involving Other Components

| Ingest all files from a directory into NiFi                             |
| ----------------------------------------------------------------------- |
| Ingest specific files from a directory into NiFi, filtering on filename |

## See also

- [org.apache.nifi.processors.standard.GetFile](/user-guide/data-integration/openflow/processors/getfile)
- [org.apache.nifi.processors.standard.ListFile](/user-guide/data-integration/openflow/processors/listfile)
- [org.apache.nifi.processors.standard.PutFile](/user-guide/data-integration/openflow/processors/putfile)

--- title: FetchFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchftp.md section: Loading & Unloading Data ---

# FetchFTP 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Fetches the content of a file from a remote FTP server and overwrites the contents of an incoming FlowFile with the content of the remote file.

## Tags

fetch, files, ftp, get, ingest, input, remote, retrieve, source

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Completion Strategy | Specifies what to do with the original file on the server once it has been pulled into NiFi. If the Completion Strategy fails, a warning will be logged but the data will still be transferred. |
| Connection Mode | The FTP Connection Mode |
| Connection Timeout | Amount of time to wait before timing out while creating a connection |
| Create Directory | Used when 'Completion Strategy' is 'Move File'. Specifies whether or not the remote directory should be created if it does not exist. |
| Data Timeout | When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems |
| Hostname | The fully-qualified hostname or IP address of the host to fetch the data from |
| Internal Buffer Size | Set the internal buffer size for buffered data streams |
| Log Level When File Not Found | Log level to use in case the file does not exist when the processor is triggered |
| Move Destination Directory | The directory on the remote server to move the original file to once it has been ingested into NiFi. This property is ignored unless the Completion Strategy is set to 'Move File'. The specified directory must already exist on the remote system if 'Create Directory' is disabled, or the rename will fail. |
| Password | Password for the user account |
| Port | The port to connect to on the remote host to fetch the data from |
| Remote File | The fully qualified filename on the remote system |
| Transfer Mode | The FTP Transfer Mode |
| Use Compression | Indicates whether or not ZLIB compression should be used when transferring files |
| Username | Username |
| ftp-use-utf8 | Tells the client to use UTF-8 encoding when processing files and filenames. If set to true, the server must also support UTF-8 encoding. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| comms.failure | Any FlowFile that could not be fetched from the remote server due to a communications failure will be transferred to this Relationship. |
| not.found | Any FlowFile for which we receive a 'Not Found' message from the remote server will be transferred to this Relationship. |
| permission.denied | Any FlowFile that could not be fetched from the remote server due to insufficient permissions will be transferred to this Relationship. |
| success | All FlowFiles that are received are routed to success |
## Writes attributes
| Name | Description |
| --- | --- |
| ftp.remote.host | The hostname or IP address from which the file was pulled |
| ftp.remote.port | The port that was used to communicate with the remote FTP server |
| ftp.remote.filename | The name of the remote file that was pulled |
| filename | The filename is updated to point to the filename of the remote file |
| path | If the Remote File contains a directory name, that directory name will be added to the FlowFile using the 'path' attribute |
| fetch.failure.reason | The name of the failure relationship applied when routing to any failure relationship |
## Use Cases Involving Other Components

| Retrieve all files in a directory of an FTP Server |
| -------------------------------------------------- |

## See also

- [org.apache.nifi.processors.standard.GetFTP](/user-guide/data-integration/openflow/processors/getftp)
- [org.apache.nifi.processors.standard.GetSFTP](/user-guide/data-integration/openflow/processors/getsftp)
- [org.apache.nifi.processors.standard.PutFTP](/user-guide/data-integration/openflow/processors/putftp)
- [org.apache.nifi.processors.standard.PutSFTP](/user-guide/data-integration/openflow/processors/putsftp)

--- title: FetchGCSObject 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchgcsobject.md section: Loading & Unloading Data ---

# FetchGCSObject 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-gcp-nar

## Description

Fetches a file from a Google Cloud Bucket. Designed to be used in tandem with ListGCSBucket.

## Tags

fetch, gcs, google, google cloud, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| GCP Credentials Provider Service | The Controller Service used to obtain Google Cloud Platform credentials. |
| gcp-project-id | Google Cloud Project ID |
| gcp-retry-count | How many retry attempts should be made before routing to the failure relationship. |
| gcs-bucket | Bucket of the object. |
| gcs-generation | The generation of the Object to download. If not set, the latest generation will be downloaded. |
| gcs-key | Name of the object. |
| gcs-object-range-length | The number of bytes to download from the object, starting from the Range Start. An empty value or a value that extends beyond the end of the object will read to the end of the object. |
| gcs-object-range-start | The byte position at which to start reading from the object. An empty value or a value of zero will start reading at the beginning of the object. |
| gcs-server-side-encryption-key | An AES256 key (encoded in base64) with which the object has been encrypted. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
| storage-api-url | Overrides the default storage URL. Configuring an alternative Storage API URL also overrides the HTTP Host header on requests as described in the Google documentation for Private Service Connections. |
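The `gcs-server-side-encryption-key` property expects a base64-encoded AES256 key. As a minimal sketch of what such a value looks like, the hypothetical helpers below generate a random 32-byte key and check that a candidate string decodes to 32 bytes. They only illustrate the encoding; they say nothing about how keys should be stored or managed.

```python
import base64
import binascii
import secrets


def generate_aes256_key_b64() -> str:
    """Generate a random 256-bit (32-byte) key and return it base64-encoded."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def is_aes256_key_b64(value: str) -> bool:
    """Return True if value is valid base64 that decodes to exactly 32 bytes."""
    try:
        return len(base64.b64decode(value, validate=True)) == 32
    except (binascii.Error, ValueError):
        return False


if __name__ == "__main__":
    key = generate_aes256_key_b64()
    print(len(key), is_aes256_key_b64(key))
```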
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to this relationship if the Google Cloud Storage operation fails. |
| success | FlowFiles are routed to this relationship after a successful Google Cloud Storage operation. |
## Writes attributes
| Name | Description |
| --- | --- |
| filename | The name of the file, parsed if possible from the Content-Disposition response header |
| gcs.bucket | Bucket of the object. |
| gcs.key | Name of the object. |
| gcs.size | Size of the object. |
| gcs.cache.control | Data cache control of the object. |
| gcs.component.count | The number of components which make up the object. |
| gcs.content.disposition | The data content disposition of the object. |
| gcs.content.encoding | The content encoding of the object. |
| gcs.content.language | The content language of the object. |
| mime.type | The MIME/Content-Type of the object |
| gcs.crc32c | The CRC32C checksum of object's data, encoded in base64 in big-endian order. |
| gcs.create.time | The creation time of the object (milliseconds) |
| gcs.update.time | The last modification time of the object (milliseconds) |
| gcs.encryption.algorithm | The algorithm used to encrypt the object. |
| gcs.encryption.sha256 | The SHA256 hash of the key used to encrypt the object |
| gcs.etag | The HTTP 1.1 Entity tag for the object. |
| gcs.generated.id | The service-generated ID for the object |
| gcs.generation | The data generation of the object. |
| gcs.md5 | The MD5 hash of the object's data encoded in base64. |
| gcs.media.link | The media download link to the object. |
| gcs.metageneration | The metageneration of the object. |
| gcs.owner | The owner (uploader) of the object. |
| gcs.owner.type | The ACL entity type of the uploader of the object. |
| gcs.acl.owner | A comma-delimited list of ACL entities that have owner access to the object. Entities will be either email addresses, domains, or project IDs. |
| gcs.acl.writer | A comma-delimited list of ACL entities that have write access to the object. Entities will be either email addresses, domains, or project IDs. |
| gcs.acl.reader | A comma-delimited list of ACL entities that have read access to the object. Entities will be either email addresses, domains, or project IDs. |
| gcs.uri | The URI of the object as a string. |
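Because `gcs.md5` is described as the MD5 hash of the object's data encoded in base64, a downstream consumer could compare it against a locally computed digest. The sketch below shows one way to do that with the Python standard library; `md5_base64` and `content_matches` are hypothetical helper names, and the comparison assumes the full object was fetched rather than a byte range.

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the MD5 digest of data, base64-encoded (the format described for gcs.md5)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def content_matches(data: bytes, gcs_md5_attribute: str) -> bool:
    """Compare fetched content against the value of the gcs.md5 attribute."""
    # Assumption: data is the complete object, not a partial range download.
    return md5_base64(data) == gcs_md5_attribute


if __name__ == "__main__":
    payload = b"example object content"
    print(content_matches(payload, md5_base64(payload)))
```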
## Use Cases Involving Other Components

| Retrieve all files in a Google Cloud Storage (GCS) bucket |
| --------------------------------------------------------- |

## See also

- [org.apache.nifi.processors.gcp.storage.DeleteGCSObject](/user-guide/data-integration/openflow/processors/deletegcsobject)
- [org.apache.nifi.processors.gcp.storage.ListGCSBucket](/user-guide/data-integration/openflow/processors/listgcsbucket)
- [org.apache.nifi.processors.gcp.storage.PutGCSObject](/user-guide/data-integration/openflow/processors/putgcsobject)

--- title: FetchGoogleDrive 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchgoogledrive.md section: Loading & Unloading Data ---

# FetchGoogleDrive 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-gcp-nar

## Description

Fetches files from a Google Drive Folder. Designed to be used in tandem with ListGoogleDrive. Please see Additional Details to set up access to Google Drive.

## Tags

drive, fetch, google, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Google Doc Export Type | Google Documents cannot be downloaded directly from Google Drive but instead must be exported to a specified MIME Type. In the event that the incoming FlowFile's MIME Type indicates that the file is a Google Document, this property specifies the MIME Type to export the document to. |
| Google Drawing Export Type | Google Drawings cannot be downloaded directly from Google Drive but instead must be exported to a specified MIME Type. In the event that the incoming FlowFile's MIME Type indicates that the file is a Google Drawing, this property specifies the MIME Type to export the drawing to. |
| Google Presentation Export Type | Google Presentations cannot be downloaded directly from Google Drive but instead must be exported to a specified MIME Type. In the event that the incoming FlowFile's MIME Type indicates that the file is a Google Presentation, this property specifies the MIME Type to export the presentation to. |
| Google Spreadsheet Export Type | Google Spreadsheets cannot be downloaded directly from Google Drive but instead must be exported to a specified MIME Type. In the event that the incoming FlowFile's MIME Type indicates that the file is a Google Spreadsheet, this property specifies the MIME Type to export the spreadsheet to. |
| connect-timeout | Maximum wait time for connection to Google Drive service. |
| drive-file-id | The Drive ID of the File to fetch. Please see Additional Details for information on how to obtain the Drive ID. |
| gcp-credentials-provider-service | The Controller Service used to obtain Google Cloud Platform credentials. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
| read-timeout | Maximum wait time for response from Google Drive service. |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile will be routed here for each File for which fetch was attempted but failed. |
| success | A FlowFile will be routed here for each successfully fetched File. |
## Writes attributes
| Name | Description |
| --- | --- |
| drive.id | The id of the file |
| filename | The name of the file |
| mime.type | The MIME type of the file |
| drive.size | The size of the file. Set to 0 when the file size is not available (e.g. externally stored files). |
| drive.size.available | Indicates if the file size is known / available |
| drive.timestamp | The last modified time or created time (whichever is greater) of the file. The reason for this is that the original modified date of a file is preserved when uploaded to Google Drive. 'Created time' takes the time when the upload occurs. However, uploaded files can still be modified later. |
| drive.created.time | The file's creation time |
| drive.modified.time | The file's last modification time |
| drive.owner | The owner of the file |
| drive.last.modifying.user | The last modifying user of the file |
| drive.web.view.link | Web view link to the file |
| drive.web.content.link | Web content link to the file |
| drive.parent.folder.id | The id of the file's parent folder |
| drive.parent.folder.name | The name of the file's parent folder |
| drive.shared.drive.id | The id of the shared drive (if the file is located on a shared drive) |
| drive.shared.drive.name | The name of the shared drive (if the file is located on a shared drive) |
| error.code | The error code returned by Google Drive |
| error.message | The error message returned by Google Drive |
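The `drive.timestamp` rule (the later of the created and modified times) can be illustrated with a minimal sketch. The function name and the sample dates are hypothetical; the only behavior taken from the description is "whichever is greater".

```python
from datetime import datetime, timezone


def drive_timestamp(created: datetime, modified: datetime) -> datetime:
    """Return the later of the two times, mirroring the drive.timestamp description."""
    return max(created, modified)


# Hypothetical example: a file last edited locally in 2023 and uploaded in 2024.
# The original modified date is preserved on upload, so the created time is later.
created = datetime(2024, 5, 2, 9, 30, tzinfo=timezone.utc)
modified = datetime(2023, 11, 17, 14, 0, tzinfo=timezone.utc)
result = drive_timestamp(created, modified)
```

In this example the created time wins. If the file is edited in Google Drive afterwards, the modified time becomes the later value.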
## Use Cases Involving Other Components

| Retrieve all files in a Google Drive folder |
| ------------------------------------------- |

## See also

- [org.apache.nifi.processors.gcp.drive.ListGoogleDrive](/user-guide/data-integration/openflow/processors/listgoogledrive)
- [org.apache.nifi.processors.gcp.drive.PutGoogleDrive](/user-guide/data-integration/openflow/processors/putgoogledrive)

--- title: FetchGoogleDriveFileComments 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchgoogledrivefilecomments.md section: Loading & Unloading Data ---

# FetchGoogleDriveFileComments 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP. Openflow BYOC deployments are available to all accounts in AWS.
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-google-drive-nar

## Description

Fetches comments and their replies for a Google Drive file. The file ID can be set by a FlowFile attribute. Records include comment metadata such as deleted status, resolved status, anchors, and a nested array of replies.

## Tags

comments, drive, gcp, google, openflow, replies

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| File ID | Google Drive file ID. |
| GCP Credentials Service | Controller Service used to obtain Google Cloud Platform credentials. |
| Record Writer | Specifies the Record Writer to use when writing the comments. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed here if the processor fails to retrieve comments. |
| not.found | A FlowFile is routed here if the file was not found. |
| retry | FlowFiles are routed here if a connection or rate-limit issue occurs. |
| success | All FlowFiles that are successfully processed are routed here. |
## Writes attributes
| Name | Description |
| --- | --- |
| record.count | Number of comment records returned (not including replies). |
| google.drive.file.id | The file ID from which comments were fetched. |

--- title: FetchGoogleDriveMetadata 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchgoogledrivemetadata.md section: Loading & Unloading Data ---

# FetchGoogleDriveMetadata 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP. Openflow BYOC deployments are available to all accounts in AWS.
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-google-drive-nar

## Description

Fetches Google Drive file metadata. This includes the file's name, size, MIME type, and permissions. The file ID must be provided as a FlowFile attribute.

## Tags

authorization, cloud, drive, gcp, google, openflow, permissions, storage, unstructured

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| File ID | The ID of the file to retrieve metadata for. |
| GCP Credentials Service | The Controller Service used to obtain Google Cloud Platform credentials. |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile is routed here if the processor fails to retrieve Google Drive file metadata. |
| not.found | A FlowFile is routed here if the file metadata was not found. |
| retry | A FlowFile is routed here if the processor should retry the request (e.g., after rate limiting). |
| success | A FlowFile is routed here after successfully retrieving Google Drive file metadata. |
## Writes attributes
| Name | Description |
| --- | --- |
| google.drive.drive.id | The ID of the Shared Google Drive. |
| google.drive.file.name | The name of the file. |
| google.drive.created.time | The timestamp when the file was created, in milliseconds since the Unix epoch. |
| google.drive.modified.time | The timestamp when the file was modified, in milliseconds since the Unix epoch. |
| google.drive.size | The size of the file in bytes. |
| google.drive.md5 | The MD5 checksum of the file. |
| google.drive.mime.type | The MIME type of the file. |
| google.drive.version | The version of the file. This changes based on user and system based updates to the file. |
| google.drive.webUrl | A link for opening the file in a relevant Google editor or viewer in a browser. |
| google.drive.lastModifiedBy.displayName | A display name of the user that modified the file. |
| google.drive.lastModifiedBy.email | An email of the user that modified the file. |
| google.drive.permissions.<role>.users | A comma-separated list of email addresses for users with the specified role. Valid roles are 'owner', 'organizer', 'fileOrganizer', 'writer', 'commenter', 'reader'. For example, if the owner is john.doe@gmail.com and users jane.doe@gmail.com and jake.doe@gmail.com are readers, there would be an attribute named _google.drive.permissions.owner.users_ with the value _john.doe@gmail.com_, and an attribute named _google.drive.permissions.reader.users_ with the value _jane.doe@gmail.com, jake.doe@gmail.com_ |
| google.drive.permissions.<role>.groups | A comma-separated list of email addresses for groups with the specified role. Valid roles are 'owner', 'organizer', 'fileOrganizer', 'writer', 'commenter', 'reader'. For example, if the owner is _employees@openflow-all-dev.iam.gserviceaccount.com_ and the group _contractors@openflow-all-dev.iam.gserviceaccount.com_ is a reader, there would be an attribute named _google.drive.permissions.owner.groups_ with the value _employees@openflow-all-dev.iam.gserviceaccount.com_, and an attribute named _google.drive.permissions.reader.groups_ with the value _contractors@openflow-all-dev.iam.gserviceaccount.com_ |
| google.drive.permissions.<role>.domains | A comma-separated list of domain names for which all users have the given role. Valid roles are 'owner', 'organizer', 'fileOrganizer', 'writer', 'commenter', 'reader'. For example, if all users in the domain _snowflake.com_ have the role of reader, there would be an attribute named _google.drive.permissions.reader.domains_ with the value _snowflake.com_ |
| google.drive.permissions.<role>.public | If a file is shared publicly, this attribute will be added with a value of 'true' for any role that applies to the public. |
| google.drive.file.path | The hierarchical path of the file in Google Drive, e.g. 'parent_folder/child_folder/file.txt'. |
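As a minimal sketch of consuming these attributes downstream, the following Python groups the comma-separated permission attributes by role and converts an epoch-millisecond timestamp to a UTC datetime. The helper names (`parse_permissions`, `epoch_millis_to_datetime`) and the plain `dict` of attributes are assumptions made for illustration and are not part of the processor; the `.public` attribute is skipped here because it carries a flag rather than a list.

```python
from datetime import datetime, timezone

ROLES = ("owner", "organizer", "fileOrganizer", "writer", "commenter", "reader")
LIST_KINDS = ("users", "groups", "domains")
PREFIX = "google.drive.permissions."


def parse_permissions(attributes: dict) -> dict:
    """Group comma-separated permission attributes as {role: {kind: [values]}}."""
    result = {}
    for key, value in attributes.items():
        if not key.startswith(PREFIX):
            continue
        role, _, kind = key[len(PREFIX):].partition(".")
        if role not in ROLES or kind not in LIST_KINDS:
            continue
        # Values are comma-separated; strip the whitespace around each entry.
        entries = [item.strip() for item in value.split(",") if item.strip()]
        result.setdefault(role, {})[kind] = entries
    return result


def epoch_millis_to_datetime(value: str) -> datetime:
    """Convert a milliseconds-since-epoch attribute value to an aware UTC datetime."""
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)


# Attribute values taken from the example in the table above.
attrs = {
    "google.drive.permissions.owner.users": "john.doe@gmail.com",
    "google.drive.permissions.reader.users": "jane.doe@gmail.com, jake.doe@gmail.com",
    "google.drive.file.name": "report.txt",  # hypothetical, ignored by the parser
}
permissions = parse_permissions(attrs)
```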
## See also

- [com.snowflake.openflow.runtime.processors.google.CaptureGoogleDriveChanges](/user-guide/data-integration/openflow/processors/capturegoogledrivechanges)

--- title: FetchGridFS 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchgridfs.md section: Loading & Unloading Data ---

# FetchGridFS 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP. Openflow BYOC deployments are available to all accounts in AWS.
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-mongodb-nar

## Description

Retrieves one or more files from a GridFS bucket by file name or by a user-defined query.

## Tags

fetch, gridfs, mongo

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| gridfs-bucket-name | The GridFS bucket where the files will be stored. If left blank, it will use the default value 'fs' that the MongoDB client driver uses. |
| gridfs-client-service | The MongoDB client service to use for database connections. |
| gridfs-database-name | The name of the database to use |
| gridfs-file-name | The name of the file in the bucket that is the target of this processor. |
| gridfs-query | A valid MongoDB query to use to fetch one or more files from GridFS. |
| mongo-operation-mode | This option controls when results are made available to downstream processors. If Stream Query Results is enabled, provenance will not be tracked relative to the input flowfile if an input flowfile is received and starts the query. In Stream Query Results mode errors will be handled by sending a new flowfile with the original content and attributes of the input flowfile to the failure relationship. Streaming should only be used if there is reliable connectivity between MongoDB and NiFi. |
| mongo-query-attribute | If set, the query will be written to a specified attribute on the output flowfiles. |
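To illustrate what a value for `gridfs-query` might look like, the sketch below builds two JSON query documents in Python. It relies on the standard GridFS file document fields `filename` and `length`; the helper names are hypothetical, and whether a given operator such as `$regex` is accepted in this JSON form by the processor is an assumption to verify in your environment.

```python
import json


def filename_query(name: str) -> str:
    """Build a query that matches GridFS files by exact filename."""
    return json.dumps({"filename": name})


def large_pdf_query(min_bytes: int) -> str:
    """Build a query for files ending in .pdf that are larger than min_bytes."""
    return json.dumps({
        "filename": {"$regex": r"\.pdf$"},
        "length": {"$gt": min_bytes},
    })


exact = filename_query("report.pdf")
large = large_pdf_query(1_000_000)
```

The resulting strings could be pasted into the property or supplied through an upstream attribute, depending on how the flow is configured.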
## Relationships
| Name | Description |
| --- | --- |
| failure | When there is a failure processing the flowfile, it goes to this relationship. |
| original | The original input flowfile goes to this relationship if the query does not cause an error |
| success | When the operation succeeds, the flowfile is sent to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| gridfs.file.metadata | The custom metadata stored with a file is attached to this property if it exists. |

--- title: FetchJiraFields 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchjirafields.md section: Loading & Unloading Data ---

# FetchJiraFields 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP. Openflow BYOC deployments are available to all accounts in AWS.
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Retrieves comprehensive metadata for all fields available in the Jira Cloud instance using the REST API v3 /field endpoint. For each field, returns detailed information including field ID/key, display name, field properties, JQL clause names for queries, and schema details with data types.

## Tags

api, atlassian, fetch, jira, rest

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| API Token | Jira API token for authorization |
| Authorization Method | Authorization method for Jira Cloud API |
| Environment URL | URL to the Atlassian Jira Environment |
| Issue Fields | A list of fields to return for each issue. This property accepts a comma-separated list. |
| Jira Email | Email address associated with Jira account |
| Request Rate Manager | Controller service for keeping track of rate limits for Atlassian APIs |
| Web Client Service | Controller service for managing HTTP connections to Jira |
## Relationships
| Name | Description |
| --- | --- |
| failure | Failed to fetch Jira fields, e.g., due to connection issues or invalid credentials |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully fetched Jira fields |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | The MIME type of the returned response, always set to 'application/json' |

## See also

- [com.snowflake.openflow.runtime.atlassian.jira.processors.FetchJiraIssues](/user-guide/data-integration/openflow/processors/fetchjiraissues)

--- title: FetchJiraIssues 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchjiraissues.md section: Loading & Unloading Data ---

# FetchJiraIssues 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP. Openflow BYOC deployments are available to all accounts in AWS.
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Fetches issues from Jira Cloud using REST API v3 with configurable search options. Provides two search modes:

1. Simple Search - Filter by project name, status category, created/updated dates
2. Advanced Search - Use custom JQL (Jira Query Language) expressions

Key features:

- Smart pagination handling with automatic state management
- Incremental sync capability using timestamps between processor runs
- Timezone-aware date handling using Jira user's timezone
- Configurable issue fields retrieval
- Adds metadata to FlowFiles: source URL (jira.source.url), query (jira.query.jql), statement type (statement.type)
- Adds insert,upsert attributes for downstream processing

The processor maintains cluster state to resume operations after restarts. Authentication is handled via basic auth using Jira email/API token credentials. Currently that is the only supported method.

LIMITATIONS:

- Jira issue deletes are not detected.

## Tags

api, atlassian, fetch, jira, rest

## Input Requirement

ALLOWED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| API Token | Jira API token for authorization |
| Authorization Method | Authorization method for Jira Cloud API |
| Created After | Filter issues created after specified date/time (optional, format: yyyy-MM-dd) |
| Environment URL | URL to the Atlassian Jira Environment |
| Issue Fields | A list of fields to return for each issue. This property accepts a comma-separated list. |
| JQL Query | JQL query string (required when using JQL query type) |
| Jira Email | Email address associated with Jira account |
| Maximum Page Size | The Maximum Page Size value must be between 50 and 1000 |
| Project Names | Comma-separated list of project names for simple search |
| Request Rate Manager | Controller service for keeping track of rate limits for Atlassian APIs |
| Search Type | Type of search to perform |
| Status Category | Status category filter for simple search (optional) |
| Updated After | Filter issues updated after specified date/time (optional, format: yyyy-MM-dd) |
| Web Client Service | Controller service for managing HTTP connections to Jira |
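Before deploying a flow, it can help to check property values against the constraints stated in the table. The following is a minimal sketch of such a pre-check in Python; the function names are hypothetical and this is not how the processor itself validates input. The Java-style pattern `yyyy-MM-dd` is mapped to Python's `%Y-%m-%d`, and the page-size bounds are treated as inclusive, which is an assumption.

```python
from datetime import date, datetime


def validate_page_size(value: int) -> int:
    """Check the documented 50..1000 bound for Maximum Page Size (assumed inclusive)."""
    if not 50 <= value <= 1000:
        raise ValueError(f"Maximum Page Size must be between 50 and 1000, got {value}")
    return value


def parse_filter_date(value: str) -> date:
    """Parse a Created After / Updated After value given as yyyy-MM-dd."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def split_list(value: str) -> list:
    """Split a comma-separated property such as Project Names or Issue Fields."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Hypothetical property values for illustration.
page_size = validate_page_size(100)
created_after = parse_filter_date("2025-01-31")
projects = split_list("Platform, Data Ingestion ,Docs")
```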
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | Stores pagination state to maintain position between restarts. Resets when ingestion configuration changes. |
## Relationships
| Name | Description |
| --- | --- |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully fetched Jira issues |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | application/json |
| jira.query.jql | The JQL query used for this fetch |
| jira.source.url | URL of the Jira source |
| statement.type | Statement type INSERT, UPSERT |

## See also

- [com.snowflake.openflow.runtime.atlassian.jira.processors.FetchJiraFields](/user-guide/data-integration/openflow/processors/fetchjirafields)

--- title: FetchMicrosoftDataverseTable 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchmicrosoftdataversetable.md section: Loading & Unloading Data ---

# FetchMicrosoftDataverseTable 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP. Openflow BYOC deployments are available to all accounts in AWS.
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-dataverse-processors-nar

## Description

Fetch records from Microsoft Dataverse Tables

## Tags

dataverse

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Delete Schema | |
| Environment URL | URL to Microsoft Dataverse Environment |
| Logical Name | Logical Name of Dataverse Table |
| Max Page Size | Defines how many records will be fetched from Dataverse at once |
| OAuth2 Access Token Provider | Enables managed retrieval of OAuth2 Bearer Token. |
| Record Writer | Specifies the Controller Service to use for writing out the records |
| Rows Number Limit | Defines the maximum number of rows returned in a single FlowFile. Multiple requests will be made to the API to reach the limit. When not set, the page size value is used instead. |
| Table Name | Dataverse Table Name |
| Upsert Schema | |
| Web Client Service Provider | Creates instance of web client. |
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | status |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFile with errors occurred while fetching from Dataverse. |
| retry | FlowFile with maintainable errors occurred while fetching from Dataverse. |
| success | FlowFile with fetched data stored as records. |

--- title: FetchS3Object 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchs3object.md section: Loading & Unloading Data ---

# FetchS3Object 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP. Openflow BYOC deployments are available to all accounts in AWS.
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Retrieves the contents of an S3 Object and writes it to the content of a FlowFile

## Tags

AWS, Amazon, Fetch, Get, S3

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider |
| Bucket | The S3 Bucket to interact with |
| Communications Timeout | The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out. |
| Custom Signer Class Name | Fully qualified class name of the custom signer class. The signer must implement com.amazonaws.auth.Signer interface. |
| Custom Signer Module Location | Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any). |
| Encryption Service | Specifies the Encryption Service Controller used to configure requests. PutS3Object: For backward compatibility, this value is ignored when 'Server Side Encryption' is set. FetchS3Object: Only needs to be configured in case of Server-side Customer Key, Client-side KMS and Client-side Customer Key encryptions. |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Object Key | The S3 Object Key to use. This is analogous to a filename for traditional file systems. |
| Range Length | The number of bytes to download from the object, starting from the Range Start. An empty value or a value that extends beyond the end of the object will read to the end of the object. |
| Range Start | The byte position at which to start reading from the object. An empty value or a value of zero will start reading at the beginning of the object. |
| Region | The AWS Region to connect to. |
| Requester Pays | If true, indicates that the requester consents to pay any charges associated with retrieving objects from the S3 bucket. This sets the 'x-amz-request-payer' header to 'requester'. |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Signer Override | The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation. |
| Version | The Version of the Object to download |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
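The interaction between Range Start and Range Length can be sketched as a small Python function. This is only an illustration of the rules stated in the table (empty start means the beginning, empty or oversized length means read to the end); the function name and the use of `None` for an empty property are assumptions, and the processor's internal implementation is not shown here.

```python
from typing import Optional, Tuple


def effective_range(
    object_size: int,
    range_start: Optional[int] = None,
    range_length: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (first_byte, byte_count) that would be read from the object."""
    # An empty value or zero starts reading at the beginning of the object.
    start = range_start or 0
    remaining = max(object_size - start, 0)
    # An empty length, or one that runs past the end, reads to the end of the object.
    if range_length is None or range_length > remaining:
        count = remaining
    else:
        count = range_length
    return start, count


whole_object = effective_range(100)
middle_slice = effective_range(100, range_start=10, range_length=20)
clipped_tail = effective_range(100, range_start=90, range_length=50)
```

Here `clipped_tail` covers only the last 10 bytes, because the requested length extends beyond the end of the 100-byte object.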
## Relationships
| Name | Description |
| --- | --- |
| failure | If the Processor is unable to process a given FlowFile, it will be routed to this Relationship. |
| success | FlowFiles are routed to this Relationship after they have been successfully processed. |
## Writes attributes
| Name | Description |
| --- | --- |
| s3.url | The URL that can be used to access the S3 object |
| s3.bucket | The name of the S3 bucket |
| path | The path of the file |
| absolute.path | The path of the file |
| filename | The name of the file |
| hash.value | The MD5 sum of the file |
| hash.algorithm | MD5 |
| mime.type | If S3 provides the content type/MIME type, this attribute will hold that file |
| s3.etag | The ETag that can be used to see if the file has changed |
| s3.exception | The class name of the exception thrown during processor execution |
| s3.additionalDetails | The S3 supplied detail from the failed operation |
| s3.statusCode | The HTTP error code (if available) from the failed operation |
| s3.errorCode | The S3 moniker of the failed operation |
| s3.errorMessage | The S3 exception message from the failed operation |
| s3.expirationTime | If the file has an expiration date, this attribute will be set, containing the milliseconds since epoch in UTC time |
| s3.expirationTimeRuleId | The ID of the rule that dictates this object's expiration time |
| s3.sseAlgorithm | The server side encryption algorithm of the object |
| s3.version | The version of the S3 object |
| s3.encryptionStrategy | The name of the encryption strategy that was used to store the S3 object (if it is encrypted) |

## Use cases

| Fetch a specific file from S3 |
| ----------------------------- |

## Use Cases Involving Other Components

| Retrieve all files in an S3 bucket |
| ------------------------------------------------------------- |
| Retrieve only files from S3 that meet some specified criteria |
| Retrieve new files as they arrive in an S3 bucket |

## See also

- [org.apache.nifi.processors.aws.s3.CopyS3Object](/user-guide/data-integration/openflow/processors/copys3object)
- [org.apache.nifi.processors.aws.s3.DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags)
- [org.apache.nifi.processors.aws.s3.ListS3](/user-guide/data-integration/openflow/processors/lists3)
- [org.apache.nifi.processors.aws.s3.PutS3Object](/user-guide/data-integration/openflow/processors/puts3object)
- [org.apache.nifi.processors.aws.s3.TagS3Object](/user-guide/data-integration/openflow/processors/tags3object)

--- title: FetchSFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchsftp.md section: Loading & Unloading Data ---

# FetchSFTP 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP. Openflow BYOC deployments are available to all accounts in AWS.
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file.

## Tags

fetch, files, get, ingest, input, remote, retrieve, sftp, source

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Algorithm Negotiation | Configuration strategy for SSH algorithm negotiation |
| Ciphers Allowed | A comma-separated list of Ciphers allowed for SFTP connections. Leave unset to allow all. Available options are: 3des-cbc, aes128-cbc, aes128-ctr, aes128-gcm@openssh.com, aes192-cbc, aes192-ctr, aes256-cbc, aes256-ctr, aes256-gcm@openssh.com, arcfour128, arcfour256, blowfish-cbc, chacha20-poly1305@openssh.com, none |
| Completion Strategy | Specifies what to do with the original file on the server once it has been pulled into NiFi. If the Completion Strategy fails, a warning will be logged but the data will still be transferred. |
| Connection Timeout | Amount of time to wait before timing out while creating a connection |
| Create Directory | Used when 'Completion Strategy' is 'Move File'. Specifies whether or not the remote directory should be created if it does not exist. |
| Data Timeout | When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems |
| Disable Directory Listing | Control how 'Move Destination Directory' is created when 'Completion Strategy' is 'Move File' and 'Create Directory' is enabled. If set to 'true', directory listing is not performed prior to create missing directories. By default, this processor executes a directory listing command to see target directory existence before creating missing directories. However, there are situations that you might need to disable the directory listing such as the following. Directory listing might fail with some permission setups (e.g. chmod 100) on a directory. Also, if any other SFTP client created the directory after this processor performed a listing and before a directory creation request by this processor is finished, then an error is returned because the directory already exists. |
| Host Key File | If supplied, the given file will be used as the Host Key; otherwise, if 'Strict Host Key Checking' property is applied (set to true) then uses the 'known_hosts' and 'known_hosts2' files from ~/.ssh directory else no host key file will be used |
| Hostname | The fully-qualified hostname or IP address of the host to fetch the data from |
| Key Algorithms Allowed | A comma-separated list of Key Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: ecdsa-sha2-nistp256, ecdsa-sha2-nistp256-cert-v01@openssh.com, ecdsa-sha2-nistp384, ecdsa-sha2-nistp384-cert-v01@openssh.com, ecdsa-sha2-nistp521, ecdsa-sha2-nistp521-cert-v01@openssh.com, rsa-sha2-256, rsa-sha2-256-cert-v01@openssh.com, rsa-sha2-512, rsa-sha2-512-cert-v01@openssh.com, sk-ecdsa-sha2-nistp256@openssh.com, sk-ssh-ed25519@openssh.com, ssh-dss, ssh-dss-cert-v01@openssh.com, ssh-ed25519, ssh-ed25519-cert-v01@openssh.com, ssh-rsa, ssh-rsa-cert-v01@openssh.com |
| Key Exchange Algorithms Allowed | A comma-separated list of Key Exchange Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: curve25519-sha256, curve25519-sha256@libssh.org, curve448-sha512, diffie-hellman-group-exchange-sha1, diffie-hellman-group-exchange-sha256, diffie-hellman-group1-sha1, diffie-hellman-group14-sha1, diffie-hellman-group14-sha256, diffie-hellman-group15-sha512, diffie-hellman-group16-sha512, diffie-hellman-group17-sha512, diffie-hellman-group18-sha512, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521, mlkem1024nistp384-sha384, mlkem768nistp256-sha256, mlkem768x25519-sha256, sntrup761x25519-sha512, sntrup761x25519-sha512@openssh.com |
| Log Level When File Not Found | Log level to use in case the file does not exist when the processor is triggered |
| Message Authentication Codes Allowed | A comma-separated list of Message Authentication Codes allowed for SFTP connections. Leave unset to allow all. Available options are: hmac-md5, hmac-md5-96, hmac-sha1, hmac-sha1-96, hmac-sha1-etm@openssh.com, hmac-sha2-256, hmac-sha2-256-etm@openssh.com, hmac-sha2-512, hmac-sha2-512-etm@openssh.com |
| Move Destination Directory | The directory on the remote server to move the original file to once it has been ingested into NiFi. This property is ignored unless the Completion Strategy is set to 'Move File'. The specified directory must already exist on the remote system if 'Create Directory' is disabled, or the rename will fail. |
| Password | Password for the user account |
| Port | The port to connect to on the remote host to fetch the data from |
| Private Key Passphrase | Password for the private key |
| Private Key Path | The fully qualified path to the Private Key file |
| Remote File | The fully qualified filename on the remote system |
| Send Keep Alive On Timeout | Send a Keep Alive message every 5 seconds up to 5 times for an overall timeout of 25 seconds. |
| Strict Host Key Checking | Indicates whether or not strict enforcement of hosts keys should be applied |
| Use Compression | Indicates whether or not ZLIB compression should be used when transferring files |
| Username | Username |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
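The fallback order described for Host Key File can be read as a three-way decision. The sketch below restates that decision in Python under the assumption that an unset property is represented as `None`; the function name is hypothetical and the code only mirrors the wording of the table, not the processor's source.

```python
from pathlib import Path
from typing import List, Optional


def host_key_files(
    host_key_file: Optional[str],
    strict_host_key_checking: bool,
    home: Path,
) -> List[Path]:
    """Return the host key files that would be consulted for a connection."""
    if host_key_file:
        # An explicitly supplied file takes precedence.
        return [Path(host_key_file)]
    if strict_host_key_checking:
        # Fall back to the known-hosts files in the user's ~/.ssh directory.
        ssh_dir = home / ".ssh"
        return [ssh_dir / "known_hosts", ssh_dir / "known_hosts2"]
    # Neither configured: no host key file is used.
    return []


# Hypothetical home directory and key file path for illustration.
home_dir = Path("/home/nifi")
explicit = host_key_files("/opt/keys/sftp_host_key", True, home_dir)
fallback = host_key_files(None, True, home_dir)
unchecked = host_key_files(None, False, home_dir)
```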
## Relationships
| Name | Description |
| --- | --- |
| comms.failure | Any FlowFile that could not be fetched from the remote server due to a communications failure will be transferred to this Relationship. |
| not.found | Any FlowFile for which we receive a 'Not Found' message from the remote server will be transferred to this Relationship. |
| permission.denied | Any FlowFile that could not be fetched from the remote server due to insufficient permissions will be transferred to this Relationship. |
| success | All FlowFiles that are received are routed to success |
## Writes attributes
Name Description sftp.remote.host The hostname or IP address from which the file was pulled sftp.remote.port The port that was used to communicate with the remote SFTP server sftp.remote.filename The name of the remote file that was pulled filename The filename is updated to point to the filename fo the remote file path If the Remote File contains a directory name, that directory name will be added to the FlowFile using the 'path' attribute fetch.failure.reason The name of the failure relationship applied when routing to any failure relationship
## Use Cases Involving Other Components

| Retrieve all files in a directory of an SFTP Server |
| --------------------------------------------------- |

## See also

- [org.apache.nifi.processors.standard.GetFTP](/user-guide/data-integration/openflow/processors/getftp)
- [org.apache.nifi.processors.standard.GetSFTP](/user-guide/data-integration/openflow/processors/getsftp)
- [org.apache.nifi.processors.standard.PutFTP](/user-guide/data-integration/openflow/processors/putftp)
- [org.apache.nifi.processors.standard.PutSFTP](/user-guide/data-integration/openflow/processors/putsftp)

---
title: FetchSharepointFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchsharepointfile.md
section: Loading & Unloading Data
---

# FetchSharepointFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-msgraph-nar

## Description

Fetches the contents of a file from a Sharepoint Drive, optionally downloading a PDF or HTML version of the file when applicable. Any FlowFile that represents a Sharepoint folder will be routed to success without fetching contents.

## Tags

cdc, document, graph, microsoft, openflow, sharepoint, unstructured

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Authentication Service | The service that provides authentication for the SharePoint API |
| Download PDF/HTML Version | Sharepoint supports automatically converting certain file formats to PDF or HTML. If this property is set to _true_, the Processor will inspect the FlowFile's filename extension to determine if the file can be converted to PDF or HTML. If the file can be converted, the Processor will download the converted version. If the file cannot be converted, the Processor will download the original file. If this property is set to _false_, the Processor will always download the original file. |
| Drive ID | The ID of the drive that contains the file to fetch |
| Fallback Retry Duration | The time to wait before retrying the operation after a communication failure. This value is used when the response doesn't contain a Retry-After header. |
| Item ID | The ID of the item to fetch |
| Update Extension | If true, the Processor will update the filename extension to match the format of the downloaded file |
## Relationships
| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed here if the processor failed to communicate with the Graph API. Can be retried |
| failure | An incoming FlowFile is routed to this relationship if the contents of the item could not be fetched |
| not.found | A FlowFile is routed here if the item was not found |
| success | An incoming FlowFile is routed to this relationship after the contents of the item have been fetched and written to the FlowFile |
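
The Fallback Retry Duration property described above applies only when a response carries no Retry-After header. A minimal Python sketch of that decision follows. The helper `retry_delay_seconds` and the plain dictionary of response headers are assumptions for illustration and are not part of the processor's API. The sketch handles only the delay-in-seconds form of the header.

```python
from typing import Mapping


def retry_delay_seconds(headers: Mapping[str, str], fallback_seconds: float) -> float:
    """Return how long to wait before retrying a failed request.

    Hypothetical helper: prefers the server's Retry-After value and falls
    back to the configured duration when the header is absent or unusable.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("retry-after")
    if raw is None:
        return fallback_seconds
    try:
        delay = float(raw.strip())
    except ValueError:
        # Retry-After may also be an HTTP date; this sketch does not parse it.
        return fallback_seconds
    # Negative or NaN values are treated as unusable.
    return delay if delay >= 0 else fallback_seconds
```

For example, `retry_delay_seconds({}, 10.0)` falls back to the configured 10 seconds, while a response with `Retry-After: 30` yields a 30 second wait.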
## Use Cases Involving Other Components

| Fetch a file from Sharepoint by the Site URL, Drive Name and file path. |
| ----------------------------------------------------------------------- |

## See also

- [com.snowflake.openflow.runtime.processors.sharepoint.CaptureSharepointChanges](/user-guide/data-integration/openflow/processors/capturesharepointchanges)

---
title: FetchSharepointMetadata 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchsharepointmetadata.md
section: Loading & Unloading Data
---

# FetchSharepointMetadata 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-msgraph-nar

## Description

For each drive item retrieves its metadata and permissions and writes them as FlowFile attributes.

## Tags

cdc, document, graph, library, microsoft, openflow, sharepoint, unstructured

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Authentication Service | The service that provides authentication for the SharePoint API |
| Drive ID | A drive id where the Sharepoint file resides |
| Fallback Retry Duration | The time to wait before retrying the operation after a communication failure. This value is used when the response doesn't contain a Retry-After header. |
| Fetch Item Permissions | If true, the Processor will fetch user and group permission information for the captured Sharepoint item. |
| Item ID | An id of an item to retrieve the metadata for |
| Item Permissions To Fetch | A comma-separated list of permission types to fetch for the captured Sharepoint item. Available permission types: USER, GROUP, SITE_USER, SITE_GROUP. |
| Site ID | A site id where the Sharepoint file resides |
## Relationships
| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed here if the processor failed to communicate with the Graph API. Can be retried |
| failure | An incoming FlowFile is routed to this relationship if the metadata and permissions of the item could not be fetched |
| not.found | A FlowFile is routed here if the item was not found |
| success | An incoming FlowFile is routed to this relationship after the metadata and permissions of the item have been fetched and written to the FlowFile attributes |
## Writes attributes
| Name | Description |
| --- | --- |
| sharepoint.item.id | The ID of the Sharepoint item. |
| sharepoint.item.type | The type of the Sharepoint item. Possible values are 'File' and 'Folder'. |
| sharepoint.path | The path of the Sharepoint item. This is the path relative to the root of the Document Library. |
| sharepoint.filename | The name of the Sharepoint item. This attribute is not available for 'Deleted' changes. |
| sharepoint.size | The size of the Sharepoint item. |
| sharepoint.createdAt | The creation timestamp of the Sharepoint item. |
| sharepoint.lastModified | The last modified timestamp of the Sharepoint item. |
| sharepoint.createdBy.<identity>.id | An id of the identity that created the Sharepoint item. This attribute is not always available. |
| sharepoint.createdBy.<identity>.displayName | A display name of the identity that created the Sharepoint item. This attribute is not always available. |
| sharepoint.createdBy.<identity>.email | An email of the identity that created the Sharepoint item. This attribute is not always available. |
| sharepoint.lastModifiedBy.<identity>.id | An id of the identity that modified the Sharepoint item last. This attribute is not always available. |
| sharepoint.lastModifiedBy.<identity>.displayName | A display name of the identity that modified the Sharepoint item last. This attribute is not always available. |
| sharepoint.lastModifiedBy.<identity>.email | An email of the identity that modified the Sharepoint item last. This attribute is not always available. |
| sharepoint.drive.id | The ID of the Sharepoint Drive that contains the item. |
| sharepoint.site.id | The ID of the Sharepoint Site that contains the item. |
| sharepoint.ctag | The CTag of the Sharepoint item. |
| sharepoint.etag | The ETag of the Sharepoint item. |
| sharepoint.webUrl | The browser view url of the Sharepoint item. |
| sharepoint.permissions.read.groups | A comma-separated list of groups that have read permissions on the Sharepoint item. For each group, if an e-mail address is available in Sharepoint, it will be included. Additionally, the group principal, such as _mygroup@mytenant.onmicrosoft.com_, is included. |
| sharepoint.permissions.read.groups.ids | A comma-separated list of group IDs that have read permissions on the Sharepoint item. |
| sharepoint.permissions.read.users | A comma-separated list of users that have read permissions on the Sharepoint item. For each user, if an e-mail address is available in Sharepoint, it will be included. Additionally, the user principal, such as _johndoe@mytenant.onmicrosoft.com_, is included. |
| sharepoint.permissions.read.users.ids | A comma-separated list of Microsoft365 user IDs that have read permissions on the Sharepoint item. |
| sharepoint.permissions.read.siteusers | A comma-separated list of Sharepoint site user emails that have read permissions on the Sharepoint item. |
| sharepoint.permissions.read.siteusers.ids | A comma-separated list of Sharepoint site user IDs that have read permissions on the Sharepoint item. |
| sharepoint.permissions.read.sitegroups.ids | A comma-separated list of Sharepoint site group IDs that have read permissions on the Sharepoint item. |
| filename | The name of the Sharepoint item. |
| path | The path of the Sharepoint item. This is the path relative to the root of the Document Library. |
| mime.type | The MIME type of the Sharepoint item. This attribute is only available for 'File' items. |
| hash.quickxor | The QuickXor hash of the Sharepoint item. This attribute is not always available. |
| hash.sha256 | The SHA-256 hash of the Sharepoint item. This attribute is not always available. |
| hash.sha1 | The SHA-1 hash of the Sharepoint item. This attribute is not always available. |
| hash.crc32 | The CRC32 hash of the Sharepoint item. This attribute is not always available. |

---
title: FetchSlackConversationInfo 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchslackconversationinfo.md
section: Loading & Unloading Data
---

# FetchSlackConversationInfo 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-slack-processors-nar

## Description

Fetches Slack conversation info and member emails

## Tags

conversation, conversation.members, slack, social media, team

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Access Token | OAuth Access Token used for authenticating/authorizing the Slack request sent by NiFi. This may be either a User Token or a Bot Token. It must be granted the channels:history, groups:history, im:history, or mpim:history scope, depending on the type of conversation being used. |
| Cache Expiration | User emails are cached to reduce network lookups. A longer expiration reduces network overhead but can cause data to be out of sync. |
| Cache Size | User emails are cached to reduce network lookups. A larger cache consumes memory but reduces network overhead. |
| Channel | The Slack Channel ID to retrieve info from. Leave blank to iterate over every available Conversation. |
| Rate Limiter Service | Slack Rate Limiter Service to coordinate rate limiting across processors |
## Relationships
| Name | Description |
| --- | --- |
| conversations | Each configured Slack Conversation info and members will be routed to this relationship in separate FlowFiles |
| failure | If Slack Conversation metadata is unable to be received the input FlowFile will be routed to this relationship |
| original | Original input FlowFile that has been successfully processed. |
## Writes attributes
| Name | Description |
| --- | --- |
| conversation.members.count | Set to the number of members of the conversation |
| conversation.id | Set to the ID of the conversation |
| channel.name | Set to the name of the channel if the conversation is a channel |
| mime.type | Set to application/json, as the output will always be in JSON format |

---
title: FetchSlackFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchslackfile.md
section: Loading & Unloading Data
---

# FetchSlackFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-slack-processors-nar

## Description

Downloads a file shared on Slack. Writes the file content to the FlowFile content and FlowFile attributes from the file.

## Tags

download, file, slack

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Bot Token | The Bot Token that is registered to your Slack application |
| Channel ID | The Slack Channel ID where the file was shared. |
| File ID | The Slack File ID to download. |
| Rate Limiter Service | Slack Rate Limiter Service to coordinate rate limiting across processors |
| Web Client Service | The Web Client Service to use for downloading files from Slack |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that could not be processed are routed to this relationship |
| success | FlowFiles containing successfully downloaded Slack files are routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | The MIME type of the downloaded file |
| filename | The name of the downloaded file |
| slack.file.name | The Slack File name |
| slack.file.mimetype | The Slack File MIME type |
| slack.file.size | The Slack File size in bytes |
| slack.conversation.id | The Slack Channel ID |
| slack.event.ts | The Slack event timestamp |
## See also

- [com.snowflake.openflow.runtime.processors.slack.FetchSlackConversationInfo](/user-guide/data-integration/openflow/processors/fetchslackconversationinfo)
- [com.snowflake.openflow.runtime.processors.slack.FetchSlackMessage](/user-guide/data-integration/openflow/processors/fetchslackmessage)

---
title: FetchSlackMessage 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchslackmessage.md
section: Loading & Unloading Data
---

# FetchSlackMessage 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-slack-processors-nar

## Description

Fetches data about a single Slack message

## Tags

conversation, conversation.history, slack, social media, team, text, unstructured

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Access Token | OAuth Access Token used for authenticating/authorizing the Slack request sent by NiFi. This may be either a User Token or a Bot Token. It must be granted the channels:history, groups:history, im:history, or mpim:history scope, depending on the type of conversation being used. |
| Channel | The Slack Channel ID to retrieve a message from. |
| Include Message Blocks | Specifies whether or not the output JSON should include the value of the 'blocks' field for each Slack Message. This field includes information such as individual parts of a message that are formatted using rich text. This may be useful, for instance, for parsing. However, it often accounts for a significant portion of the data and as such may be set to null when it is not useful to you. |
| Include Null Fields | Specifies whether or not fields that have null values should be included in the output JSON. If true, any field in a Slack Message that has a null value will be included in the JSON with a value of null. If false, the key is omitted from the output JSON entirely. Omitting null values results in smaller messages that are generally more efficient to process, but including the values may provide a better understanding of the format, especially for schema inference. |
| Message Timestamp | The timestamp of the message, which is also its ID within a channel. |
| Rate Limiter Service | Slack Rate Limiter Service to coordinate rate limiting across processors |
| Resolve Usernames | Specifies whether or not User IDs should be resolved to usernames. By default, Slack Messages provide the ID of the user that sends a message, such as U0123456789, but not the username, such as NiFiUser. The username may be resolved, but it may require additional calls to the Slack API and requires that the Token used be granted the users:read scope. If set to true, usernames will be resolved with a best-effort policy: if a username cannot be obtained, it will be skipped over. Also, note that when a username is obtained, the Message's <username> field is populated, and the <text> field is updated such that any mention will be output such as "Hi @user" instead of "Hi <@U1234567>". |
| Thread Timestamp | The timestamp of the thread the message belongs to. This can be null or empty unless the message is a reply to another message. |
## Relationships
| Name | Description |
| --- | --- |
| failure | Slack messages that fail to be received will be routed to this relationship |
| not found | Slack messages that were not found on the Slack server will be routed to this relationship |
| success | Slack messages that are successfully received will be routed to this relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | Set to application/json, as the output will always be in JSON format |
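
To illustrate the Resolve Usernames and Include Null Fields properties described above, here is a rough Python sketch of the two transformations applied to a message dictionary. The function names and the `user_names` lookup table are assumptions for illustration. The processor's actual implementation is not shown in this documentation, and leaving an unresolved mention unchanged is one possible reading of the best-effort policy.

```python
import re
from typing import Any, Dict

# Matches Slack-style user mentions such as <@U1234567>.
MENTION = re.compile(r"<@([A-Z0-9]+)>")


def resolve_mentions(text: str, user_names: Dict[str, str]) -> str:
    """Replace <@USERID> mentions with @username where the name is known."""

    def substitute(match):
        name = user_names.get(match.group(1))
        # Best effort (assumed): keep the raw mention if the ID is unknown.
        return f"@{name}" if name is not None else match.group(0)

    return MENTION.sub(substitute, text)


def drop_null_fields(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the message without top-level keys whose value is None."""
    return {key: value for key, value in message.items() if value is not None}
```

With a lookup of `{"U1234567": "user"}`, `resolve_mentions("Hi <@U1234567>", ...)` produces the "Hi @user" form mentioned in the property description.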

---
title: FetchSmb 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchsmb.md
section: Loading & Unloading Data
---

# FetchSmb 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-smb-nar

## Description

Fetches files from a SMB Share. Designed to be used in tandem with ListSmb.

## Tags

cifs, fetch, files, samba, smb

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Completion Strategy | Specifies what to do with the original file on the server once it has been processed. If the Completion Strategy fails, a warning will be logged but the data will still be transferred. |
| Create Destination Directory | Specifies whether or not the remote directory should be created if it does not exist. |
| Destination Directory | The directory on the remote server to move the original file to once it has been processed. |
| remote-file | The full path of the file to be retrieved from the remote server. Expression language is supported. |
| smb-client-provider-service | Specifies the SMB client provider to use for creating SMB connections. |
## Relationships
| Name | Description |
| --- | --- |
| failure | A FlowFile will be routed here when its content could not be fetched. |
| success | A FlowFile will be routed here for each successfully fetched file. |
## Writes attributes
| Name | Description |
| --- | --- |
| error.code | The error code returned by SMB when the fetch of a file fails. |
| error.message | The error message returned by SMB when the fetch of a file fails. |
## See also

- [org.apache.nifi.processors.smb.GetSmbFile](/user-guide/data-integration/openflow/processors/getsmbfile)
- [org.apache.nifi.processors.smb.ListSmb](/user-guide/data-integration/openflow/processors/listsmb)
- [org.apache.nifi.processors.smb.PutSmbFile](/user-guide/data-integration/openflow/processors/putsmbfile)

---
title: FetchSnowflakeTableProperties 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchsnowflaketableproperties.md
section: Loading & Unloading Data
---

# FetchSnowflakeTableProperties 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-snowflake-processors-nar

## Description

Reads properties from a table and stores them as flow file attributes.

## Tags

database, jdbc, openflow, snowflake

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Connection Pool | The connection pool to use to connect to Snowflake |
| Schema Name | The name of the schema |
| Table Metadata Cache Expiration Time | The time in seconds after which the cache entry will be removed |
| Table Name | The name of the table |
| Use Table Metadata Cache | Whether to cache table's metadata instead of reading it directly from Snowflake. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The incoming FlowFile is routed to this relationship if the properties cannot be read |
| success | The incoming FlowFile is routed to this relationship after the table properties have been successfully read |
| table not found | The incoming FlowFile is routed to this relationship if the specified table does not exist. |

---
title: FetchSourceTableSchema 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchsourcetableschema.md
section: Loading & Unloading Data
---

# FetchSourceTableSchema 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar

## Description

Fetches the table schema (i.e., column names, data types, etc.) for a given table in a database, converting the data types to Snowflake-compatible types. The schema is written to the FlowFile content as a JSON object, in a form such as: \{ "columns": [ \{ "name": "<columnName>", "type": "<snowflakeType>", "nullable": <true|false>, "scale": <scale>, "precision": <precision> \}, ... ], "primaryKeys": ["<primaryKey1>", "<primaryKey2>", ...] \}

## Tags

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Column Filter Service | Specifies the Column Filter Service to be used for filtering out unwanted columns |
| Connection Pool | The connection pool to use to fetch the source table schema |
| Schema Name | The name of the schema that the source table is stored in |
| Table Name | The name of the source table |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to this relationship in the event that the source table's schema cannot be fetched |
| success | FlowFiles are routed to this relationship when the source table's schema is successfully fetched |
| table not found | FlowFiles are routed to this relationship when the source table does not exist |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | application/json |
| dbms.type | The type of database management system (DBMS) that the source table is stored in. E.g. _POSTGRESQL_ |
| primary.key.count | The number of primary keys in the source table |
| column.count | The number of columns in the source table |
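
As a rough illustration of consuming the schema JSON described in the Description section above, the following Python sketch parses a document of that shape. The sample column names, types, and values, and the helper `summarize_schema`, are made up for the example. The counts it computes mirror what the `column.count` and `primary.key.count` attributes describe, but this is not the processor's own code.

```python
import json
from typing import Any, Dict

# Hypothetical sample in the documented shape; all values are invented.
SAMPLE_SCHEMA = """
{
  "columns": [
    {"name": "ID", "type": "NUMBER", "nullable": false, "scale": 0, "precision": 38},
    {"name": "NAME", "type": "VARCHAR", "nullable": true, "scale": null, "precision": null}
  ],
  "primaryKeys": ["ID"]
}
"""


def summarize_schema(schema_json: str) -> Dict[str, Any]:
    """Parse a table schema document and derive simple summary values."""
    schema = json.loads(schema_json)
    columns = schema["columns"]
    primary_keys = schema["primaryKeys"]
    return {
        "column.count": len(columns),
        "primary.key.count": len(primary_keys),
        # Map each column name to its Snowflake-compatible type.
        "types": {column["name"]: column["type"] for column in columns},
    }


summary = summarize_schema(SAMPLE_SCHEMA)
```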

---
title: FetchTableSnapshot 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/fetchtablesnapshot.md
section: Loading & Unloading Data
---

# FetchTableSnapshot 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar

## Description

Fetches a snapshot of a table from a database. The snapshot is fetched incrementally, using the primary key columns of the table to fetch rows in batches. Replicating a table without primary key is not supported. The snapshot is written to a FlowFile in the specified Record Writer format.

The input FlowFile is expected to consist of a JSON representation of the table schema in the following format: \{ "columns": [\{ "name": "<column name>", "type": "<column type>" \}, \{ "name": "<column name>", "type": "<column type>" \}, ... ], "primaryKeys": ["<name of first primary key column>", "<name of second primary key column>", ...] \}

Only those columns that are specified in the schema will be fetched from the table.

## Tags

database, fetch, rdbms, snapshot, snowflake, table

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Connection Pool | The connection pool to use to fetch the database snapshot |
| Fetch Size | The maximum number of rows loaded into memory at once |
| JDBC Driver Location | Comma-separated list of files/folders and/or URLs containing the driver JAR and its dependencies (if any). For example '/var/tmp/postgresql-java-client-42.7.5.jar' |
| Max Batch Size | The maximum number of rows to fetch in a single batch |
| Record Writer | The record writer to use to write the fetched snapshot |
| Schema Name | The name of the schema to fetch the snapshot from |
| Table Name | The name of the table to fetch the snapshot from |
## Relationships
| Name | Description |
| --- | --- |
| complete | When the snapshot is complete, the original FlowFile will be routed to this relationship |
| failure | If the data cannot be retrieved from the table represented by the FlowFile, the FlowFile will be routed to this relationship. |
| retryable failure | If the data cannot be retrieved from the table represented by the FlowFile but we expect it to be possible in future, the FlowFile will be routed to this relationship. |
| rows | When the snapshot is successfully retrieved from the table represented by the FlowFile, the rows will be routed to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| snapshot.complete | Indicates whether the snapshot is complete |
| rows.total.fetched | The total number of rows fetched for the table |
| rows.delta.fetched | The number of rows fetched for the table in the last iteration |
| start.row.index | The index of the first row within the snapshot for a given iteration, starting from 0 |
| last.row.index | The index of the last row within the snapshot for a given iteration, starting from 0 |
| fetch.delta.time.in.millis | The time in milliseconds taken to fetch the rows in the last iteration |
| fetch.total.time.in.millis | The time in milliseconds taken so far to fetch the rows |
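
The incremental, primary-key-driven fetch described in the Description above resembles keyset pagination. The following is a minimal Python sketch of that general technique against an in-memory SQLite table with a single-column primary key. The table `items`, the function `fetch_snapshot`, and the batch size are assumptions for illustration, and the processor's actual queries may differ.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_snapshot(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Tuple[int, str]]]:
    """Yield batches of rows ordered by primary key, resuming after the last key seen."""
    last_id = None
    while True:
        if last_id is None:
            # First batch: start from the smallest primary key.
            rows = conn.execute(
                "SELECT id, name FROM items ORDER BY id LIMIT ?", (batch_size,)
            ).fetchall()
        else:
            # Later batches: continue strictly after the last key already fetched.
            rows = conn.execute(
                "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


# Hypothetical demo data: seven rows fetched in batches of three.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"row-{i}") for i in range(1, 8)])

batches = list(fetch_snapshot(conn, batch_size=3))
total_fetched = sum(len(batch) for batch in batches)
```

Resuming from the last key rather than using an offset keeps each batch query bounded by the primary key index, which is one reason a table without a primary key cannot be paged this way.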

---
title: FilterAttribute 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/filterattribute.md
section: Loading & Unloading Data
---

# FilterAttribute 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Filters the attributes of a FlowFile by retaining specified attributes and removing the rest or by removing specified attributes and retaining the rest.

## Tags

Attribute Expression Language, attributes, delete, filter, modification, regex, regular expression, remove, retain

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Attribute Matching Strategy | Specifies the strategy to filter attributes by. |
| Filter Mode | Specifies the strategy to apply on filtered attributes. Either 'Remove' or 'Retain' only the matching attributes. |
| Filtered Attributes | A set of attribute names to filter from FlowFiles. Each attribute name is separated by the comma delimiter ','. |
| Filtered Attributes Pattern | A regular expression to match names of attributes to filter from FlowFiles. |
## Relationships
| Name | Description |
| --- | --- |
| success | All successful FlowFiles are routed to this relationship |
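
The retain/remove behavior described above can be sketched in Python as follows. The function `filter_attributes` and its parameters are hypothetical stand-ins for the Filter Mode, Filtered Attributes, and Filtered Attributes Pattern properties. Matching the pattern against the full attribute name is an assumption of this sketch.

```python
import re
from typing import Dict, Optional


def filter_attributes(
    attributes: Dict[str, str],
    mode: str,
    names: Optional[str] = None,
    pattern: Optional[str] = None,
) -> Dict[str, str]:
    """Retain or remove the attributes that match a name list or a regex.

    mode is "retain" or "remove"; names is a comma-delimited string of
    attribute names; pattern is a regular expression. Exactly one of
    names or pattern must be given.
    """
    if mode not in ("retain", "remove"):
        raise ValueError("mode must be 'retain' or 'remove'")
    if (names is None) == (pattern is None):
        raise ValueError("provide exactly one of names or pattern")

    if pattern is not None:
        regex = re.compile(pattern)

        def matches(key: str) -> bool:
            return regex.fullmatch(key) is not None
    else:
        wanted = {name.strip() for name in names.split(",")}

        def matches(key: str) -> bool:
            return key in wanted

    keep_matching = mode == "retain"
    # Keep an attribute when its match status agrees with the chosen mode.
    return {k: v for k, v in attributes.items() if matches(k) == keep_matching}
```

For instance, retaining `"filename,path"` and removing the pattern `mime\..*` both leave only `filename` and `path` on a FlowFile that also carries `mime.type`.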
## Use cases

| Retain all FlowFile attributes matching a regular expression |
| ------------------------------------------------------------ |
| Remove only a specified set of FlowFile attributes |

---
title: FindConfluencePages 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/findconfluencepages.md
section: Loading & Unloading Data
---

# FindConfluencePages 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Processor for finding Confluence pages using space name and page name.

## Tags

Preview, atlassian, confluence, fetch, pages

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Confluence Client Service | Controller service for managing connections to Confluence |
| Confluence Page Name | Name of the Confluence Page. If not provided, all pages in the space will be retrieved. |
| Confluence Space Name | Name of the Confluence Space |
## Relationships
| Name | Description |
| --- | --- |
| failure | Failed to find Confluence pages |
| not found | Pages for given space name and page name not found |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully found Confluence pages |
## Writes attributes
| Name | Description |
| --- | --- |
| confluence.page.name | Unique identifier of the Confluence page. |
| confluence.page.change.type | Informs about status change for the searched page. |
| confluence.page.url | Confluence page url. |
| confluence.page.title | Confluence page title. |
| confluence.page.last.modification.date | Last modification date of the Confluence page. |
| confluence.space.name | Name of the Confluence space. |
--- title: FindSharepointDriveItem 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/findsharepointdriveitem.md section: Loading & Unloading Data --- # FindSharepointDriveItem 2025.10.9.21 This feature is not available in the People's Republic of China. This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-msgraph-nar

## Description

Finds a Sharepoint Drive Item by its Drive ID and Item path.

## Tags

document, graph, microsoft, openflow, sharepoint, unstructured

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Authentication Service | The service that provides authentication for the SharePoint API. |
| Drive ID | The ID of the Sharepoint Drive. |
| Fallback Retry Duration | The time to wait before retrying the operation after a communication failure. This value is used when the response doesn't contain a Retry-After header. |
| Item Path | The path of the Drive Item to find in a Drive. |

## Relationships

| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed here if the processor failed to communicate with the Graph API. Can be retried |
| failure | An incoming FlowFile is routed to this relationship if an unexpected error has occurred |
| found | An incoming FlowFile is routed to this relationship, with attributes about the Item added, if the specified item was found in Sharepoint |
| not.found | An incoming FlowFile is routed to this relationship if the specified item was not found in Sharepoint |

## Writes attributes

| Name | Description |
| --- | --- |
| sharepoint.item.id | The ID of the Sharepoint Drive Item. |
| sharepoint.item.type | The type of the Sharepoint Drive Item, possible values are 'File' and 'Folder'. |
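The Fallback Retry Duration property only matters when a response carries no Retry-After header. The Python sketch below illustrates that decision under stated assumptions: `retry_delay_seconds` is a hypothetical helper, response headers are modeled as a plain dict, and only the delta-seconds form of Retry-After is handled. It is not the processor's implementation.

```python
def retry_delay_seconds(headers, fallback_seconds):
    """Return how long to wait before retrying a failed call.

    Hypothetical helper for illustration only: `headers` is a plain dict
    and only a numeric (delta-seconds) Retry-After value is understood.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("retry-after")
    if raw is not None:
        try:
            return max(0.0, float(raw))
        except ValueError:
            pass  # Unparseable value: fall through to the fallback.
    # No usable Retry-After header, so use the configured fallback.
    return float(fallback_seconds)
```

A caller would pass the configured fallback duration (in seconds) together with the headers of the failed response.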
## See also

- [com.snowflake.openflow.runtime.processors.sharepoint.FetchSharepointFile](/user-guide/data-integration/openflow/processors/fetchsharepointfile)
- [com.snowflake.openflow.runtime.processors.sharepoint.ListSharepointDrives](/user-guide/data-integration/openflow/processors/listsharepointdrives)

---
title: FlattenJson 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/flattenjson.md
section: Loading & Unloading Data
---

# FlattenJson 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Takes a nested JSON document and flattens it into a simple key/value pair document. The keys are combined at each level with a user-defined separator that defaults to '.'. The processor can also unflatten a previously flattened JSON document. It supports four flatten modes: normal, keep-arrays, dot notation for MongoDB query, and keep-primitive-arrays. The default flatten mode is 'keep-arrays'.

## Tags

flatten, json, unflatten

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| flatten-json-character-set | The character set in which the file is encoded |
| flatten-json-pretty-print-json | Specifies whether the resulting JSON should be pretty printed |
| flatten-json-return-type | Specifies the desired return type of the JSON, such as flatten/unflatten |
| flatten-json-separator | The separator character used for joining keys. Must be a JSON-legal character. |
| flatten-mode | Specifies how the JSON should be flattened/unflattened |
| ignore-reserved-characters | If true, reserved characters in keys will be ignored |

## Relationships

| Name | Description |
| --- | --- |
| failure | Files that cannot be flattened/unflattened go to this relationship. |
| success | Successfully flattened/unflattened files go to this relationship. |
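To make the key-joining behavior concrete, here is a minimal Python sketch of flattening and unflattening with a configurable separator. It is an illustration under assumptions, not the processor's code: the function names are hypothetical, and lists are simply kept as values, which only approximates the 'keep-arrays' mode.

```python
def flatten(doc, separator=".", prefix=""):
    """Flatten nested dicts into one level, joining keys with `separator`.

    Lists are kept as values, which only approximates the 'keep-arrays'
    mode; the processor's exact array handling is not reproduced here.
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects.
            flat.update(flatten(value, separator, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat, separator="."):
    """Rebuild the nested document; assumes keys never contain `separator`."""
    doc = {}
    for path, value in flat.items():
        node = doc
        *parents, leaf = path.split(separator)
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return doc
```

With the default separator, `{"a": {"b": 1}}` flattens to `{"a.b": 1}`, and `unflatten` reverses the operation.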
---
title: ForkEnrichment 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/forkenrichment.md
section: Loading & Unloading Data
---

# ForkEnrichment 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Used in conjunction with the JoinEnrichment processor, this processor is responsible for adding the attributes that are necessary for the JoinEnrichment processor to perform its function. Each incoming FlowFile will be cloned. The original FlowFile will have appropriate attributes added and then be transferred to the 'original' relationship. The clone will have appropriate attributes added and then be routed to the 'enrichment' relationship. See the documentation for the JoinEnrichment processor (and especially its Additional Details) for more information on how these Processors work together and how to perform enrichment tasks in NiFi by using these Processors.

## Tags

enrich, fork, join, record

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Relationships

| Name | Description |
| --- | --- |
| enrichment | A clone of the incoming FlowFile will be routed to this relationship, after adding appropriate attributes. |
| original | The incoming FlowFile will be routed to this relationship, after adding appropriate attributes. |

## Writes attributes

| Name | Description |
| --- | --- |
| enrichment.group.id | The Group ID to use in order to correlate the 'original' FlowFile with the 'enrichment' FlowFile. |
| enrichment.role | The role to use for enrichment. This will either be ORIGINAL or ENRICHMENT. |
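The clone-and-tag step can be sketched in a few lines of Python. This is a conceptual model only: FlowFiles are represented by their attribute dicts, `fork_for_enrichment` is a hypothetical name, and the use of a random UUID as the group ID is an assumption.

```python
import uuid


def fork_for_enrichment(attributes):
    """Return (original, enrichment) attribute maps for one incoming FlowFile.

    FlowFiles are modeled as plain attribute dicts; generating the shared
    group ID as a random UUID is an assumption made for this sketch.
    """
    group_id = str(uuid.uuid4())
    shared = {**attributes, "enrichment.group.id": group_id}
    # Both copies carry the same group ID so a later join can pair them.
    original = {**shared, "enrichment.role": "ORIGINAL"}
    enrichment = {**shared, "enrichment.role": "ENRICHMENT"}
    return original, enrichment
```

The two results correspond to what is routed to the 'original' and 'enrichment' relationships respectively.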
## See also

- [org.apache.nifi.processors.standard.JoinEnrichment](/user-guide/data-integration/openflow/processors/joinenrichment)

---
title: ForkRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/forkrecord.md
section: Loading & Unloading Data
---

# ForkRecord 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

This processor allows the user to fork a record into multiple records. The user must specify at least one Record Path, as a dynamic property, pointing to a field of type ARRAY containing RECORD objects. The processor accepts two modes: 'split' and 'extract'. In both modes, there is one record generated per element contained in the designated array. In the 'split' mode, each generated record will preserve the same schema as given in the input but the array will contain only one element. In the 'extract' mode, the element of the array must be of record type and will be the generated record. Additionally, in the 'extract' mode, it is possible to specify if each generated record should contain all the fields of the parent records from the root level to the extracted record. This assumes that the fields to add in the record are defined in the schema of the Record Writer controller service. See examples in the additional details documentation of this processor.

## Tags

array, content, event, fork, record, stream

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| fork-mode | Specifies the forking mode of the processor |
| include-parent-fields | This parameter is only valid with the 'extract' mode. If set to true, all the fields from the root level to the given array will be added as fields of each element of the array to fork. |
| record-reader | Specifies the Controller Service to use for reading incoming data |
| record-writer | Specifies the Controller Service to use for writing out the records |

## Relationships

| Name | Description |
| --- | --- |
| failure | In case a FlowFile generates an error during the fork operation, it will be routed to this relationship |
| fork | The FlowFiles containing the forked records will be routed to this relationship |
| original | The original FlowFiles will be routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| record.count | The generated FlowFile will have a 'record.count' attribute indicating the number of records that were written to the FlowFile. |
| mime.type | The MIME Type indicated by the Record Writer |
| <Attributes from Record Writer> | Any Attribute that the configured Record Writer returns will be added to the FlowFile. |
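The difference between the 'split' and 'extract' modes is easier to see on a small record. The Python sketch below models records as dicts and simplifies the Record Path to a top-level field name; the function name is hypothetical, and letting the child's own fields win over parent fields on a name clash is an assumption, not documented behavior.

```python
import copy


def fork_record(record, array_field, mode="split", include_parent_fields=False):
    """Fork one record (a dict) on a top-level array field.

    Simplifications: the Record Path is just a field name, and in
    'extract' mode a child field wins over a parent field with the same
    name (an assumption for this sketch).
    """
    forked = []
    for element in record[array_field]:
        if mode == "split":
            # Same shape as the input, but the array holds one element.
            clone = copy.deepcopy(record)
            clone[array_field] = [copy.deepcopy(element)]
            forked.append(clone)
        elif mode == "extract":
            # The array element itself becomes the generated record.
            child = copy.deepcopy(element)
            if include_parent_fields:
                for name, value in record.items():
                    if name != array_field:
                        child.setdefault(name, copy.deepcopy(value))
            forked.append(child)
        else:
            raise ValueError(f"unknown fork mode: {mode}")
    return forked
```

For a record with an `id` field and a two-element `accounts` array, 'split' yields two records that each keep `id` and a one-element `accounts` array, while 'extract' yields the two account objects themselves.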
---
title: FreeFormTextRecordSetWriter
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/freeformtextrecordsetwriter.md
section: Loading & Unloading Data
---

# FreeFormTextRecordSetWriter

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Writes the contents of a RecordSet as free-form text. The configured text is able to make use of the Expression Language to reference each of the fields that are available in a Record, as well as the attributes in the FlowFile and variables. If there is a name collision, the field name/value is used before attributes or variables. Each record in the RecordSet will be separated by a single newline character.

## Tags

el, expression, freeform, language, record, recordset, resultset, serialize, text, writer

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Character Set * | Character Set | UTF-8 | | The Character set to use when writing the data to the FlowFile |
| Text * | Text | | | The text to use when writing the results. This property will evaluate the Expression Language using any of the fields available in a Record. |
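The lookup order described above (record fields first, then FlowFile attributes) and the newline separation can be sketched with Python's standard library. This is a rough analogy only: `string.Template` handles plain `${name}` references and nothing of the richer NiFi Expression Language, and `write_free_form` is a hypothetical name.

```python
from collections import ChainMap
from string import Template


def write_free_form(records, text, attributes=None):
    """Render each record with `text` and join the results with newlines.

    Only plain ${name} references are supported (string.Template), which
    is far less than the NiFi Expression Language. Record fields shadow
    FlowFile attributes of the same name.
    """
    template = Template(text)
    attributes = attributes or {}
    # ChainMap searches the record first, then falls back to attributes.
    lines = [template.substitute(ChainMap(record, attributes)) for record in records]
    return "\n".join(lines)
```

With the text `${id}: ${name} from ${filename}`, a field called `name` in a record is used even if the FlowFile also has a `name` attribute.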
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: GCPCredentialsControllerService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/gcpcredentialscontrollerservice.md
section: Loading & Unloading Data
---

# GCPCredentialsControllerService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Defines credentials for Google Cloud Platform processors. Uses Application Default Credentials without configuration. Application Default Credentials support an environment variable (GOOGLE_APPLICATION_CREDENTIALS) pointing to a credential file, the config generated by *gcloud auth application-default login*, AppEngine/Compute Engine service accounts, etc.

## Tags

credentials, gcp, provider

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Delegation Strategy * | Delegation Strategy | Service Account | Service Account, Delegated Account | The Delegation Strategy determines which account is used when calls are made with the GCP Credential. |
| Delegation User * | Delegation User | | | This user will be impersonated by the service account for API calls. API calls made using this credential will appear as if they are coming from the delegate user with the delegate user's access. Any scopes supplied from processors to this credential must have domain-wide delegation set up with the service account. |
| Use Application Default Credentials | application-default-credentials | false | true, false | If true, uses Google Application Default Credentials, which checks the GOOGLE_APPLICATION_CREDENTIALS environment variable for a filepath to a service account JSON key, the config generated by the gcloud sdk, the App Engine service account, and the Compute Engine service account. |
| Use Compute Engine Credentials | compute-engine-credentials | false | true, false | If true, uses Google Compute Engine Credentials of the Compute Engine VM Instance which NiFi is running on. |
| Proxy Configuration Service | proxy-configuration-service | | | Specifies the Proxy Configuration Controller Service to proxy network requests. |
| Service Account JSON | service-account-json | | | The raw JSON containing a Service Account keyfile. |
| Service Account JSON File | service-account-json-file | | | Path to a file containing a Service Account key file in JSON format. |

## State management

This component does not store state.

## Restricted

## Restrictions

| Required Permission | Explanation |
| --- | --- |
| access environment credentials | The default configuration can read environment variables and system properties for credentials |
## System Resource Considerations

This component does not specify system resource considerations.

---
title: GCSFileResourceService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/gcsfileresourceservice.md
section: Loading & Unloading Data
---

# GCSFileResourceService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides a Google Cloud Storage (GCS) file resource for other components.

## Tags

file, gcs, resource

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Bucket * | Bucket | ${gcs.bucket} | | Bucket of the object. |
| Name * | Name | ${filename} | | Name of the object. |
| GCP Credentials Provider Service * | gcp-credentials-provider-service | | | The Controller Service used to obtain Google Cloud Platform credentials. |
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: GenerateAnswersFromContext 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/generateanswersfromcontext.md
section: Loading & Unloading Data
---

# GenerateAnswersFromContext 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-rag-evaluation-processors-nar

## Description

Generates synthetic answers for each question present in the incoming records using a Large Language Model (LLM). For every record, the processor extracts the question and its associated context based on the specified RecordPaths, constructs a prompt, and sends it to an LLM provider to obtain a synthetic answer. The generated answer is then inserted into the record at the designated RecordPath.

## Tags

ai, answers, contextual, generation, llm, nlp, openai, openflow, rag, synthetic

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Answer Record Path | The RecordPath to the synthetically generated answers |
| Context Record Path | The RecordPath to the array of contexts in the record. |
| LLM Provider Service | The provider service for sending evaluation prompts to LLM |
| Max Character Context Length | Maximum character length of context window. |
| Question Record Path | The RecordPath to the question field in the record. |
| Record Reader | The Record Reader to use for reading the FlowFile. |
| Record Writer | The Record Writer to use for writing the results. |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles that cannot be processed are routed to this relationship |
| success | FlowFiles that are successfully processed are routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| answers.successfully.generated | The total number of successfully generated synthetic answers for the FlowFile. |
| answers.failed.generated | The total number of synthetic answer generation attempts that failed for the FlowFile. |
| json.parse.failures | Number of JSON parse failures encountered. |
---
title: GenerateAnswersFromGroundTruth 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/generateanswersfromgroundtruth.md
section: Loading & Unloading Data
---

# GenerateAnswersFromGroundTruth 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-rag-evaluation-processors-nar

## Description

Generates synthetic answers for each question in the incoming records using an LLM. The synthetic answers are added to the specified RecordPath within each record. Additionally, the processor tracks the number of answers generated and updates the FlowFile attributes accordingly.

## Tags

ai, answers, generation, llm, nlp, openai, openflow, rag, synthetic

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Answer Record Path | The RecordPath to the synthetically generated answers. |
| Ground Truth Record Path | The RecordPath to the ground truth field in the record. |
| LLM Provider Service | The provider service for sending evaluation prompts to LLM |
| Question Record Path | The RecordPath to the question field in the record. |
| Record Reader | The Record Reader to use for reading the FlowFile. |
| Record Writer | The Record Writer to use for writing the results. |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles that cannot be processed are routed to this relationship |
| success | FlowFiles that are successfully processed are routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| answers.successfully.generated | The total number of synthetic answers successfully generated for the FlowFile. |
| answers.failed.generated | The total number of failed answer generation attempts for the FlowFile. |
| json.parse.failures | Number of JSON parse failures encountered. |
---
title: GenerateFlowFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/generateflowfile.md
section: Loading & Unloading Data
---

# GenerateFlowFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

This processor creates FlowFiles with random data or custom content. GenerateFlowFile is useful for load testing, configuration, and simulation. Also see DuplicateFlowFile for additional load testing.

## Tags

generate, load, random, test

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Batch Size | The number of FlowFiles to be transferred in each invocation |
| Data Format | Specifies whether the data should be Text or Binary |
| File Size | The size of the file that will be used |
| Unique FlowFiles | If true, each FlowFile that is generated will be unique. If false, a random value will be generated and all FlowFiles will get the same content but this offers much higher throughput |
| character-set | Specifies the character set to use when writing the bytes of Custom Text to a flow file. |
| generate-ff-custom-text | If Data Format is text and if Unique FlowFiles is false, then this custom text will be used as content of the generated FlowFiles and the File Size will be ignored. Finally, if Expression Language is used, evaluation will be performed only once per batch of generated FlowFiles |
| mime-type | Specifies the value to set for the "mime.type" attribute. |

## Relationships

| Name | Description |
| --- | --- |
| success | |

## Writes attributes

| Name | Description |
| --- | --- |
| mime.type | Sets the MIME type of the output if the 'Mime Type' property is set |
---
title: GenerateJSON 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/generatejson.md
section: Loading & Unloading Data
---

# GenerateJSON 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-record-generation-nar

## Description

Produces a batch of JSON Objects with random field values based on a configurable JSON Schema.

## Tags

JSON, JSON Schema, generate, random

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Batch Size | Number of records generated per FlowFile produced |
| JSON Schema | JSON Schema version 2020-12 describing an object with properties indicating type and format for each field |
| Output Structure | Structure for writing batches of records to each FlowFile |

## Relationships

| Name | Description |
| --- | --- |
| success | FlowFiles with generated JSON records |
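As an illustration of what a value for the JSON Schema property might look like, the Python snippet below builds a small JSON Schema 2020-12 object schema and serializes it. The field names (`order_id`, `created_at`, `amount`) are hypothetical, and which types and formats the processor actually supports is not stated here, so treat the keywords used as assumptions.

```python
import json

# Hypothetical example schema; field names are for illustration only.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "order_id": {"type": "integer"},
        "created_at": {"type": "string", "format": "date-time"},
        "amount": {"type": "number"},
    },
}

# Serialized form that could be pasted into a text property.
schema_text = json.dumps(schema, indent=2)
```

Each entry under `properties` names one field and declares its type and, where relevant, its format.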
---
title: GenerateRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/generaterecord.md
section: Loading & Unloading Data
---

# GenerateRecord 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

This processor creates FlowFiles with records having random value for the specified fields. GenerateRecord is useful for testing, configuration, and simulation. It uses either user-defined properties to define a record schema or a provided schema and generates the specified number of records using random data for the fields in the schema.

## Tags

fake, generate, random, test

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| null-percentage | The percent probability (0-100%) that a generated value for any nullable field will be null. Set this property to zero to have no null values, or 100 to have all null values. |
| nullable-fields | Whether the generated fields will be nullable. Note that this property is ignored if Schema Text is set. Also it only affects the schema of the generated data, not whether any values will be null. If this property is true, see 'Null Value Percentage' to set the probability that any generated field will be null. |
| number-of-records | Specifies how many records will be generated for each outgoing FlowFile. |
| record-writer | Specifies the Controller Service to use for writing out the records |
| schema-text | The text of an Avro-formatted Schema used to generate record data. If this property is set, any user-defined properties are ignored. |

## Relationships

| Name | Description |
| --- | --- |
| success | FlowFiles that are successfully created will be routed to this relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| mime.type | Sets the mime.type attribute to the MIME Type specified by the Record Writer |
| record.count | The number of records in the FlowFile |
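The interaction between nullable-fields and null-percentage can be modeled as a per-value coin flip. The Python sketch below is a conceptual illustration only: `maybe_null` is a hypothetical helper, and the way the processor actually draws its random values is not specified here.

```python
import random


def maybe_null(value, nullable, null_percentage, rng=random):
    """Return None with probability null_percentage/100 for nullable fields.

    Hypothetical helper: a percentage of 0 never yields None, 100 always
    does, and non-nullable fields always keep their value.
    """
    # rng.random() is in [0, 1), so the scaled draw is in [0, 100).
    if nullable and rng.random() * 100 < null_percentage:
        return None
    return value
```

Setting the percentage to zero leaves every generated value intact, while 100 replaces every nullable value with null, matching the endpoints described in the property table.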
---
title: GenerateTableFetch 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/generatetablefetch.md
section: Loading & Unloading Data
---

# GenerateTableFetch 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Generates SQL select queries that fetch "pages" of rows from a table. The partition size property, along with the table 's row count, determine the size and number of pages and generated FlowFiles. In addition, incremental fetching can be achieved by setting Maximum-Value Columns, which causes the processor to track the columns' maximum values, thus only fetching rows whose columns 'values exceed the observed maximums. This processor is intended to be run on the Primary Node only. This processor can accept incoming connections; the behavior of the processor is different whether incoming connections are provided: - If no incoming connection(s) are specified, the processor will generate SQL queries on the specified processor schedule. Expression Language is supported for many fields, but no FlowFile attributes are available. However the properties will be evaluated using the Environment/System properties. - If incoming connection(s) are specified and no FlowFile is available to a processor task, no work will be performed. - If incoming connection(s) are specified and a FlowFile is available to a processor task, the FlowFile's attributes may be used in Expression Language for such fields as Table Name and others. However, the Max-Value Columns and Columns to Return fields must be empty or refer to columns that are available in each specified table. ## Tags database, fetch, generate, jdbc, query, select, sql ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Columns to Return | A comma-separated list of column names to be used in the query. If your database requires special treatment of the names (for example, quoting), each name should include such treatment. If no column names are supplied, all columns in the specified table will be returned. NOTE: It is important to use consistent column names for a given table for incremental fetch to work properly. |
| Database Connection Pooling Service | The Controller Service that is used to obtain a connection to the database. |
| Database Dialect Service | Database Dialect Service for generating statements specific to a particular service or vendor. |
| Max Wait Time | The maximum amount of time allowed for a running SQL select query. Zero means there is no limit. A max time of less than 1 second is treated as zero. |
| Maximum-value Columns | A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly. |
| Table Name | The name of the database table to be queried. |
| db-fetch-db-type | Database Type for generating statements specific to a particular service or vendor. The Generic Type supports most cases but selecting a specific type enables optimal processing or additional features. |
| db-fetch-where-clause | A custom clause to be added in the WHERE condition when building SQL queries. |
| gen-table-column-for-val-partitioning | The name of a column whose values will be used for partitioning. The default behavior is to use row numbers on the result set for partitioning into 'pages' to be fetched from the database, using an offset/limit strategy. However, for certain databases, it can be more efficient under the right circumstances to use the column values themselves to define the 'pages'. For best performance, this property should only be used when the default queries are not performing well, when there is no maximum-value column or a single maximum-value column whose type can be coerced to a long integer (i.e. not date or timestamp), and when the column values are evenly distributed and not sparse. |
| gen-table-custom-orderby-column | The name of a column to be used for ordering the results if Max-Value Columns are not provided and partitioning is enabled. This property is ignored if either Max-Value Columns is set or Partition Size = 0. NOTE: If neither Max-Value Columns nor Custom ORDER BY Column is set, then depending on the database/driver, the processor may report an error and/or the generated SQL may result in missing and/or duplicate rows. This is because without an explicit ordering, fetching each partition is done using an arbitrary ordering. |
| gen-table-fetch-partition-size | The number of result rows to be fetched by each generated SQL statement. The total number of rows in the table divided by the partition size gives the number of SQL statements (i.e. FlowFiles) generated. A value of zero indicates that a single FlowFile is to be generated whose SQL statement will fetch all rows in the table. |
| gen-table-output-flowfile-on-zero-results | Depending on the specified properties, an execution of this processor may not result in any SQL statements generated. When this property is true, an empty FlowFile will be generated (having the parent of the incoming FlowFile if present) and transferred to the 'success' relationship. When this property is false, no output FlowFiles will be generated. |
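The relationship between the partition size and the number of generated statements can be sketched as follows. This is only an illustration under stated assumptions: the helper name `page_queries`, the table and column names, and the exact SQL shape are hypothetical, and the SQL the processor really emits depends on the configured database type.

```python
import math

def page_queries(table, columns, row_count, partition_size, order_by):
    """Build one SELECT per page (hypothetical helper, not the processor's own code)."""
    cols = ", ".join(columns) if columns else "*"
    base = f"SELECT {cols} FROM {table}"
    if partition_size == 0:
        # A partition size of zero means a single statement that fetches all rows.
        return [base]
    # Number of statements = row count divided by partition size, rounded up.
    pages = math.ceil(row_count / partition_size)
    return [
        f"{base} ORDER BY {order_by} LIMIT {partition_size} OFFSET {i * partition_size}"
        for i in range(pages)
    ]

queries = page_queries("orders", ["id", "total"], row_count=25, partition_size=10, order_by="id")
for q in queries:
    print(q)
```

With 25 rows and a partition size of 10, the sketch yields three statements, the last one covering the final five rows. An explicit `ORDER BY` is included because, as noted for the custom ORDER BY column, paging without a stable ordering can miss or duplicate rows.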
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | After performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records that have max values greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation. |
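The idea of retaining a maximum value and fetching only newer rows can be modelled in a few lines. The sketch below assumes a single maximum-value column and uses hypothetical helper names (`incremental_condition`, `remember_max`); it is a conceptual model, not the processor's state handling, and it does not cover the hierarchical multi-column case.

```python
def incremental_condition(column, retained_max):
    """Return a WHERE condition that skips rows already seen (hypothetical helper)."""
    if retained_max is None:
        # Nothing retained yet: the first run considers every row.
        return None
    return f"{column} > {retained_max!r}"

def remember_max(retained_max, rows, column):
    """Update the retained maximum from the rows returned by the latest fetch."""
    seen = [row[column] for row in rows if row.get(column) is not None]
    if not seen:
        return retained_max
    newest = max(seen)
    return newest if retained_max is None else max(retained_max, newest)

state = None
first = incremental_condition("id", state)
state = remember_max(state, [{"id": 1201}, {"id": 1250}], "id")
second = incremental_condition("id", state)
print(first, second)
```

Clearing the processor state corresponds to resetting `state` to `None` here, after which the next run would again consider all rows.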
## Relationships
| Name | Description |
| --- | --- |
| failure | This relationship is only used when SQL query execution (using an incoming FlowFile) failed. The incoming FlowFile will be penalized and routed to this relationship. If no incoming connection(s) are specified, this relationship is unused. |
| success | Successfully created FlowFile from SQL query result set. |
## Writes attributes
| Name | Description |
| --- | --- |
| generatetablefetch.sql.error | If the processor has incoming connections, and processing an incoming FlowFile causes a SQL Exception, the FlowFile is routed to failure and this attribute is set to the exception message. |
| generatetablefetch.tableName | The name of the database table to be queried. |
| generatetablefetch.columnNames | The comma-separated list of column names used in the query. |
| generatetablefetch.whereClause | Where clause used in the query to get the expected rows. |
| generatetablefetch.maxColumnNames | The comma-separated list of column names used to keep track of data that has been returned since the processor started running. |
| generatetablefetch.limit | The number of result rows to be fetched by the SQL statement. |
| generatetablefetch.offset | Offset to be used to retrieve the corresponding partition. |
| fragment.identifier | All FlowFiles generated from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results. |
| fragment.count | This is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. |
| fragment.index | This is the position of this FlowFile in the list of outgoing FlowFiles that were all generated from the same execution. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same execution and in what order FlowFiles were produced. |
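The three fragment attributes are meant to be used together. A minimal sketch of that correlation, assuming each FlowFile is represented as a plain dictionary of string attributes (the function name `complete_fragment_sets` and the sample values are hypothetical):

```python
from collections import defaultdict

def complete_fragment_sets(flowfiles):
    """Group attribute dicts by fragment.identifier and keep only complete, ordered sets."""
    groups = defaultdict(list)
    for attrs in flowfiles:
        groups[attrs["fragment.identifier"]].append(attrs)
    complete = {}
    for identifier, items in groups.items():
        expected = int(items[0]["fragment.count"])
        if len(items) == expected:
            # fragment.index gives the position within the same execution.
            complete[identifier] = sorted(items, key=lambda a: int(a["fragment.index"]))
    return complete

sample = [
    {"fragment.identifier": "a", "fragment.count": "2", "fragment.index": "1"},
    {"fragment.identifier": "a", "fragment.count": "2", "fragment.index": "0"},
    {"fragment.identifier": "b", "fragment.count": "3", "fragment.index": "0"},
]
result = complete_fragment_sets(sample)
print(sorted(result))
```

In this sample only set `a` is reported, because set `b` announces three fragments and just one has arrived.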
## See also - [org.apache.nifi.processors.standard.ExecuteSQL](/user-guide/data-integration/openflow/processors/executesql) - [org.apache.nifi.processors.standard.ListDatabaseTables](/user-guide/data-integration/openflow/processors/listdatabasetables) - [org.apache.nifi.processors.standard.QueryDatabaseTable](/user-guide/data-integration/openflow/processors/querydatabasetable) --- title: GeoEnrichIP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/geoenrichip.md section: Loading & Unloading Data --- # GeoEnrichIP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-enrich-nar ## Description Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes. The geo data is provided as a MaxMind database. The attribute that contains the IP address to look up is provided by the 'IP Address Attribute' property. If the name of the attribute provided is 'X', then the attributes added by enrichment will take the form `X.geo.<fieldName>`. ## Tags enrich, geo, ip, maxmind ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| IP Address Attribute | The name of an attribute whose value is a dotted decimal IP address for which enrichment should occur. |
| Log Level | The Log Level to use when an IP is not found in the database. Accepted values: INFO, DEBUG, WARN, ERROR. |
| MaxMind Database File | Path to MaxMind IP Enrichment Database File. |
## Relationships
| Name | Description |
| --- | --- |
| found | Where to route flow files after successfully enriching attributes with data provided by database. |
| not found | Where to route flow files after unsuccessfully enriching attributes because no data was found. |
## Writes attributes
| Name | Description |
| --- | --- |
| X.geo.lookup.micros | The number of microseconds that the geo lookup took. |
| X.geo.city | The city identified for the IP address. |
| X.geo.accuracy | The accuracy radius if provided by the database (in kilometers). |
| X.geo.latitude | The latitude identified for this IP address. |
| X.geo.longitude | The longitude identified for this IP address. |
| X.geo.subdivision.N | Each subdivision that is identified for this IP address is added with a one-up number appended to the attribute name, starting with 0. |
| X.geo.subdivision.isocode.N | The ISO code for the subdivision that is identified by X.geo.subdivision.N. |
| X.geo.country | The country identified for this IP address. |
| X.geo.country.isocode | The ISO Code for the country identified. |
| X.geo.postalcode | The postal code for the country identified. |
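To make the `X.geo.*` naming concrete, the sketch below expands the attribute names for a given IP attribute name. The function name `geo_attribute_names` and the example attribute name `remote.ip` are assumptions for illustration; the list of fields is taken from the table above.

```python
GEO_FIELDS = (
    "lookup.micros", "city", "accuracy", "latitude", "longitude",
    "country", "country.isocode", "postalcode",
)

def geo_attribute_names(ip_attribute, subdivision_count=0):
    """List the attribute names written for one IP attribute (hypothetical helper)."""
    prefix = f"{ip_attribute}.geo"
    names = [f"{prefix}.{field}" for field in GEO_FIELDS]
    # Subdivisions are numbered from 0, each with a matching ISO code attribute.
    for n in range(subdivision_count):
        names.append(f"{prefix}.subdivision.{n}")
        names.append(f"{prefix}.subdivision.isocode.{n}")
    return names

names = geo_attribute_names("remote.ip", subdivision_count=2)
for name in names:
    print(name)
```

Which of these attributes are actually present on a FlowFile depends on what the database returns for the address, so downstream logic should not assume every name exists.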
--- title: GeoEnrichIPRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/geoenrichiprecord.md section: Loading & Unloading Data --- # GeoEnrichIPRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-enrich-nar ## Description Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes. The geo data is provided as a MaxMind database. This version uses the NiFi Record API to allow large scale enrichment of record-oriented data sets. Each field provided by the MaxMind database can be directed to a field of the user's choosing by providing a record path for that field configuration. ## Tags enrich, geo, ip, maxmind, record ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| City Record Path | Record path for putting the city identified for the IP address. |
| Country ISO Code Record Path | Record path for putting the ISO Code for the country identified. |
| Country Postal Code Record Path | Record path for putting the postal code for the country identified. |
| Country Record Path | Record path for putting the country identified for this IP address. |
| IP Address Record Path | The record path to retrieve the IP address for doing the lookup. |
| Latitude Record Path | Record path for putting the latitude identified for this IP address. |
| Log Level | The Log Level to use when an IP is not found in the database. Accepted values: INFO, DEBUG, WARN, ERROR. |
| Longitude Record Path | Record path for putting the longitude identified for this IP address. |
| MaxMind Database File | Path to MaxMind IP Enrichment Database File. |
| Record Reader | Record reader service to use for reading the flowfile contents. |
| Record Writer | Record writer service to use for enriching the flowfile contents. |
| Separate Enriched From Not Enriched | Separate records that have been enriched from ones that have not. Default behavior is to send everything to the found relationship if even one record is enriched. |
## Relationships
| Name | Description |
| --- | --- |
| found | Where to route flow files after successfully enriching attributes with data provided by database. |
| not found | Where to route flow files after unsuccessfully enriching attributes because no data was found. |
| original | The original input flowfile goes to this relationship regardless of whether the content was enriched or not. |
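The effect of 'Separate Enriched From Not Enriched' on routing can be modelled roughly as below. This is a conceptual sketch only: the function `route_records`, the `is_enriched` predicate, and the sample records are hypothetical, and sending everything to 'not found' when no record is enriched is an assumption, since the description only states the case where at least one record is enriched.

```python
def route_records(records, is_enriched, separate):
    """Model which relationship each record ends up in (hypothetical helper)."""
    enriched = [r for r in records if is_enriched(r)]
    not_enriched = [r for r in records if not is_enriched(r)]
    if separate:
        # Enriched and non-enriched records are split across the two relationships.
        return {"found": enriched, "not found": not_enriched}
    if enriched:
        # Default: one enriched record is enough to send everything to 'found'.
        return {"found": list(records), "not found": []}
    # Assumption: with nothing enriched, everything goes to 'not found'.
    return {"found": [], "not found": list(records)}

records = [{"ip": "203.0.113.7", "city": "Example"}, {"ip": "10.0.0.1", "city": None}]
has_city = lambda r: r["city"] is not None
default_routing = route_records(records, has_city, separate=False)
split_routing = route_records(records, has_city, separate=True)
print(len(default_routing["found"]), len(split_routing["found"]))
```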
--- title: GetAmazonAdsReport 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getamazonadsreport.md section: Loading & Unloading Data --- # GetAmazonAdsReport 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-amazon-ads-processors-nar ## Description Downloads a report from Amazon Ads if the report is ready. ## Tags Amazon, Amazon Ads, report ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Access Token Provider | Service providing OAuth access token. |
| Amazon Advertising Client ID | Client ID of the Amazon Advertising user. |
| Region | Environment from which advertising data will be downloaded. |
| Report ID | ID of the generated report. |
| Report Profile ID | The profile ID associated with an advertising account in a specific marketplace. |
| Web Client Service Provider | Service providing client for REST request execution. |
## Relationships
| Name | Description |
| --- | --- |
| failure | Error FlowFiles transferred when receiving error response from Amazon Ads Reporting API or when an error occurred during response processing. |
| retry | Response FlowFiles transferred when report prepared by Amazon Ads Reporting API is not yet ready to be downloaded. |
| success | Response FlowFiles transferred when receiving COMPLETED response from Amazon Ads Reporting API. |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | Mime type of the returned report. |
--- title: GetAwsPollyJobStatus 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getawspollyjobstatus.md section: Loading & Unloading Data --- # GetAwsPollyJobStatus 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Retrieves the current status of an AWS Polly job. ## Tags AWS, Amazon, ML, Machine Learning, Polly ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider. |
| AWS Task ID | |
| Communications Timeout | |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Region | |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The job failed, the original FlowFile will be routed to this relationship. |
| original | Upon successful completion, the original FlowFile will be routed to this relationship. |
| running | The job is currently still being processed. |
| success | Job successfully finished. FlowFile will be routed to this relation. |
## Writes attributes
| Name | Description |
| --- | --- |
| PollyS3OutputBucket | The bucket name where polly output will be located. |
| filename | Object key of polly output. |
| outputLocation | S3 path-style output location of the result. |
## See also - [org.apache.nifi.processors.aws.ml.polly.StartAwsPollyJob](/user-guide/data-integration/openflow/processors/startawspollyjob) --- title: GetAwsTextractJobStatus 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getawstextractjobstatus.md section: Loading & Unloading Data --- # GetAwsTextractJobStatus 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Retrieves the current status of an AWS Textract job. ## Tags AWS, Amazon, ML, Machine Learning, Textract ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider. |
| AWS Task ID | |
| Communications Timeout | |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Region | |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections. |
| Textract Type | Supported values: "Document Analysis", "Document Text Detection", "Expense Analysis". |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The job failed, the original FlowFile will be routed to this relationship. |
| original | Upon successful completion, the original FlowFile will be routed to this relationship. |
| running | The job is currently still being processed. |
| success | Job successfully finished. FlowFile will be routed to this relation. |
| throttled | Retrieving results failed for some reason, but the issue is likely to resolve on its own, such as Provisioned Throughput Exceeded or a Throttling failure. This relationship is generally expected to be retried. |
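The 'running' and 'throttled' relationships both describe a state where checking again later is reasonable, while 'success' and 'failure' are final. A conceptual model of that retry behaviour, assuming a caller-supplied `check_status` function that returns one of the relationship names (the function `poll_until_done` and its parameters are hypothetical and do not correspond to any processor property):

```python
import time

RETRYABLE = {"running", "throttled"}

def poll_until_done(check_status, max_attempts=5, base_delay=0.0):
    """Re-check a job until it reaches a final status or attempts run out."""
    status = None
    for attempt in range(1, max_attempts + 1):
        status = check_status()
        if status not in RETRYABLE:
            return status, attempt
        # Wait a little longer after each retryable result.
        time.sleep(base_delay * attempt)
    return status, max_attempts

responses = iter(["running", "throttled", "success"])
result = poll_until_done(lambda: next(responses))
print(result)
```

In a flow, the same effect would come from how the relationships are connected rather than from code like this; the sketch is only meant to show which outcomes are final.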
## See also - [org.apache.nifi.processors.aws.ml.textract.StartAwsTextractJob](/user-guide/data-integration/openflow/processors/startawstextractjob) --- title: GetAwsTranscribeJobStatus 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getawstranscribejobstatus.md section: Loading & Unloading Data --- # GetAwsTranscribeJobStatus 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Retrieves the current status of an AWS Transcribe job. ## Tags AWS, Amazon, ML, Machine Learning, Transcribe ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider. |
| AWS Task ID | |
| Communications Timeout | |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Region | |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The job failed, the original FlowFile will be routed to this relationship. |
| original | Upon successful completion, the original FlowFile will be routed to this relationship. |
| running | The job is currently still being processed. |
| success | Job successfully finished. FlowFile will be routed to this relation. |
| throttled | Retrieving results failed for some reason, but the issue is likely to resolve on its own, such as Provisioned Throughput Exceeded or a Throttling failure. This relationship is generally expected to be retried. |
## Writes attributes
| Name | Description |
| --- | --- |
| outputLocation | S3 path-style output location of the result. |
## See also - [org.apache.nifi.processors.aws.ml.transcribe.StartAwsTranscribeJob](/user-guide/data-integration/openflow/processors/startawstranscribejob) --- title: GetAwsTranslateJobStatus 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getawstranslatejobstatus.md section: Loading & Unloading Data --- # GetAwsTranslateJobStatus 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Retrieves the current status of an AWS Translate job. ## Tags AWS, Amazon, ML, Machine Learning, Translate ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider. |
| AWS Task ID | |
| Communications Timeout | |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Region | |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The job failed, the original FlowFile will be routed to this relationship. |
| original | Upon successful completion, the original FlowFile will be routed to this relationship. |
| running | The job is currently still being processed. |
| success | Job successfully finished. FlowFile will be routed to this relation. |
| throttled | Retrieving results failed for some reason, but the issue is likely to resolve on its own, such as Provisioned Throughput Exceeded or a Throttling failure. This relationship is generally expected to be retried. |
## Writes attributes
| Name | Description |
| --- | --- |
| outputLocation | S3 path-style output location of the result. |
## See also - [org.apache.nifi.processors.aws.ml.translate.StartAwsTranslateJob](/user-guide/data-integration/openflow/processors/startawstranslatejob) --- title: GetAzureEventHub 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getazureeventhub.md section: Loading & Unloading Data --- # GetAzureEventHub 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Receives messages from Microsoft Azure Event Hubs without reliable checkpoint tracking. In a clustered environment, GetAzureEventHub processor instances work independently and all cluster nodes process all messages (unless the processor is run in Primary Only mode). ConsumeAzureEventHub offers the recommended approach to receiving messages from Azure Event Hubs. This processor creates a thread pool for connections to Azure Event Hubs. ## Tags azure, cloud, eventhub, events, microsoft, streaming, streams ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Consumer Group | The name of the consumer group to use when pulling events. |
| Event Hub Name | Name of Azure Event Hubs source. |
| Event Hub Namespace | Namespace of Azure Event Hubs prefixed to Service Bus Endpoint domain. |
| Message Enqueue Time | The enqueue time from which to start reading messages, given as a timestamp (ISO-8601 Instant) formatted as YYYY-MM-DDThh:mm:ss.sssZ (for example, 2016-01-01T01:01:01.000Z). |
| Partition Receiver Fetch Size | The number of events that a receiver should fetch from an Event Hubs partition before returning. The default is 100. |
| Partition Receiver Timeout | The amount of time in milliseconds a Partition Receiver should wait to receive the Fetch Size before returning. The default is 60000. |
| Service Bus Endpoint | To support namespaces not in the default windows.net domain. |
| Shared Access Policy Key | The key of the shared access policy. Either the primary or the secondary key can be used. |
| Shared Access Policy Name | The name of the shared access policy. This policy must have Listen claims. |
| Transport Type | Advanced Message Queuing Protocol Transport Type for communication with Azure Event Hubs. |
| Use Azure Managed Identity | Choose whether or not to use the managed identity of Azure VM/VMSS. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
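A value for 'Message Enqueue Time' in the ISO-8601 Instant form shown in the example (2016-01-01T01:01:01.000Z) can be produced from a timezone-aware datetime. The helper name `to_enqueue_time` is hypothetical; only the Python standard library is used.

```python
from datetime import datetime, timezone

def to_enqueue_time(moment):
    """Format a timezone-aware datetime as an ISO-8601 instant with milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    # strftime has no millisecond directive, so the fraction is appended by hand.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"

value = to_enqueue_time(datetime(2016, 1, 1, 1, 1, 1, tzinfo=timezone.utc))
print(value)  # 2016-01-01T01:01:01.000Z
```

Rejecting naive datetimes avoids silently interpreting a local time as UTC, which would shift the starting point for reading.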
## Relationships
| Name | Description |
| --- | --- |
| success | Any FlowFile that is successfully received from the event hub will be transferred to this Relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| eventhub.enqueued.timestamp | The time (in milliseconds since epoch, UTC) at which the message was enqueued in the event hub. |
| eventhub.offset | The offset into the partition at which the message was stored. |
| eventhub.sequence | The Azure sequence number associated with the message. |
| eventhub.name | The name of the event hub from which the message was pulled. |
| eventhub.partition | The name of the event hub partition from which the message was pulled. |
| eventhub.property.* | The application properties of this message. For example, the property 'application' is written as 'eventhub.property.application'. |
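Downstream code that reads these attributes typically needs to convert the enqueue timestamp and collect the application properties. A minimal sketch, assuming the attributes are available as a dictionary of strings (the helper names and sample values are hypothetical):

```python
from datetime import datetime, timezone

PROPERTY_PREFIX = "eventhub.property."

def enqueued_at(attributes):
    """Convert eventhub.enqueued.timestamp (milliseconds since epoch) to a UTC datetime."""
    millis = int(attributes["eventhub.enqueued.timestamp"])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

def application_properties(attributes):
    """Collect eventhub.property.* attributes, keyed by the original property name."""
    return {
        key[len(PROPERTY_PREFIX):]: value
        for key, value in attributes.items()
        if key.startswith(PROPERTY_PREFIX)
    }

attributes = {
    "eventhub.enqueued.timestamp": "1451610061000",
    "eventhub.name": "telemetry",
    "eventhub.property.application": "billing",
}
when = enqueued_at(attributes)
props = application_properties(attributes)
print(when.isoformat(), props)
```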
## See also - [org.apache.nifi.processors.azure.eventhub.ConsumeAzureEventHub](/user-guide/data-integration/openflow/processors/consumeazureeventhub) --- title: GetAzureQueueStorage_v12 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getazurequeuestorage_v12.md section: Loading & Unloading Data --- # GetAzureQueueStorage_v12 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Retrieves the messages from an Azure Queue Storage. The retrieved messages will be deleted from the queue by default. If the requirement is to consume messages without deleting them, set 'Auto Delete Messages' to 'false'. Note: Duplicates can occur, for example when a message is received but cannot be deleted from the queue because of an unexpected error. ## Tags azure, cloud, dequeue, microsoft, queue, storage ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Auto Delete Messages | Specifies whether the received message is to be automatically deleted from the queue. |
| Credentials Service | Controller Service used to obtain Azure Storage Credentials. |
| Endpoint Suffix | Storage accounts in public Azure always use a common FQDN suffix. Override this endpoint suffix with a different suffix in certain circumstances (like Azure Stack or non-public Azure regions). |
| Message Batch Size | The number of messages to be retrieved from the queue. |
| Queue Name | Name of the Azure Storage Queue. |
| Request Timeout | The timeout for read or write requests to Azure Queue Storage. Defaults to 1 second. |
| Visibility Timeout | The duration during which the retrieved message should be invisible to other consumers. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
## Relationships
| Name | Description |
| --- | --- |
| success | All successfully processed FlowFiles are routed to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| azure.queue.uri | The absolute URI of the configured Azure Queue Storage. |
| azure.queue.insertionTime | The time when the message was inserted into the queue storage. |
| azure.queue.expirationTime | The time when the message will expire from the queue storage. |
| azure.queue.messageId | The ID of the retrieved message. |
| azure.queue.popReceipt | The pop receipt of the retrieved message. |
## See also - [org.apache.nifi.processors.azure.storage.queue.PutAzureQueueStorage_v12](/user-guide/data-integration/openflow/processors/putazurequeuestorage_v12) --- title: GetBoxFileCollaborators 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getboxfilecollaborators.md section: Loading & Unloading Data --- # GetBoxFileCollaborators 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-box-nar ## Description Retrieves all collaborators on a Box file and adds the collaboration information to the FlowFile's attributes. ## Tags box, collaboration, permissions, sharing, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| File ID | The ID of the Box file to retrieve collaborators for. |
| Roles | A comma-separated list of collaboration roles to retrieve. Available roles: editor, viewer, previewer, uploader, previewer uploader, viewer uploader, co-owner, owner. If not specified, no filtering by role will be applied. |
| Statuses | A comma-separated list of collaboration statuses to retrieve. Available statuses: accepted, pending, rejected. If not specified, no filtering by status will be applied. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles that encounter errors during processing will be routed to this relationship. |
| not.found | FlowFiles for which the specified Box file was not found. |
| success | FlowFiles that have been successfully processed will be routed to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| box.id | The id of the file. |
| `box.collaborations.<status>.users.ids` | Comma-separated list of user collaborator IDs by status. |
| `box.collaborations.<status>.groups.ids` | Comma-separated list of group collaborator IDs by status. |
| `box.collaborations.<status>.users.emails` | Comma-separated list of user collaborator emails by status. |
| `box.collaborations.<status>.groups.emails` | Comma-separated list of group collaborator emails by status. |
| `box.collaborations.<status>.<role>.users.ids` | Comma-separated list of user collaborator IDs by status and role. Only present when both Roles and Statuses properties are set. |
| `box.collaborations.<status>.<role>.users.logins` | Comma-separated list of user collaborator logins by status and role. Only present when both Roles and Statuses properties are set. |
| `box.collaborations.<status>.<role>.groups.ids` | Comma-separated list of group collaborator IDs by status and role. Only present when both Roles and Statuses properties are set. |
| `box.collaborations.<status>.<role>.groups.emails` | Comma-separated list of group collaborator emails by status and role. Only present when both Roles and Statuses properties are set. |
| box.collaborations.count | Total number of collaborations on the file. |
| error.code | The error code returned by Box. |
| error.message | The error message returned by Box. |
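Because the collaborator attributes are comma-separated strings keyed by status, a consumer usually has to split them back into lists. The sketch below does that for the per-status attributes only and skips the per-status-and-role variants; the function name `collaborators_by_status` and the sample attribute values are hypothetical.

```python
PREFIX = "box.collaborations."

def collaborators_by_status(attributes, kind="users", field="ids"):
    """Map each status to a list of values from box.collaborations.<status>.<kind>.<field>."""
    suffix = f".{kind}.{field}"
    result = {}
    for key, value in attributes.items():
        if not (key.startswith(PREFIX) and key.endswith(suffix)):
            continue
        middle = key[len(PREFIX):-len(suffix)]
        if "." in middle:
            # A remaining dot means a <status>.<role> key, which this sketch ignores.
            continue
        result[middle] = [item.strip() for item in value.split(",") if item.strip()]
    return result

attributes = {
    "box.id": "987",
    "box.collaborations.accepted.users.ids": "1,2",
    "box.collaborations.pending.users.ids": "3",
    "box.collaborations.accepted.editor.users.ids": "1",
    "box.collaborations.count": "3",
}
by_status = collaborators_by_status(attributes)
print(by_status)
```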
## See also - [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile) - [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile) --- title: GetBoxGroupMembers 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getboxgroupmembers.md section: Loading & Unloading Data --- # GetBoxGroupMembers 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-box-nar ## Description Retrieves members for a Box Group and writes their details in FlowFile attributes. ## Tags box, metadata, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Box Client Service | Controller Service used to obtain a Box API connection. |
| Group ID | The ID of the Group to retrieve members for. |
## Relationships
| Name | Description |
| --- | --- |
| failure | The FlowFile will be routed here when Group memberships retrieval was attempted but failed. |
| not.found | The FlowFile will be routed here when the Group was not found. |
| success | The FlowFile will be routed here after successfully retrieving Group members. |
## Writes attributes
| Name | Description |
| --- | --- |
| box.group.user.ids | A comma-separated list of user IDs in the group. |
| box.group.user.logins | A comma-separated list of user logins (emails) in the group. |
| error.code | An HTTP error code returned by Box. |
| error.message | An error message returned by Box. |
--- title: GetConfluenceAuditRecords 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getconfluenceauditrecords.md section: Loading & Unloading Data --- # GetConfluenceAuditRecords 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Processor listing Confluence audit records.

## Tags

Preview, atlassian, audit log, confluence

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Audit Log Fetch Limit | How many audit logs will be fetched from Confluence API in one request |
| Confluence Client Service | Controller service for managing connections to Confluence |

## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | Stores last synchronization timestamp. |

## Relationships

| Name | Description |
| --- | --- |
| failure | Failed to fetch Confluence audit records |
| original | The input Flow File is routed to the original relationship. |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully fetched Confluence audit records |

## Writes attributes

| Name | Description |
| --- | --- |
| confluence.group.ids | List of identifiers of the Confluence groups. |
| confluence.page.names | List of the names of the Confluence page. |
| confluence.space.names | List of the Confluence spaces. |
| confluence.continue.fetching | Indicates whether there are more pages to fetch (true/false). |
--- title: GetConfluenceGroupUsers 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getconfluencegroupusers.md section: Loading & Unloading Data ---

# GetConfluenceGroupUsers 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Processor that downloads information about users belonging to a given Confluence group

## Tags

Preview, atlassian, confluence, groups, users

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Confluence Client Service | Controller service for managing connections to Confluence |
| Confluence Group ID | Identifier of the Confluence Group |

## Relationships

| Name | Description |
| --- | --- |
| failure | Failed to fetch Confluence group users |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully fetched Confluence group users |

## Writes attributes

| Name | Description |
| --- | --- |
| confluence.group.user.ids | Identifiers of the Confluence group users. |
| confluence.group.user.emails | Emails of the Confluence group users. |
--- title: GetConfluencePageContent 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getconfluencepagecontent.md section: Loading & Unloading Data ---

# GetConfluencePageContent 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Processor downloading Confluence pages.

## Tags

Preview, atlassian, confluence, content, fetch, page

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Body Format | Format in which body of the Confluence Page will be fetched |
| Confluence Client Service | Controller service for managing connections to Confluence |
| Confluence Page ID | Identifier of the Confluence Page |

## Relationships

| Name | Description |
| --- | --- |
| failure | Failed to fetch Confluence page |
| not found | Confluence page not found |
| removed | Confluence page was removed |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully fetched Confluence page |

## Writes attributes

| Name | Description |
| --- | --- |
| mime.type | text/html |
| confluence.page.version | Version of the Confluence page. |
| confluence.page.last.modification.date | Last modification date of the Confluence page. |
| confluence.page.change.type | Informs about status change for the searched page. |
--- title: GetConfluencePageIds 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getconfluencepageids.md section: Loading & Unloading Data ---

# GetConfluencePageIds 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Downloads changed Confluence pages since the last sync and emits each as a FlowFile with metadata.

## Tags

Preview, atlassian, changes, confluence, fetch, pages

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Confluence Client Service | Controller service for managing connections to Confluence |
| Page IDs | Comma separated list of page IDs to filter page by; only pages with these IDs are returned |
| Space IDs | Comma separated list of space IDs to filter pages by; only pages from these spaces are returned |
| Start Date | Start date from which the ingestion should happen (format: yyyy-MM-dd, inclusive) |
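The property formats above (comma-separated ID lists and an inclusive `yyyy-MM-dd` start date) can be checked before they are entered into the processor. The following Python sketch illustrates one way to do that. All function names are hypothetical, and the way the processor itself parses these values is not documented here, so treat this as an assumption about the intended semantics rather than a description of its implementation.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day (the yyyy-MM-dd layout).
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_start_date(value):
    """Parse a yyyy-MM-dd string into a date, rejecting other layouts."""
    if not _DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected yyyy-MM-dd, got {value!r}")
    # fromisoformat also rejects impossible dates such as 2025-02-30.
    return date.fromisoformat(value)


def parse_id_list(value):
    """Split a comma-separated ID list, ignoring blanks and surrounding spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


def is_on_or_after(modified, start):
    """Inclusive comparison: a page modified on the start date itself qualifies."""
    return modified >= start
```

For example, `parse_start_date("2025-01-05")` returns a `date` object, while `parse_start_date("2025-1-5")` raises `ValueError` because the month and day are not zero-padded.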
## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | Stores pagination state to maintain position between restarts. |

## Relationships

| Name | Description |
| --- | --- |
| failure | Failed to fetch changed Confluence pages |
| original | The input Flow File is routed to the original relationship. |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully fetched changed Confluence pages |

## Writes attributes

| Name | Description |
| --- | --- |
| confluence.page.id | Unique identifier of the Confluence page. |
| confluence.page.change.type | Informs about status change for the searched page. |
| confluence.page.url | Confluence page url. |
| confluence.page.title | Confluence page title. |
| confluence.page.last.modification.date | Last modification date of the Confluence page. |
| confluence.space.id | Unique identifier of the Confluence space. |
| confluence.continue.fetching | Indicates whether there are more pages to fetch (true/false). |
--- title: GetConfluencePagePermissions 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getconfluencepagepermissions.md section: Loading & Unloading Data ---

# GetConfluencePagePermissions 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Processor downloading Confluence page permissions.

## Tags

Preview, atlassian, confluence, page, permissions

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Confluence Client Service | Controller service for managing connections to Confluence |
| Confluence Page ID | Identifier of the Confluence Page |

## Relationships

| Name | Description |
| --- | --- |
| failure | Failed to fetch and parse Confluence page permissions. |
| page not found | Confluence page not found |
| restrictions changed | Confluence page restrictions changed since last fetch |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully fetched Confluence page permissions. |

## Writes attributes

| Name | Description |
| --- | --- |
| confluence.permissions.users | IDs of users with permissions to the Confluence page |
| confluence.permissions.emails | Emails of users with permissions to the Confluence page |
| confluence.permissions.groups | Groups with permissions to the Confluence page |
--- title: GetConfluenceSpaceIds 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getconfluencespaceids.md section: Loading & Unloading Data ---

# GetConfluenceSpaceIds 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Processor for retrieving Confluence space ids.

## Tags

atlassian, confluence, preview, spaces

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Confluence Client Service | Controller service for managing connections to Confluence |
| Space Keys | Comma-separated list of space keys to filter. If not specified, all spaces will be retrieved. |

## Relationships

| Name | Description |
| --- | --- |
| retry | Retryable failure occurred, e.g. rate limiting |
| success | Successfully fetched Confluence spaces |

## Writes attributes

| Name | Description |
| --- | --- |
| confluence.space.ids | List of identifiers of the Confluence spaces. |
--- title: GetConfluenceSpacePermissions 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getconfluencespacepermissions.md section: Loading & Unloading Data ---

# GetConfluenceSpacePermissions 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-atlassian-processors-nar

## Description

Processor downloading Confluence space permissions.

## Tags

Preview, atlassian, confluence, permissions, space

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Confluence Client Service | Controller service for managing connections to Confluence |
| Confluence Space ID | Identifier of the Confluence Space. |

## Relationships

| Name | Description |
| --- | --- |
| failure | Failed to fetch and parse Confluence space permissions. |
| retry | Retryable failure occurred, e.g. rate limiting |
| space not found | Confluence space not found |
| success | Successfully fetched Confluence space permissions. |

## Writes attributes

| Name | Description |
| --- | --- |
| confluence.permissions.users | IDs of users with permissions to the Confluence space |
| confluence.permissions.emails | Emails of users with permissions to the Confluence space |
| confluence.permissions.groups | Groups with permissions to the Confluence space |
--- title: GetDataShareCredentials 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getdatasharecredentials.md section: Loading & Unloading Data ---

# GetDataShareCredentials 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-salesforce-processors-nar

## Description

Describe the specified data share metadata in Salesforce Data Cloud.

## Tags

daas, data cloud, describe, object, preview, salesforce, sfdc

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Calculated Insights Objects | Comma separated list of Calculated Insight Object names to describe. |
| Connection Pooling Service | The Connection Pooling Service that is used to create the Snowflake volumes holding the credentials. |
| Data Lake Objects | Comma separated list of Data Lake Object names to describe. |
| Data Model Objects | Comma separated list of Data Model Object names to describe. |
| Data Share Name | The name of the Data Share to describe. |
| Salesforce Data Cloud Client | Salesforce Data Cloud Client to interact with the APIs |

## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | Provides information about the last time an external volume has been created/updated for credentials. |

## Relationships

| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed to this relationship if the data share credentials metadata could not be retrieved but the operation might be retried |
| failure | A FlowFile is routed to this relationship if the data share credentials cannot be retrieved or volumes cannot be created |
| success | FlowFile containing the data share metadata after successful creation of the volumes will be routed to this relationship |
## See also

- [com.snowflake.openflow.runtime.processors.salesforce.ListSFDCDataShares](/user-guide/data-integration/openflow/processors/listsfdcdatashares)

--- title: GetDataShareTables 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getdatasharetables.md section: Loading & Unloading Data ---

# GetDataShareTables 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-salesforce-processors-nar

## Description

Describe the specified data share metadata in Salesforce Data Cloud.

## Tags

daas, data cloud, describe, object, preview, salesforce, sfdc

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Calculated Insights Objects | Comma separated list of Calculated Insight Object names to describe. |
| Data Lake Objects | Comma separated list of Data Lake Object names to describe. |
| Data Model Objects | Comma separated list of Data Model Object names to describe. |
| Data Share Name | The name of the Data Share to describe. |
| Salesforce Data Cloud Client | Salesforce Data Cloud Client to interact with the APIs |

## Relationships

| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed to this relationship if the data share tables metadata could not be retrieved but the operation might be retried |
| failure | A FlowFile is routed to this relationship if the data share tables metadata could not be retrieved |
| success | FlowFile containing the data share tables metadata will be routed to this relationship |
## See also

- [com.snowflake.openflow.runtime.processors.salesforce.ListSFDCDataShares](/user-guide/data-integration/openflow/processors/listsfdcdatashares)

--- title: GetDBFSFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getdbfsfile.md section: Loading & Unloading Data ---

# GetDBFSFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-databricks-processors-nar

## Description

Read a DBFS file.

## Tags

databricks, dbfs, openflow

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| DBFS File Path | DBFS file path e.g. /directory/file.txt |
| Databricks Client | Databricks Client Service. |

## Relationships

| Name | Description |
| --- | --- |
| failure | Databricks failure relationship |
| success | Databricks success relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| error.code | The error code for the SQL statement if an error occurred. |
| error.message | The error message for the SQL statement if an error occurred. |
--- title: GetDynamoDB 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getdynamodb.md section: Loading & Unloading Data ---

# GetDynamoDB 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Retrieves a document from DynamoDB based on hash and range key. The key can be a string or a number. Every get request requires all the primary keys (hash only, or hash and range, depending on the table keys). A JSON document ('Map') attribute of the DynamoDB item is read into the content of the FlowFile.

## Tags

AWS, Amazon, DynamoDB, Fetch, Get

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider |
| Batch items for each request (between 1 and 50) | The items to be retrieved in one batch |
| Communications Timeout | |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Hash Key Name | The hash key name of the item |
| Hash Key Value | The hash key value of the item |
| Hash Key Value Type | The hash key value type of the item |
| Json Document attribute | The JSON document to be retrieved from the DynamoDB item ('s' type in the schema) |
| Range Key Name | The range key name of the item |
| Range Key Value | |
| Range Key Value Type | The range key value type of the item |
| Region | |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Table Name | The DynamoDB table name |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
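The batch property caps how many items go into one request (between 1 and 50). As a rough illustration of what that limit means for a larger set of keys, the Python sketch below splits a key list into request-sized groups. The `split_into_batches` helper and the sample keys are hypothetical, and the code does not call DynamoDB or reproduce the processor's internal logic.

```python
def split_into_batches(keys, batch_size):
    """Yield consecutive slices of keys, each holding at most batch_size items."""
    if not 1 <= batch_size <= 50:
        raise ValueError("batch_size must be between 1 and 50")
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


# Hypothetical hash key values, for illustration only.
hash_keys = [f"order-{n}" for n in range(120)]
batches = list(split_into_batches(hash_keys, 50))
batch_sizes = [len(batch) for batch in batches]
```

With 120 keys and a batch size of 50, the keys are grouped into three requests, the last of which is only partly filled.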
## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to failure relationship |
| not found | FlowFiles are routed to not found relationship if key not found in the table |
| success | FlowFiles are routed to success relationship |
| unprocessed | FlowFiles are routed to unprocessed relationship when DynamoDB is not able to process all the items in the request. Typical reasons are insufficient table throughput capacity and exceeding the maximum bytes per request. Unprocessed FlowFiles can be retried with a new request. |

## Writes attributes

| Name | Description |
| --- | --- |
| dynamodb.key.error.unprocessed | DynamoDB unprocessed keys |
| dynmodb.range.key.value.error | DynamoDB range key error |
| dynamodb.key.error.not.found | DynamoDB key not found |
| dynamodb.error.exception.message | DynamoDB exception message |
| dynamodb.error.code | DynamoDB error code |
| dynamodb.error.message | DynamoDB error message |
| dynamodb.error.service | DynamoDB error service |
| dynamodb.error.retryable | DynamoDB error is retryable |
| dynamodb.error.request.id | DynamoDB error request id |
| dynamodb.error.status.code | DynamoDB status code |
## See also

- [org.apache.nifi.processors.aws.dynamodb.DeleteDynamoDB](/user-guide/data-integration/openflow/processors/deletedynamodb)
- [org.apache.nifi.processors.aws.dynamodb.PutDynamoDB](/user-guide/data-integration/openflow/processors/putdynamodb)
- [org.apache.nifi.processors.aws.dynamodb.PutDynamoDBRecord](/user-guide/data-integration/openflow/processors/putdynamodbrecord)

--- title: GetElasticsearch 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getelasticsearch.md section: Loading & Unloading Data ---

# GetElasticsearch 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-elasticsearch-restapi-nar

## Description

Elasticsearch get processor that uses the official Elastic REST client libraries to fetch a single document from Elasticsearch by _id. Note that the full body of the document will be read into memory before being written to a FlowFile for transfer.

## Tags

elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, index, json, put, record

## Input Requirement

ALLOWED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Attribute Name | The name of the FlowFile attribute to use for the retrieved document output. |
| Client Service | An Elasticsearch client service to use for running queries. |
| Destination | Indicates whether the retrieved document is written to the FlowFile content or a FlowFile attribute. |
| Document Id | The _id of the document to retrieve. |
| Index | The name of the index to use. |
| Type | The type of this document (used by Elasticsearch for indexing and searching). |

## Relationships

| Name | Description |
| --- | --- |
| document | Fetched documents are routed to this relationship. |
| failure | All flowfiles that fail for reasons unrelated to server availability go to this relationship. |
| not_found | A FlowFile is routed to this relationship if the specified document does not exist in the Elasticsearch cluster. |
| retry | All flowfiles that fail due to server/cluster availability go to this relationship. |

## Writes attributes

| Name | Description |
| --- | --- |
| filename | The filename attribute is set to the document identifier |
| elasticsearch.index | The Elasticsearch index containing the document |
| elasticsearch.type | The Elasticsearch document type |
| elasticsearch.get.error | The error message provided by Elasticsearch if there is an error fetching the document. |
## See also

- [org.apache.nifi.processors.elasticsearch.JsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/jsonqueryelasticsearch)

--- title: GetFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getfile.md section: Loading & Unloading Data ---

# GetFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Creates FlowFiles from files in a directory. NiFi will ignore files it doesn't have at least read permissions for.

## Tags

files, filesystem, get, ingest, ingress, input, local, source

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Batch Size | The maximum number of files to pull in each invocation of the processor |
| File Filter | Only files whose names match the given regular expression will be picked up |
| Ignore Hidden Files | Indicates whether or not hidden files should be ignored |
| Input Directory | The input directory from which to pull files |
| Keep Source File | If true, the file is not deleted after it has been copied to the Content Repository; this causes the file to be picked up continually and is useful for testing purposes. If the source file is not kept, NiFi needs write permissions on the directory it is pulling from; otherwise it will ignore the file. |
| Maximum File Age | The maximum age that a file can be in order to be pulled; any file older than this amount of time (according to last modification date) will be ignored |
| Maximum File Size | The maximum size that a file can be in order to be pulled |
| Minimum File Age | The minimum age that a file must be in order to be pulled; any file younger than this amount of time (according to last modification date) will be ignored |
| Minimum File Size | The minimum size that a file must be in order to be pulled |
| Path Filter | When Recurse Subdirectories is true, then only subdirectories whose path matches the given regular expression will be scanned |
| Polling Interval | Indicates how long to wait before performing a directory listing |
| Recurse Subdirectories | Indicates whether or not to pull files from subdirectories |
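The filter, age, and size properties combine into a single yes/no decision for each candidate file. The Python sketch below shows how such a decision could be expressed. The `FileFilterSettings` class, its field names, and its default values are hypothetical, and the choice to require the regular expression to match the whole file name is an assumption rather than documented behavior.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileFilterSettings:
    """Hypothetical stand-in for the File Filter, age, and size properties."""
    file_filter: str = r".*"               # assumed to match the whole file name
    min_age_s: float = 0.0                 # Minimum File Age, in seconds
    max_age_s: Optional[float] = None      # Maximum File Age; None means no limit
    min_size: int = 0                      # Minimum File Size, in bytes
    max_size: Optional[int] = None         # Maximum File Size; None means no limit


def is_eligible(name, size, age_s, settings):
    """Return True when a file passes the name, age, and size checks."""
    if re.fullmatch(settings.file_filter, name) is None:
        return False
    if age_s < settings.min_age_s:
        return False  # younger than the minimum age
    if settings.max_age_s is not None and age_s > settings.max_age_s:
        return False  # older than the maximum age
    if size < settings.min_size:
        return False
    if settings.max_size is not None and size > settings.max_size:
        return False
    return True


# Example: only CSV files that are at least 10 seconds old and at most 1 MB.
csv_only = FileFilterSettings(file_filter=r".*\.csv", min_age_s=10, max_size=1_000_000)
```

A minimum age is a common way to avoid picking up files that another process is still writing, since a file modified a moment ago fails the check and is reconsidered on a later listing.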
## Restrictions

| Required Permission | Explanation |
| --- | --- |
| read filesystem | Provides operator the ability to read from any file that NiFi has access to. |
| write filesystem | Provides operator the ability to delete any file that NiFi has access to. |

## Relationships

| Name | Description |
| --- | --- |
| success | All files are routed to success |

## Writes attributes

| Name | Description |
| --- | --- |
| filename | The filename is set to the name of the file on disk |
| path | The path is set to the relative path of the file's directory on disk. For example, if the `Input Directory` property is set to /tmp, files picked up from /tmp will have the path attribute set to ./. If the `Recurse Subdirectories` property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to abc/1/2/3 |
| file.creationTime | The date and time that the file was created. May not work on all file systems |
| file.lastModifiedTime | The date and time that the file was last modified. May not work on all file systems |
| file.lastAccessTime | The date and time that the file was last accessed. May not work on all file systems |
| file.owner | The owner of the file. May not work on all file systems |
| file.group | The group owner of the file. May not work on all file systems |
| file.permissions | The read/write/execute permissions of the file. May not work on all file systems |
| absolute.path | The full/absolute path from where a file was picked up. The current 'path' attribute is still populated, but may be a relative path |
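The two `path` examples above (`./` for files directly in the input directory and `abc/1/2/3` for a nested directory) follow from taking the file's directory relative to the input directory. The Python sketch below reproduces only those two documented examples. The `path_attribute` function is hypothetical and is not the processor's own code.

```python
from pathlib import PurePosixPath


def path_attribute(input_directory, file_directory):
    """Return the directory of a file relative to the input directory.

    Mirrors the documented examples: "./" for the input directory itself,
    otherwise the relative subdirectory path.
    """
    base = PurePosixPath(input_directory)
    directory = PurePosixPath(file_directory)
    # relative_to raises ValueError if the directory is outside the base.
    relative = directory.relative_to(base)
    return "./" if relative == PurePosixPath(".") else str(relative)


top_level = path_attribute("/tmp", "/tmp")
nested = path_attribute("/tmp", "/tmp/abc/1/2/3")
```

Here `top_level` is `"./"` and `nested` is `"abc/1/2/3"`, matching the values described for the `path` attribute.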
## See also

- [org.apache.nifi.processors.standard.FetchFile](/user-guide/data-integration/openflow/processors/fetchfile)
- [org.apache.nifi.processors.standard.PutFile](/user-guide/data-integration/openflow/processors/putfile)

--- title: GetFileResource 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getfileresource.md section: Loading & Unloading Data ---

# GetFileResource 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

This processor creates FlowFiles with the content of the configured File Resource. GetFileResource is useful for load testing, configuration, and simulation.

## Tags

file, generate, load, test

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| File Resource | Location of the File Resource (Local File or URL). This file will be used as content of the generated FlowFiles. |
| MIME Type | Specifies the value to set for the [mime.type] attribute. |

## Restrictions

| Required Permission | Explanation |
| --- | --- |
| read filesystem | Provides operator the ability to read from any file that NiFi has access to. |
| reference remote resources | File Resource can reference resources over HTTP/HTTPS |

## Relationships

| Name | Description |
| --- | --- |
| success | |

## Writes attributes

| Name | Description |
| --- | --- |
| mime.type | Sets the MIME type of the output if the 'MIME Type' property is set |
| Dynamic property key | Value for the corresponding dynamic property, if any is set |
--- title: GetFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getftp.md section: Loading & Unloading Data ---

# GetFTP 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Fetches files from an FTP Server and creates FlowFiles from them

## Tags

FTP, fetch, files, get, ingest, input, remote, retrieve, source

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Connection Mode | The FTP Connection Mode |
| Connection Timeout | Amount of time to wait before timing out while creating a connection |
| Data Timeout | When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems |
| Delete Original | Determines whether or not the file is deleted from the remote system after it has been successfully transferred |
| File Filter Regex | Provides a Java Regular Expression for filtering Filenames; if a filter is supplied, only files whose names match that Regular Expression will be fetched |
| Follow Symbolic Links | If true, will pull even symbolic files and also nested symbolic subdirectories; otherwise, will not read symbolic files and will not traverse symbolic link subdirectories |
| Hostname | The fully qualified hostname or IP address of the remote system |
| Ignore Dotted Files | If true, files whose names begin with a dot (".") will be ignored |
| Internal Buffer Size | Set the internal buffer size for buffered data streams |
| Max Selects | The maximum number of files to pull in a single connection |
| Password | Password for the user account |
| Path Filter Regex | When Search Recursively is true, then only subdirectories whose path matches the given Regular Expression will be scanned |
| Polling Interval | Determines how long to wait between fetching the listing for new files |
| Port | The port that the remote system is listening on for file transfers |
| Remote Path | The path on the remote system from which to pull or push files |
| Remote Poll Batch Size | Specifies how many file paths to find in a given directory on the remote system when doing a file listing. In general this value should not need to be modified, but it can be critical when polling a remote system with a very large number of files. Setting it too high can result in very poor performance, and setting it too low can make the flow slower than normal. |
| Search Recursively | If true, will pull files from arbitrarily nested subdirectories; otherwise, will not traverse subdirectories |
| Transfer Mode | The FTP Transfer Mode |
| Use Natural Ordering | If true, will pull files in the order in which they are naturally listed; otherwise, the order in which the files will be pulled is not defined |
| Username | Username |
| ftp-use-utf8 | Tells the client to use UTF-8 encoding when processing files and filenames. If set to true, the server must also support UTF-8 encoding. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
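
The interaction between File Filter Regex and Ignore Dotted Files can be pictured with a small sketch. This is illustrative Python, not the processor's code: the function name `should_fetch` is hypothetical, the processor itself uses Java regular expressions, and requiring the whole filename to match is an assumption.

```python
import re


def should_fetch(filename, file_filter_regex=None, ignore_dotted_files=True):
    """Return True if a remote filename passes both checks (hypothetical helper)."""
    # Ignore Dotted Files: skip names that begin with a dot.
    if ignore_dotted_files and filename.startswith("."):
        return False
    # File Filter Regex: when supplied, only matching names are fetched.
    # Assumption: the whole name must match the expression.
    if file_filter_regex is not None:
        return re.fullmatch(file_filter_regex, filename) is not None
    return True


names = [".hidden.csv", "report.csv", "notes.txt"]
selected = [n for n in names if should_fetch(n, r".*\.csv")]
print(selected)  # ['report.csv']
```
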
## Relationships

| Name | Description |
| --- | --- |
| success | All FlowFiles that are received are routed to success |

## Writes attributes

| Name | Description |
| --- | --- |
| filename | The filename is set to the name of the file on the remote server |
| path | The path is set to the path of the file's directory on the remote server. For example, if the <Remote Path> property is set to /tmp, files picked up from /tmp will have the path attribute set to /tmp. If the <Search Recursively> property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to /tmp/abc/1/2/3 |
| file.lastModifiedTime | The date and time that the source file was last modified |
| file.lastAccessTime | The date and time that the file was last accessed. May not work on all file systems |
| file.owner | The numeric owner id of the source file |
| file.group | The numeric group id of the source file |
| file.permissions | The read/write/execute permissions of the source file |
| absolute.path | The full/absolute path from where a file was picked up. The current 'path' attribute is still populated, but may be a relative path |
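
As a rough illustration of the `path` attribute described above, the directory part of a remote file's location can be derived as follows. This is a sketch only: the helper name `path_attribute` and the sample filenames are hypothetical, and the processor may report a relative path in some configurations.

```python
import posixpath


def path_attribute(remote_file):
    """Directory of a remote file, mirroring the examples in the table above."""
    return posixpath.dirname(remote_file)


print(path_attribute("/tmp/data.csv"))            # /tmp
print(path_attribute("/tmp/abc/1/2/3/data.csv"))  # /tmp/abc/1/2/3
```
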
## See also

- [org.apache.nifi.processors.standard.PutFTP](/user-guide/data-integration/openflow/processors/putftp)

---
title: GetGcpVisionAnnotateFilesOperationStatus 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getgcpvisionannotatefilesoperationstatus.md
section: Loading & Unloading Data
---

# GetGcpVisionAnnotateFilesOperationStatus 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-gcp-nar

## Description

Retrieves the current status of a Google Vision operation.

## Tags

Cloud, Google, Machine Learning, Vision

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| gcp-credentials-provider-service | The Controller Service used to obtain Google Cloud Platform credentials. |
| operationKey | The unique identifier of the Vision operation. |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to failure relationship |
| original | Upon successful completion, the original FlowFile will be routed to this relationship. |
| running | The job is currently still being processed |
| success | FlowFiles are routed to success relationship |
## See also

- [org.apache.nifi.processors.gcp.vision.StartGcpVisionAnnotateFilesOperation](/user-guide/data-integration/openflow/processors/startgcpvisionannotatefilesoperation)

---
title: GetGcpVisionAnnotateImagesOperationStatus 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getgcpvisionannotateimagesoperationstatus.md
section: Loading & Unloading Data
---

# GetGcpVisionAnnotateImagesOperationStatus 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-gcp-nar

## Description

Retrieves the current status of a Google Vision operation.

## Tags

Cloud, Google, Machine Learning, Vision

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| gcp-credentials-provider-service | The Controller Service used to obtain Google Cloud Platform credentials. |
| operationKey | The unique identifier of the Vision operation. |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFiles are routed to failure relationship |
| original | Upon successful completion, the original FlowFile will be routed to this relationship. |
| running | The job is currently still being processed |
| success | FlowFiles are routed to success relationship |
## See also

- [org.apache.nifi.processors.gcp.vision.StartGcpVisionAnnotateImagesOperation](/user-guide/data-integration/openflow/processors/startgcpvisionannotateimagesoperation)

---
title: GetGoogleAdsReport 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getgoogleadsreport.md
section: Loading & Unloading Data
---

# GetGoogleAdsReport 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-google-ads-nar

## Description

A processor that interacts with the Google Ads Reporting API. By default, it fetches data once a day.

## Tags

Google, Google Ads, report

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Client Account ID | ID of the Google Ads account for which the report should be fetched |
| GCP Credentials Service | Controller Service used to obtain Google Cloud Platform credentials. |
| Google Ads Resource Name | Name of the resource that should be used in the 'FROM' clause of the query |
| Google Developer Token | Developer token required to access Google APIs |
| Report Attributes | List of comma-separated report attributes |
| Report Metrics | List of comma-separated report metrics |
| Report Segments | List of comma-separated report segments |
| Report Start Date | Start date from which the ingestion should happen. |

## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | Stores a hash of the last report definition to detect schema changes. In incremental ingestion (when the 'segments.date' segment is selected), it keeps track of the latest ingested date so that only new data chunks are downloaded. Additionally, the start date is saved. |
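
The state description above suggests a hash comparison between runs. The following is a minimal sketch of that idea under stated assumptions: the function names, the choice of SHA-256, the JSON serialization, and the `state` dictionary are all hypothetical and are not taken from the processor's implementation.

```python
import hashlib
import json


def definition_hash(resource, attributes, metrics, segments):
    """Hash a report definition (assumption: SHA-256 over a JSON serialization)."""
    payload = json.dumps(
        {"resource": resource, "attributes": attributes,
         "metrics": metrics, "segments": segments},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def schema_changed(state, current_hash):
    """Compare with the stored hash, then store the new one (hypothetical state map)."""
    previous = state.get("definition_hash")
    state["definition_hash"] = current_hash
    return previous is not None and previous != current_hash


def ingestion_strategy(segments):
    """Incremental ingestion applies when the 'segments.date' segment is selected."""
    return "INCREMENTAL" if "segments.date" in segments else "SNAPSHOT"


state = {}
first = definition_hash("campaign", ["campaign.id"], ["metrics.clicks"], ["segments.date"])
second = definition_hash("campaign", ["campaign.id"], ["metrics.impressions"], ["segments.date"])
print(schema_changed(state, first))   # False: nothing stored yet
print(schema_changed(state, second))  # True: the definition differs
print(ingestion_strategy(["segments.date"]))  # INCREMENTAL
```
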
## Relationships

| Name | Description |
| --- | --- |
| failure | Error FlowFiles transferred when receiving an error response from the Google Ads Reporting API or when an error occurred during response processing. |
| success | Response FlowFiles transferred when receiving a success response from the Google Ads Reporting API. |

## Writes attributes

| Name | Description |
| --- | --- |
| google.ads.client.account.id | ID of the account in Google Ads for which the given report should be ingested |
| google.ads.resource.name | Name of the resource in Google Ads that is a source for the report |
| google.ads.query | Query used to fetch data from the Google Ads StreamSearch API |
| google.ads.attributes | Attributes of the selected resource |
| google.ads.metrics | Metrics collected in the context of a given resource |
| google.ads.segments | Buckets in which metrics should be grouped |
| google.ads.ingestion.strategy | The strategy used for ingestion. Can be 'SNAPSHOT' or 'INCREMENTAL' |
| google.ads.start.date | Date from which data is downloaded from Google Ads (including the given date) |
| google.ads.end.date | Date to which data is downloaded from Google Ads (including the given date) |
| google.ads.report.schema.changed | Flag indicating whether the report schema has changed between processor executions |
| google.ads.report.conversion.window | Number of days that are fetched from Google Ads during incremental load. Based on Conversion Window values |
| fragment.identifier | A unique ID of each ingestion run. Identifies all FlowFiles generated during a single run. |
| fragment.index | Number representing the unique identifier of a FlowFile within the batch generated during one ingestion run |
| fragment.count | Number of FlowFiles generated during processor execution |
| avro.schema | Avro schema representing fetched data |
| mime.type | MIME type of the returned report. |
---
title: GetGoogleGroupMembers 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getgooglegroupmembers.md
section: Loading & Unloading Data
---

# GetGoogleGroupMembers 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-google-drive-nar

## Description

Retrieves the members of one or more Google Groups, specified as a comma-separated list of group IDs that is given as a FlowFile attribute. Supports both immediate (top-level) and nested group member retrieval. Outputs four FlowFile attributes: 'google.group.member.user.ids', 'google.group.member.user.emails', 'google.group.member.group.ids', and 'google.group.member.group.emails'. When nested fetching is enabled, it recursively expands sub-groups up to the specified depth. If an attribute already exists on the FlowFile, the new values are concatenated to the existing value (separated by a comma).

## Tags

cloud, directory, gcp, google, groups, membership

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Fetch Nested Groups | When enabled, recursively fetches members from nested groups within the specified groups. When disabled, only top-level members are retrieved. |
| GCP Credentials Service | Specifies the Controller Service used to obtain Google Cloud Platform credentials. |
| Google Group IDs | Specifies the comma-separated list of Google Group IDs (email addresses for the groups). Supports Expression Language. |
| Nested Depth Limit | Maximum depth to traverse when fetching nested group members. |

## Relationships

| Name | Description |
| --- | --- |
| failure | A FlowFile is routed here if the processor fails to retrieve Google group members. |
| not.found | A FlowFile is routed here for each Google group that was not found. |
| retry | A FlowFile is routed here if the processor should retry the request (e.g., after rate limiting). |
| success | A FlowFile is routed here after successfully retrieving Google group members. |

## Writes attributes

| Name | Description |
| --- | --- |
| google.group.ids | A comma-separated list of Google Group IDs that were found. |
| google.group.member.user.ids | A comma-separated list of user IDs found in the specified groups. When nested fetching is enabled, includes users from nested groups up to the specified depth. |
| google.group.member.user.emails | A comma-separated list of user email addresses found in the specified groups. When nested fetching is enabled, includes users from nested groups up to the specified depth. |
| google.group.member.group.ids | A comma-separated list of nested group IDs found in the specified groups. When nested fetching is enabled, includes all groups discovered during recursive traversal. |
| google.group.member.group.emails | A comma-separated list of nested group email addresses found in the specified groups. When nested fetching is enabled, includes all groups discovered during recursive traversal. |
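
The nested retrieval and attribute concatenation described above can be sketched against an in-memory directory. Everything here is hypothetical: the `directory` mapping stands in for the Google API, the function names are invented for illustration, and the interpretation of the depth limit (one level of sub-groups per unit of depth) is an assumption.

```python
def expand_members(group_ids, directory, fetch_nested=False, depth_limit=1):
    """Collect user and group members, optionally expanding nested groups."""
    users, groups = [], []
    visited = set(group_ids)

    def visit(group_id, depth):
        for member in directory.get(group_id, []):
            if member["type"] == "user":
                if member["id"] not in users:
                    users.append(member["id"])
                continue
            if member["id"] not in groups:
                groups.append(member["id"])
            # Assumption: depth 0 is the requested group itself.
            if fetch_nested and depth < depth_limit and member["id"] not in visited:
                visited.add(member["id"])
                visit(member["id"], depth + 1)

    for group_id in group_ids:
        visit(group_id, 0)
    return users, groups


def append_attribute(attributes, key, values):
    """Concatenate new values onto an existing attribute, separated by a comma."""
    joined = ",".join(values)
    existing = attributes.get(key)
    attributes[key] = f"{existing},{joined}" if existing else joined


directory = {
    "eng@example.com": [
        {"type": "user", "id": "u1"},
        {"type": "group", "id": "backend@example.com"},
    ],
    "backend@example.com": [{"type": "user", "id": "u2"}],
}

users, groups = expand_members(["eng@example.com"], directory, fetch_nested=True)
attributes = {"google.group.member.user.ids": "u0"}
append_attribute(attributes, "google.group.member.user.ids", users)
print(attributes["google.group.member.user.ids"])  # u0,u1,u2
```
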
## See also

- [com.snowflake.openflow.runtime.processors.google.CaptureGoogleDriveChanges](/user-guide/data-integration/openflow/processors/capturegoogledrivechanges)

---
title: GetGoogleSheets 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getgooglesheets.md
section: Loading & Unloading Data
---

# GetGoogleSheets 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-google-sheets-processors-nar

## Description

Processor responsible for fetching data from Google Sheets. By default it fetches data once a day.

## Tags

Google, Google Sheets, spreadsheet

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Date Time Render Option | Determines how dates should be rendered in the output. |
| GCP Credentials Service | Controller Service used to obtain Google Cloud Platform credentials. |
| Ranges | The A1 notation or R1C1 notation of the comma-separated ranges to retrieve values from. For example: Sheet1!A1:B2,Sheet2!D4:E5,Sheet3. The first row in a sheet must represent column names. If not specified, all sheets will be downloaded. |
| Spreadsheet ID | ID of the Google Sheets Spreadsheet. Can be found in the URL of the spreadsheet. |
| Value Render Option | Determines how values should be rendered in the output. |

## Relationships

| Name | Description |
| --- | --- |
| failure | FlowFile produced when errors occurred while fetching from Google Sheets. |
| success | FlowFile containing a JSON array where each object represents a row from the source sheet. Keys correspond to column headers from the first row, and values to the respective row entries. |
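
To make the shape of the success output concrete, here is a small sketch that turns a range of cell values into a JSON array keyed by the first row. The helper `rows_to_json` is hypothetical and the handling of short rows (missing trailing cells are simply omitted) is an assumption, not documented behavior.

```python
import json


def rows_to_json(values):
    """Build a JSON array of row objects; the first row supplies the keys."""
    if not values:
        return "[]"
    header, *rows = values
    # zip() stops at the shorter sequence, so missing trailing cells are omitted.
    records = [dict(zip(header, row)) for row in rows]
    return json.dumps(records)


sheet_values = [["id", "name"], ["1", "Ada"], ["2", "Lin"]]
print(rows_to_json(sheet_values))
```
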
## Writes attributes

| Name | Description |
| --- | --- |
| google.sheets.spreadsheet.id | ID of the Google Sheets Spreadsheet. |
| google.sheets.range | Range in Google Sheets Spreadsheet that was fetched. |
| run.id | A unique ID of each ingestion run. Identifies all FlowFiles generated during a single run. |
| destination.table.schema | A Snowflake schema of the destination table in the following format: `{ "columns": [ { "name": "<column name>", "type": "<column type>", "nullable": <true/false>, "precision": <precision, only for numeric type>, "scale": <scale, only for numeric type> }, ... ], "primaryKeys": ["<name of first primary key column>", "<name of second primary key column>", ...] }` |
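
A concrete instance of the `destination.table.schema` format may help when consuming the attribute downstream. The column names and type names below are made up for illustration, and the check `primary_keys_are_columns` is a hypothetical helper rather than part of the processor.

```python
import json

# Hypothetical example value; the type names are assumptions.
schema_attribute = json.dumps({
    "columns": [
        {"name": "ID", "type": "NUMBER", "nullable": False, "precision": 38, "scale": 0},
        {"name": "NAME", "type": "VARCHAR", "nullable": True},
    ],
    "primaryKeys": ["ID"],
})


def primary_keys_are_columns(schema_json):
    """Check that every primary key names a declared column."""
    schema = json.loads(schema_json)
    column_names = {column["name"] for column in schema["columns"]}
    return all(key in column_names for key in schema["primaryKeys"])


print(primary_keys_are_columns(schema_attribute))  # True
```
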
---
title: GetHubSpot 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/gethubspot.md
section: Loading & Unloading Data
---

# GetHubSpot 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-hubspot-nar

## Description

Retrieves JSON data from a private HubSpot application. This processor is intended to be run on the Primary Node only.

## Tags

hubspot

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| access-token | Access Token to authenticate requests |
| incremental-delay | The ending timestamp of the time window will be adjusted earlier by the amount configured in this property. For example, with a property value of 10 seconds, an ending timestamp of 12:30:45 would be changed to 12:30:35. Set this property to avoid missing objects when the clocks of your local machines and the HubSpot servers are not in sync, and to protect against HubSpot's mechanism that changes last updated timestamps after object creation. |
| incremental-initial-start-time | This property specifies the start time that the processor applies when running the first request. The expected format is a UTC date-time such as '2011-12-03T10:15:30Z' |
| is-incremental | The processor can incrementally load the queried objects so that each object is queried exactly once. For each query, the processor queries objects within a time window where the objects were modified between the previous run time and the current time (optionally adjusted by the Incremental Delay property). |
| object-type | The HubSpot Object Type requested |
| result-limit | The maximum number of results to request for each invocation of the Processor |
| web-client-service-provider | Controller service for HTTP client operations |
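
The incremental time window can be illustrated with a short calculation. This is a sketch of the arithmetic described above, not the processor's code; the function names are hypothetical and the sample "current time" is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def parse_initial_start(value):
    """Parse a UTC date-time such as '2011-12-03T10:15:30Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def query_window(previous_end, now, incremental_delay=timedelta(0)):
    """Window from the previous run time to the current time minus the delay."""
    return previous_end, now - incremental_delay


start = parse_initial_start("2011-12-03T10:15:30Z")
now = datetime(2024, 1, 1, 12, 30, 45, tzinfo=timezone.utc)
window_start, window_end = query_window(start, now, timedelta(seconds=10))
print(window_end.strftime("%H:%M:%S"))  # 12:30:35
```

On the next run, the previous window's end would serve as the new start, so each object falls into exactly one window.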
## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | In case of incremental loading, the start and end timestamps of the last query time window are stored in the state. When the 'Result Limit' property is set, the paging cursor is saved after executing a request. Only the objects after the paging cursor will be retrieved. The maximum number of retrieved objects can be set in the 'Result Limit' property. |

## Relationships

| Name | Description |
| --- | --- |
| success | For FlowFiles created as a result of a successful HTTP request. |

## Writes attributes

| Name | Description |
| --- | --- |
| mime.type | Sets the MIME type to application/json |
---
title: GetHubSpotObject 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/gethubspotobject.md
section: Loading & Unloading Data
---

# GetHubSpotObject 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-hubspot-processors-nar

## Description

Get a HubSpot object and its associations by ID or unique value.

## Tags

Preview, hubspot

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| HubSpot Service | HubSpot Client Service. |
| Object ID Property | HubSpot property used to uniquely identify the object. |
| Object ID Value | Matching HubSpot property value to search for. |
| Object Type | HubSpot object type |

## Relationships

| Name | Description |
| --- | --- |
| failure | HubSpot fail relationship |
| missing | HubSpot object does not exist. |
| retry | HubSpot retry relationship. FlowFiles that failed to process due to a server timeout or rate limit related error. FlowFiles routed here should be routed back into the processor. |
| success | HubSpot success relationship |
## See also

- [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotSchema](/user-guide/data-integration/openflow/processors/gethubspotschema)
- [com.snowflake.openflow.runtime.processors.hubspot.ListArchivedHubSpotData](/user-guide/data-integration/openflow/processors/listarchivedhubspotdata)
- [com.snowflake.openflow.runtime.processors.hubspot.ListHubSpotObjects](/user-guide/data-integration/openflow/processors/listhubspotobjects)
- [com.snowflake.openflow.runtime.processors.hubspot.PutHubSpot](/user-guide/data-integration/openflow/processors/puthubspot)

---
title: GetHubSpotSchema 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/gethubspotschema.md
section: Loading & Unloading Data
---

# GetHubSpotSchema 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-hubspot-processors-nar

## Description

Retrieves schema information for HubSpot object types including field names, types, and labels. Outputs detailed field metadata as JSON for schema discovery and mapping purposes.

## Tags

Preview, crm, hubspot, metadata, schema

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| HubSpot Service | HubSpot Client Service. |
| Object Type | HubSpot object type |

## Relationships

| Name | Description |
| --- | --- |
| failure | HubSpot fail relationship |
| retry | HubSpot retry relationship. FlowFiles that failed to process due to a server timeout or rate limit related error. FlowFiles routed here should be routed back into the processor. |
| success | HubSpot success relationship |

## Writes attributes

| Name | Description |
| --- | --- |
| hubspot.object.type | The HubSpot object type |
| hubspot.field.count | Number of fields retrieved |
| mime.type | MIME type of the output (application/json) |
## See also

- [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotObject](/user-guide/data-integration/openflow/processors/gethubspotobject)
- [com.snowflake.openflow.runtime.processors.hubspot.ListArchivedHubSpotData](/user-guide/data-integration/openflow/processors/listarchivedhubspotdata)
- [com.snowflake.openflow.runtime.processors.hubspot.ListHubSpotObjects](/user-guide/data-integration/openflow/processors/listhubspotobjects)
- [com.snowflake.openflow.runtime.processors.hubspot.PutHubSpot](/user-guide/data-integration/openflow/processors/puthubspot)

---
title: GetLinkedInAdsReport 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getlinkedinadsreport.md
section: Loading & Unloading Data
---

# GetLinkedInAdsReport 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-linkedin-ads-processors-nar

## Description

Processor downloading metrics from the LinkedIn Reporting APIs.

## Tags

LinkedIn, LinkedIn Ads, ads, report

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Accounts | List of comma-separated accounts. |
| Campaign Groups | List of comma-separated campaign groups. |
| Campaigns | List of comma-separated campaigns. |
| Companies | List of comma-separated companies. |
| Conversion Window | Timeframe for which data is refreshed during incremental load. |
| Metrics | List of comma-separated metrics. |
| OAuth Token Provider | Service providing OAuth access token. |
| Pivots | List of comma-separated pivots. |
| Report Name | Unique name of the report. |
| Shares | List of comma-separated shares. |
| Start Date | Start date from which ingestion should begin. It must be in the yyyy-MM-dd format. |
| Time Granularity | Time granularity of results. |
| Web Client Service Provider | Service providing client for REST request execution. |

## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | Stores a hash of the last report definition to detect schema changes. Incrementally loaded reports persist the last ingestion date to define ingestion date ranges after the initial load. Additionally, the start date is saved. |

## Relationships

| Name | Description |
| --- | --- |
| success | Response FlowFiles transferred when a response from the LinkedIn Ads Reporting API is successfully processed. |

## Writes attributes

| Name | Description |
| --- | --- |
| linkedin.ads.report.name | Unique name of the report. |
| linkedin.ads.run.id | Unique identifier of the run. |
| avro.schema | Avro schema that contains a set of all configured metrics and pivots. |
| linkedin.ads.ingestion.strategy | Strategy that defines whether the report will be downloaded as SNAPSHOT or INCREMENTAL. |
| linkedin.ads.report.schema.changed | Flag that indicates whether the report schema has changed between processor executions. |
| linkedin.ads.ingestion.start.date | Date from which data is downloaded from LinkedIn Ads (including a given date). |
| linkedin.ads.ingestion.end.date | Date to which data is downloaded from LinkedIn Ads (including a given date). |
---
title: GetMicrosoft365GroupMembers 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getmicrosoft365groupmembers.md
section: Loading & Unloading Data
---

# GetMicrosoft365GroupMembers 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-msgraph-nar

## Description

Retrieves Microsoft365 group members and emits a FlowFile for each change that occurs. This includes membership changes.

## Tags

cdc, document, graph, library, microsoft, sharepoint, unstructured

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Authentication Service | The service that provides authentication for the SharePoint API |
| Fallback Retry Duration | The time to wait before retrying the operation after a communication failure. This value is used when the response doesn't contain a Retry-After header. |
| Microsoft365 Group id | Specifies a Microsoft365 group id to retrieve the members for. Supports Expression Language. |
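
The role of Fallback Retry Duration can be sketched as a simple choice between the server's hint and the configured value. The helper `retry_wait_seconds` is hypothetical, and this sketch only understands the delta-seconds form of `Retry-After`; falling back for any other form is an assumption.

```python
def retry_wait_seconds(headers, fallback_seconds):
    """Prefer the Retry-After header; otherwise use the configured fallback."""
    value = headers.get("Retry-After")
    # Only the delta-seconds form is handled here; an HTTP-date falls back.
    if value is not None and value.strip().isdigit():
        return int(value.strip())
    return fallback_seconds


print(retry_wait_seconds({"Retry-After": "30"}, 60))  # 30
print(retry_wait_seconds({}, 60))                     # 60
```
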
## Relationships

| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed here if the processor failed to communicate with the Graph API. Can be retried |
| failure | An incoming FlowFile is routed to this relationship if the group members could not be fetched |
| not.found | A FlowFile is routed here if the group was not found |
| success | A FlowFile is routed here if the group members were successfully retrieved |

## Writes attributes

| Name | Description |
| --- | --- |
| microsoft365.group.user.ids | A comma-separated list of Microsoft365 user ids that are members of the Microsoft365 group. |
| microsoft365.group.user.emails | A comma-separated list of user emails that are members of the Microsoft365 group. |
---
title: GetMongo 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getmongo.md
section: Loading & Unloading Data
---

# GetMongo 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-mongodb-nar

## Description

Creates FlowFiles from documents in MongoDB loaded by a user-specified query.

## Tags

get, mongodb, read

## Input Requirement

ALLOWED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Batch Size | The number of elements to be returned from the server in one batch |
| Limit | The maximum number of elements to return |
| Mongo Collection Name | The name of the collection to use |
| Mongo Database Name | The name of the database to use |
| Projection | The fields to be returned from the documents in the result set; must be a valid BSON document |
| Query | The selection criteria to do the lookup. If the field is left blank, it will look for input from an incoming connection from another processor to provide the query as a valid JSON document inside of the FlowFile's body. If this field is left blank and a timer is enabled instead of an incoming connection, that will result in a full collection fetch using a `{}` query. |
| Sort | The fields by which to sort; must be a valid BSON document |
| get-mongo-send-empty | If a query executes successfully, but returns no results, send an empty JSON document signifying no result. |
| json-type | By default, MongoDB's Java driver returns "extended JSON". Some of the features of this variant of JSON may cause problems for other JSON parsers that expect only standard JSON types and conventions. This configuration setting controls whether to use extended JSON or provide a clean view that conforms to standard JSON. |
| mongo-charset | Specifies the character set of the document data. |
| mongo-client-service | If configured, this property will use the assigned client service for connection pooling. |
| mongo-date-format | The date format string to use for formatting Date fields that are returned from Mongo. It is only applied when the JSON output format is set to Standard JSON. |
| mongo-query-attribute | If set, the query will be written to a specified attribute on the output flowfiles. |
| results-per-flowfile | How many results to put into a FlowFile at once. The whole body will be treated as a JSON array of results. |
| use-pretty-printing | Choose whether or not to pretty print the JSON from the results of the query. Choosing 'True' can greatly increase the space requirements on disk depending on the complexity of the JSON document |
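
The effect of `results-per-flowfile` can be pictured as chunking the result set into JSON arrays. This is a sketch with a hypothetical helper name and plain Python dictionaries standing in for MongoDB documents; it does not use a MongoDB driver.

```python
import json


def batch_results(documents, results_per_flowfile):
    """Yield one JSON array per output FlowFile body (illustrative only)."""
    for start in range(0, len(documents), results_per_flowfile):
        yield json.dumps(documents[start:start + results_per_flowfile])


documents = [{"sku": "a"}, {"sku": "b"}, {"sku": "c"}]
for body in batch_results(documents, 2):
    print(body)
```

With three documents and a value of 2, the sketch produces two bodies: one array of two results and one array with the remaining result.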
## Relationships

| Name | Description |
| --- | --- |
| failure | All input FlowFiles that are part of a failed query execution go here. |
| original | All input FlowFiles that are part of a successful query execution go here. |
| success | All FlowFiles that have the results of a successful query execution go here. |

## Writes attributes

| Name | Description |
| --- | --- |
| mongo.database.name | The database where the results came from. |
| mongo.collection.name | The collection where the results came from. |
---
title: GetMongoRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getmongorecord.md
section: Loading & Unloading Data
---

# GetMongoRecord 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-mongodb-nar ## Description A record-based version of GetMongo that uses the Record writers to write the MongoDB result set. ## Tags fetch, get, json, mongo, mongodb, record ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
Property Description Batch Size The number of elements to be returned from the server in one batch Limit The maximum number of elements to return Mongo Collection Name The name of the collection to use Mongo Database Name The name of the database to use Projection The fields to be returned from the documents in the result set; must be a valid BSON document Query The selection criteria to do the lookup. If the field is left blank, it will look for input from an incoming connection from another processor to provide the query as a valid JSON document inside of the FlowFile's body. If this field is left blank and a timer is enabled instead of an incoming connection, that will result in a full collection fetch using a "\{\}" query. Sort The fields by which to sort; must be a valid BSON document get-mongo-record-writer-factory The record writer to use to write the result sets. mongo-client-service If configured, this property will use the assigned client service for connection pooling. mongo-query-attribute If set, the query will be written to a specified attribute on the output flowfiles. mongodb-schema-name The name of the schema in the configured schema registry to use for the query results.
## Relationships

| Name | Description |
| --- | --- |
| failure | All input FlowFiles that are part of a failed query execution go here. |
| original | All input FlowFiles that are part of a successful query execution go here. |
| success | All FlowFiles that have the results of a successful query execution go here. |
## Writes attributes

| Name | Description |
| --- | --- |
| mongo.database.name | The database where the results came from. |
| mongo.collection.name | The collection where the results came from. |

--- title: GetQueryJobResult 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getqueryjobresult.md section: Loading & Unloading Data ---

# GetQueryJobResult 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-salesforce-processors-nar

## Description

Gets the results of a Query Job in Salesforce using the Bulk API 2.0. The output is CSV and GZIP compression is used.

## Tags

bulk, job, preview, query, salesforce

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Job ID | The ID of the job for which the status is checked. |
| Salesforce Client | Salesforce Client to interact with the APIs |
## Relationships

| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed to this relationship if the Query Job result could not be retrieved but the operation might be retried |
| failure | A FlowFile is routed to this relationship if the Query Job Results could not be retrieved |
| success | If Query Job Results have been successfully retrieved, the FlowFile is routed to this relationship |
## See also

- [com.snowflake.openflow.runtime.processors.salesforce.AbortQueryJob](/user-guide/data-integration/openflow/processors/abortqueryjob)
- [com.snowflake.openflow.runtime.processors.salesforce.DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob)
- [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobStatus](/user-guide/data-integration/openflow/processors/getqueryjobstatus)
- [com.snowflake.openflow.runtime.processors.salesforce.SubmitQueryJob](/user-guide/data-integration/openflow/processors/submitqueryjob)

--- title: GetQueryJobStatus 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getqueryjobstatus.md section: Loading & Unloading Data ---

# GetQueryJobStatus 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-salesforce-processors-nar

## Description

Gets the status of a Query Job in Salesforce using the Bulk API 2.0.

## Tags

bulk, job, preview, query, salesforce, status

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Job ID | The ID of the job for which the status is checked. |
| Salesforce Client | Salesforce Client to interact with the APIs |
## Relationships

| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed to this relationship if the Query Job status could not be retrieved but the operation might be retried |
| failure | A FlowFile is routed to this relationship if the Query Job status could not be retrieved |
| job.aborted | If the Query Job has been aborted, the FlowFile is routed to this relationship |
| job.completed | If the Query Job completed, the FlowFile is routed to this relationship |
| job.failed | If the Query Job failed, the FlowFile is routed to this relationship |
| wait | If the Query Job is in the processing queue or in progress, the FlowFile is routed to this relationship |
## Writes attributes

| Name | Description |
| --- | --- |
| jobState | The current state of processing for the job. |
| systemModstamp | The UTC date and time when the API last updated the job information. |
| numberRecordsProcessed | The number of records processed in this job. |
| retries | The number of times that Salesforce attempted to save the results of an operation. Repeated attempts indicate a problem such as lock contention. |
| totalProcessingTime | The number of milliseconds taken to process the job. |
| isPkChunkingSupported | Whether PK chunking is supported for the queried object (true), or isn't supported (false). |
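
The relationships above amount to a routing decision on the `jobState` attribute. The sketch below illustrates that decision in Python. The state names are those used by the Salesforce Bulk API 2.0 for query jobs; the exact mapping to relationships, and the fallback to `failure` for unknown states, are assumptions made for illustration and not the processor's confirmed logic.

```python
# Assumed mapping from Salesforce Bulk API 2.0 job states to the
# relationships listed above; the processor's real logic may differ.
STATE_TO_RELATIONSHIP = {
    "UploadComplete": "wait",      # job is queued for processing
    "InProgress": "wait",          # job is being processed
    "JobComplete": "job.completed",
    "Aborted": "job.aborted",
    "Failed": "job.failed",
}


def route_for_state(job_state: str) -> str:
    """Return the relationship a FlowFile would follow for a given jobState."""
    # Unknown states fall back to "failure" in this sketch (an assumption).
    return STATE_TO_RELATIONSHIP.get(job_state, "failure")


def should_poll_again(job_state: str) -> bool:
    """True when the job has not finished and the status should be re-checked."""
    return route_for_state(job_state) == "wait"


for state in ("InProgress", "JobComplete", "Aborted", "Failed"):
    print(state, "->", route_for_state(state))
```

In a flow, the `wait` relationship would typically loop back to this processor after a delay, while `job.completed` would lead on to GetQueryJobResult.
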
## See also

- [com.snowflake.openflow.runtime.processors.salesforce.AbortQueryJob](/user-guide/data-integration/openflow/processors/abortqueryjob)
- [com.snowflake.openflow.runtime.processors.salesforce.DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob)
- [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult)
- [com.snowflake.openflow.runtime.processors.salesforce.SubmitQueryJob](/user-guide/data-integration/openflow/processors/submitqueryjob)

--- title: GetS3ObjectMetadata 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/gets3objectmetadata.md section: Loading & Unloading Data ---

# GetS3ObjectMetadata 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Check for the existence of an Object in S3 and fetch its Metadata without attempting to download it. This processor can be used as a router for workflows that need to check on an Object in S3 before proceeding with data processing.

## Tags

AWS, Amazon, Archive, Exists, S3

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider |
| Bucket | The S3 Bucket to interact with |
| Communications Timeout | The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out. |
| Custom Signer Class Name | Fully qualified class name of the custom signer class. The signer must implement the com.amazonaws.auth.Signer interface. |
| Custom Signer Module Location | Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any). |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| FullControl User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have Full Control for an object |
| Metadata Attribute Include Pattern | A regular expression pattern to use for determining which object metadata entries are included as FlowFile attributes. This pattern is only applied to the 'found' relationship and will not be used to filter the error attributes in the 'failure' relationship. |
| Metadata Target | This determines where the metadata will be written when found. |
| Object Key | The S3 Object Key to use. This is analogous to a filename for traditional file systems. |
| Owner | The Amazon ID to use for the object's owner |
| Read ACL User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have permissions to read the Access Control List for an object |
| Read Permission User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have Read Access for an object |
| Region | The AWS Region to connect to. |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Signer Override | The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation. |
| Version | The Version of the Object for which to retrieve Metadata |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
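
To illustrate how a Metadata Attribute Include Pattern narrows the metadata written to FlowFile attributes, here is a minimal Python sketch. It assumes the pattern must match the whole metadata key, which is not stated above, and it uses Python's `re` module, whose syntax differs in some details from the Java regular expressions the processor uses. The metadata keys and values are hypothetical.

```python
import re


def filter_metadata(metadata: dict, include_pattern: str) -> dict:
    """Keep only the metadata entries whose key matches the include pattern."""
    pattern = re.compile(include_pattern)
    # Assumption: the whole key must match, not just a substring.
    return {key: value for key, value in metadata.items() if pattern.fullmatch(key)}


# Hypothetical object metadata.
metadata = {
    "content-type": "text/csv",
    "x-team": "data",
    "x-stage": "raw",
    "etag": "abc123",
}

selected = filter_metadata(metadata, r"x-.*")
print(selected)  # {'x-team': 'data', 'x-stage': 'raw'}
```
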
## Relationships

| Name | Description |
| --- | --- |
| failure | If the Processor is unable to process a given FlowFile, it will be routed to this Relationship. |
| found | An object was found in the bucket at the supplied key |
| not found | No object was found in the bucket at the supplied key |
## See also

- [org.apache.nifi.processors.aws.s3.DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object)
- [org.apache.nifi.processors.aws.s3.FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags)
- [org.apache.nifi.processors.aws.s3.ListS3](/user-guide/data-integration/openflow/processors/lists3)
- [org.apache.nifi.processors.aws.s3.PutS3Object](/user-guide/data-integration/openflow/processors/puts3object)
- [org.apache.nifi.processors.aws.s3.TagS3Object](/user-guide/data-integration/openflow/processors/tags3object)

--- title: GetS3ObjectTags 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/gets3objecttags.md section: Loading & Unloading Data ---

# GetS3ObjectTags 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Check for the existence of an Object in S3 and fetch its Tags without attempting to download it. This processor can be used as a router for workflows that need to check on an Object in S3 before proceeding with data processing.

## Tags

AWS, Amazon, Archive, Exists, S3

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider |
| Bucket | The S3 Bucket to interact with |
| Communications Timeout | The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out. |
| Custom Signer Class Name | Fully qualified class name of the custom signer class. The signer must implement the com.amazonaws.auth.Signer interface. |
| Custom Signer Module Location | Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any). |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| FullControl User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have Full Control for an object |
| Object Key | The S3 Object Key to use. This is analogous to a filename for traditional file systems. |
| Owner | The Amazon ID to use for the object's owner |
| Read ACL User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have permissions to read the Access Control List for an object |
| Read Permission User List | A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have Read Access for an object |
| Region | The AWS Region to connect to. |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Signer Override | The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation. |
| Tag Attribute Include Pattern | A regular expression pattern to use for determining which object tags are included as FlowFile attributes. This pattern is only applied to the 'found' relationship and will not be used to filter the error attributes in the 'failure' relationship. |
| Tags Target | This determines where the tags will be written when found. |
| Version | The Version of the Object for which to retrieve Tags |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships

| Name | Description |
| --- | --- |
| failure | If the Processor is unable to process a given FlowFile, it will be routed to this Relationship. |
| found | An object was found in the bucket at the supplied key |
| not found | No object was found in the bucket at the supplied key |
## See also

- [org.apache.nifi.processors.aws.s3.DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object)
- [org.apache.nifi.processors.aws.s3.FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata)
- [org.apache.nifi.processors.aws.s3.ListS3](/user-guide/data-integration/openflow/processors/lists3)
- [org.apache.nifi.processors.aws.s3.PutS3Object](/user-guide/data-integration/openflow/processors/puts3object)
- [org.apache.nifi.processors.aws.s3.TagS3Object](/user-guide/data-integration/openflow/processors/tags3object)

--- title: GetSFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getsftp.md section: Loading & Unloading Data ---

# GetSFTP 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Fetches files from an SFTP Server and creates FlowFiles from them.

## Tags

fetch, files, get, ingest, input, remote, retrieve, sftp, source

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Algorithm Negotiation | Configuration strategy for SSH algorithm negotiation |
| Ciphers Allowed | A comma-separated list of Ciphers allowed for SFTP connections. Leave unset to allow all. Available options are: 3des-cbc, aes128-cbc, aes128-ctr, aes128-gcm@openssh.com, aes192-cbc, aes192-ctr, aes256-cbc, aes256-ctr, aes256-gcm@openssh.com, arcfour128, arcfour256, blowfish-cbc, chacha20-poly1305@openssh.com, none |
| Connection Timeout | Amount of time to wait before timing out while creating a connection |
| Data Timeout | When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems |
| Delete Original | Determines whether or not the file is deleted from the remote system after it has been successfully transferred |
| File Filter Regex | Provides a Java Regular Expression for filtering Filenames; if a filter is supplied, only files whose names match that Regular Expression will be fetched |
| Follow Symbolic Links | If true, will pull even symbolic files and also nested symbolic subdirectories; otherwise, will not read symbolic files and will not traverse symbolic link subdirectories |
| Host Key File | If supplied, the given file will be used as the Host Key; otherwise, if the 'Strict Host Key Checking' property is set to true, the 'known_hosts' and 'known_hosts2' files from the ~/.ssh directory are used; else no host key file will be used |
| Hostname | The fully qualified hostname or IP address of the remote system |
| Ignore Dotted Files | If true, files whose names begin with a dot (".") will be ignored |
| Key Algorithms Allowed | A comma-separated list of Key Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: ecdsa-sha2-nistp256, ecdsa-sha2-nistp256-cert-v01@openssh.com, ecdsa-sha2-nistp384, ecdsa-sha2-nistp384-cert-v01@openssh.com, ecdsa-sha2-nistp521, ecdsa-sha2-nistp521-cert-v01@openssh.com, rsa-sha2-256, rsa-sha2-256-cert-v01@openssh.com, rsa-sha2-512, rsa-sha2-512-cert-v01@openssh.com, sk-ecdsa-sha2-nistp256@openssh.com, sk-ssh-ed25519@openssh.com, ssh-dss, ssh-dss-cert-v01@openssh.com, ssh-ed25519, ssh-ed25519-cert-v01@openssh.com, ssh-rsa, ssh-rsa-cert-v01@openssh.com |
| Key Exchange Algorithms Allowed | A comma-separated list of Key Exchange Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: curve25519-sha256, curve25519-sha256@libssh.org, curve448-sha512, diffie-hellman-group-exchange-sha1, diffie-hellman-group-exchange-sha256, diffie-hellman-group1-sha1, diffie-hellman-group14-sha1, diffie-hellman-group14-sha256, diffie-hellman-group15-sha512, diffie-hellman-group16-sha512, diffie-hellman-group17-sha512, diffie-hellman-group18-sha512, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521, mlkem1024nistp384-sha384, mlkem768nistp256-sha256, mlkem768x25519-sha256, sntrup761x25519-sha512, sntrup761x25519-sha512@openssh.com |
| Max Selects | The maximum number of files to pull in a single connection |
| Message Authentication Codes Allowed | A comma-separated list of Message Authentication Codes allowed for SFTP connections. Leave unset to allow all. Available options are: hmac-md5, hmac-md5-96, hmac-sha1, hmac-sha1-96, hmac-sha1-etm@openssh.com, hmac-sha2-256, hmac-sha2-256-etm@openssh.com, hmac-sha2-512, hmac-sha2-512-etm@openssh.com |
| Password | Password for the user account |
| Path Filter Regex | When Search Recursively is true, then only subdirectories whose path matches the given Regular Expression will be scanned |
| Polling Interval | Determines how long to wait between fetching the listing for new files |
| Port | The port that the remote system is listening on for file transfers |
| Private Key Passphrase | Password for the private key |
| Private Key Path | The fully qualified path to the Private Key file |
| Remote Path | The path on the remote system from which to pull or push files |
| Remote Poll Batch Size | The value specifies how many file paths to find in a given directory on the remote system when doing a file listing. This value in general should not need to be modified, but when polling against a remote system with a tremendous number of files this value can be critical. Setting this value too high can result in very poor performance, and setting it too low can cause the flow to be slower than normal. |
| Search Recursively | If true, will pull files from arbitrarily nested subdirectories; otherwise, will not traverse subdirectories |
| Send Keep Alive On Timeout | Send a Keep Alive message every 5 seconds up to 5 times for an overall timeout of 25 seconds. |
| Strict Host Key Checking | Indicates whether or not strict enforcement of hosts keys should be applied |
| Use Compression | Indicates whether or not ZLIB compression should be used when transferring files |
| Use Natural Ordering | If true, will pull files in the order in which they are naturally listed; otherwise, the order in which the files will be pulled is not defined |
| Username | Username |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
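
The combined effect of File Filter Regex and Ignore Dotted Files can be pictured with a short Python sketch. It is only an approximation: the processor evaluates Java regular expressions, whereas this sketch uses Python's `re` module and assumes the expression must match the whole filename. The function `select_files` and the filenames are hypothetical.

```python
import re
from typing import Iterable, List, Optional


def select_files(
    names: Iterable[str],
    file_filter_regex: Optional[str] = None,
    ignore_dotted: bool = True,
) -> List[str]:
    """Return the filenames that would pass both filters in this sketch."""
    pattern = re.compile(file_filter_regex) if file_filter_regex else None
    selected = []
    for name in names:
        # Ignore Dotted Files: skip names that begin with a dot.
        if ignore_dotted and name.startswith("."):
            continue
        # File Filter Regex: assumed to require a match on the full name.
        if pattern is not None and not pattern.fullmatch(name):
            continue
        selected.append(name)
    return selected


# Hypothetical remote directory listing.
listing = ["orders_2024.csv", "orders_2025.csv", ".orders_tmp.csv", "readme.txt"]

print(select_files(listing, r"orders_\d{4}\.csv"))
# ['orders_2024.csv', 'orders_2025.csv']
```
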
## Relationships

| Name | Description |
| --- | --- |
| success | All FlowFiles that are received are routed to success |
## Writes attributes

| Name | Description |
| --- | --- |
| filename | The filename is set to the name of the file on the remote server |
| path | The path is set to the path of the file's directory on the remote server. For example, if the <Remote Path> property is set to /tmp, files picked up from /tmp will have the path attribute set to /tmp. If the <Search Recursively> property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to /tmp/abc/1/2/3 |
| file.lastModifiedTime | The date and time that the source file was last modified |
| file.owner | The numeric owner id of the source file |
| file.group | The numeric group id of the source file |
| file.permissions | The read/write/execute permissions of the source file |
| absolute.path | The full/absolute path from where a file was picked up. The current 'path' attribute is still populated, but may be a relative path |
## See also

- [org.apache.nifi.processors.standard.PutSFTP](/user-guide/data-integration/openflow/processors/putsftp)

--- title: GetSharepointSiteGroupMembers 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getsharepointsitegroupmembers.md section: Loading & Unloading Data ---

# GetSharepointSiteGroupMembers 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-sharepoint-rest-nar

## Description

Retrieves all members of a SharePoint site group.

## Tags

groups, membership, microsoft, openflow, sharepoint

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Group ID | The ID of the SharePoint group. |
| OAuth2 Access Token Provider | Enables managed retrieval of OAuth2 Bearer Token. |
| Site URL | The URL of the SharePoint site. |
| Web Client Service | The Web Client Service to use for communicating with SharePoint. |
## Relationships

| Name | Description |
| --- | --- |
| comms.failure | A FlowFile is routed here if the processor failed to communicate with SharePoint. The operation can be retried. |
| failure | A FlowFile is routed here if the group members could not be fetched |
| success | A FlowFile is routed here if the group members were successfully retrieved |
## Writes attributes

| Name | Description |
| --- | --- |
| sharepoint.group.user.ids | The IDs of the users in the SharePoint site group. |
| sharepoint.group.user.emails | The emails of the users in the SharePoint site group. |
## See also

- [com.snowflake.openflow.runtime.processors.sharepoint.rest.ListSharepointSiteGroups](/user-guide/data-integration/openflow/processors/listsharepointsitegroups)

--- title: GetShopify 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getshopify.md section: Loading & Unloading Data ---

# GetShopify 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-shopify-nar

## Description

Retrieves objects from a custom Shopify store. The processor yield time must be set to the account's rate limit accordingly.

## Tags

shopify

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| CUSTOMERS | Customer resource to query |
| DISCOUNTS | Discount resource to query |
| INVENTORY | Inventory resource to query |
| ONLINE_STORE | Online Store resource to query |
| ORDERS | Order resource to query |
| PRODUCT | Product resource to query |
| SALES_CHANNELS | Sales Channel resource to query |
| STORE_PROPERTIES | Store Property resource to query |
| access-token | Access Token to authenticate requests |
| api-version | The Shopify REST API version |
| incremental-delay | The ending timestamp of the time window will be adjusted earlier by the amount configured in this property. For example, with a property value of 10 seconds, an ending timestamp of 12:30:45 would be changed to 12:30:35. Set this property to avoid missing objects when the clocks of your local machines and the Shopify servers are not in sync. |
| incremental-initial-start-time | This property specifies the start time when running the first request. Represents an ISO 8601-encoded date and time string. For example, 3:50 pm on September 7, 2019 in the time zone of UTC (Coordinated Universal Time) is represented as "2019-09-07T15:50:00Z". |
| is-incremental | The processor can incrementally load the queried objects so that each object is queried exactly once. For each query, the processor queries objects which were created or modified after the previous run time but before the current time. |
| object-category | Shopify object category |
| result-limit | The maximum number of results to request for each invocation of the Processor |
| store-domain | The domain of the Shopify store, e.g. nifistore.myshopify.com |
| web-client-service-provider | Controller service for HTTP client operations |
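
The arithmetic behind incremental-delay and incremental-initial-start-time can be reproduced with Python's standard `datetime` module. This is a sketch of the time-window calculation described above, not the processor's implementation; the function names `parse_iso_utc` and `query_window` and the chosen "current time" are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_iso_utc(text: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as "2019-09-07T15:50:00Z"."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def query_window(window_start: datetime, now: datetime, delay_seconds: int):
    """Return (start, end), where end is moved earlier by the configured delay."""
    window_end = now - timedelta(seconds=delay_seconds)
    return window_start, window_end


# First run: the window starts at the configured initial start time.
initial_start = parse_iso_utc("2019-09-07T15:50:00Z")
# Hypothetical current time whose clock reading is 12:30:45.
now = datetime(2019, 9, 8, 12, 30, 45, tzinfo=timezone.utc)

start, end = query_window(initial_start, now, delay_seconds=10)
print(start.isoformat())
print(end.strftime("%H:%M:%S"))  # 12:30:35
```

On later runs, the previous window's end would presumably become the next window's start, so that each object is picked up once.
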
## State management

| Scopes | Description |
| --- | --- |
| CLUSTER | For a few resources the processor supports incremental loading. The list of the resources with the supported parameters can be found in the additional details. |
## Relationships

| Name | Description |
| --- | --- |
| success | For FlowFiles created as a result of a successful query. |
## Writes attributes

| Name | Description |
| --- | --- |
| mime.type | Sets the MIME type to application/json |

--- title: GetSmbFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getsmbfile.md section: Loading & Unloading Data ---

# GetSmbFile 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-smb-nar

## Description

Reads files from a Samba network location into FlowFiles. Use this processor instead of a CIFS mount if share access control is important. Configure the Hostname, Share and Directory accordingly: `\\[Hostname]\[Share]\[path\to\Directory]`

## Tags

samba, smb, cifs, files, get

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties

| Property | Description |
| --- | --- |
| Batch Size | The maximum number of files to pull in each iteration |
| Directory | The network folder from which files are read. This is the remaining relative path after the share: `\\hostname\share\[dir1\dir2]`. |
| Domain | The domain used for authentication. Optional; in most cases a username and password are sufficient. |
| File Filter | Only files whose names match the given regular expression will be picked up |
| Hostname | The network host from which files are read. |
| Ignore Hidden Files | Indicates whether or not hidden files should be ignored |
| Keep Source File | If true, the file is not deleted after it has been copied to the Content Repository; this causes the file to be picked up continually and is useful for testing purposes. If the original is not kept, NiFi needs write permissions on the directory it is pulling from; otherwise it will ignore the file. |
| Password | The password used for authentication. Required if Username is set. |
| Path Filter | When Recurse Subdirectories is true, then only subdirectories whose path matches the given regular expression will be scanned |
| Polling Interval | Indicates how long to wait before performing a directory listing |
| Recurse Subdirectories | Indicates whether or not to pull files from subdirectories |
| Share | The network share from which files are read. This is the "first folder" after the hostname: `\\hostname\[share]\dir1\dir2` |
| Share Access Strategy | Indicates which shared access is granted on the file during the read. None is the most restrictive, but the safest setting to prevent corruption. |
| Username | The username used for authentication. If no username is set then anonymous authentication is attempted. |
| enable-dfs | Enables accessing Distributed File System (DFS) and following DFS links during SMB operations. |
| smb-dialect | The SMB dialect is negotiated between the client and the server by default to the highest common version supported by both ends. In some rare cases, the client-server communication may fail with the automatically negotiated dialect. This property can be used to set the dialect explicitly (e.g. to downgrade to a lower version) when those situations occur. |
| timeout | Timeout for read and write operations. |
| use-encryption | Turns on/off encrypted communication between the client and the server. The property's behavior is SMB dialect dependent: SMB 2.x does not support encryption and the property has no effect. In case of SMB 3.x, it is a hint/request to the server to turn encryption on if the server also supports it. |
## Relationships

| Name | Description |
| --- | --- |
| success | All files are routed to success |
## Writes attributes

| Name | Description |
| --- | --- |
| filename | The filename is set to the name of the file on the network share |
| path | The path is set to the relative path of the file's network share name. For example, if the input is set to `\\hostname\share\tmp`, files picked up from tmp will have the path attribute set to tmp |
| file.creationTime | The date and time that the file was created. May not work on all file systems |
| file.lastModifiedTime | The date and time that the file was last modified. May not work on all file systems |
| file.lastAccessTime | The date and time that the file was last accessed. May not work on all file systems |
| absolute.path | The full path from where a file was picked up. This includes the hostname and the share name |
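
The Hostname, Share, and Directory properties together identify one network location. The Python sketch below shows how the three values might combine into a single UNC-style path, which is roughly what the `absolute.path` attribute describes. The helper `unc_path` and the host, share, and directory names are hypothetical, and the exact format the processor writes is not confirmed here.

```python
def unc_path(hostname: str, share: str, directory: str = "") -> str:
    """Join Hostname, Share, and Directory into a single UNC-style path."""
    # Accept either slash style in the directory and drop empty segments.
    segments = [part for part in directory.replace("/", "\\").split("\\") if part]
    return "\\\\" + "\\".join([hostname, share, *segments])


# Hypothetical host, share, and directory names.
print(unc_path("fileserver", "reports", "dir1\\dir2"))
print(unc_path("fileserver", "reports"))
```

The first call prints a path with two leading backslashes followed by the host, share, and both directory segments; the second omits the directory because none was configured.
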
## See also

- [org.apache.nifi.processors.smb.FetchSmb](/user-guide/data-integration/openflow/processors/fetchsmb)
- [org.apache.nifi.processors.smb.ListSmb](/user-guide/data-integration/openflow/processors/listsmb)
- [org.apache.nifi.processors.smb.PutSmbFile](/user-guide/data-integration/openflow/processors/putsmbfile)

--- title: GetSplunk 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getsplunk.md section: Loading & Unloading Data ---

# GetSplunk 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-splunk-nar

## Description

Retrieves data from Splunk Enterprise.

## Tags

get, logs, splunk

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| API Version | Select which version of the Splunk Search API to use for search operations. Version 2 is recommended for newer Splunk instances. |
| Application | The Splunk Application to query. |
| Connection Timeout | Max wait time for connection to the Splunk server. |
| Earliest Time | The value to use for the earliest time when querying. Only used with a Time Range Strategy of Provided. See Splunk's documentation on Search Time Modifiers for guidance in populating this field. |
| Hostname | The IP address or hostname of the Splunk server. |
| Latest Time | The value to use for the latest time when querying. Only used with a Time Range Strategy of Provided. See Splunk's documentation on Search Time Modifiers for guidance in populating this field. |
| Output Mode | The output mode for the results. |
| Owner | The owner to pass to Splunk. |
| Password | The password to authenticate to Splunk. |
| Port | The port of the Splunk server. |
| Query | The query to execute. Typically begins with a `search` command followed by a search clause, such as `search source="tcp:7689"` to search for messages received on TCP port 7689. |
| Read Timeout | Max wait time for response from the Splunk server. |
| SSL Context Service | The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| Scheme | The scheme for connecting to Splunk. |
| Security Protocol | The security protocol to use for communicating with Splunk. |
| Time Field Strategy | Indicates whether to search by the time attached to the event, or by the time the event was indexed in Splunk. |
| Time Range Strategy | Indicates how to apply time ranges to each execution of the query. Selecting a managed option allows the processor to apply a time range from the last execution time to the current execution time. When using Managed from Beginning, an earliest time is not applied on the first execution, so all records are searched. When using Managed from Current, the earliest time of the first execution is the initial execution time. When using Provided, the time range comes from the Earliest Time and Latest Time properties, or no time range is applied if these properties are left blank. |
| Time Zone | The Time Zone to use for formatting dates when performing a search. Only used with Managed time strategies. |
| Token | The token to pass to Splunk. |
| Username | The username to authenticate to Splunk. |
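The three Time Range Strategy options can be easier to compare side by side. The following is a minimal sketch of the selection logic as the property description states it, not the processor's actual implementation. The function name `next_time_range` and its parameters are hypothetical, and returning the current time as the latest bound for managed strategies is an assumption based on the phrase "from the last execution time to the current execution time".

```python
from datetime import datetime, timezone
from typing import Any, Optional, Tuple


def next_time_range(
    strategy: str,
    previous_latest: Optional[datetime],
    now: datetime,
    provided_earliest: Optional[str] = None,
    provided_latest: Optional[str] = None,
) -> Tuple[Any, Any]:
    """Return (earliest, latest) for one execution; None means no bound is applied."""
    if strategy == "Provided":
        # Values come straight from the Earliest Time / Latest Time properties.
        return provided_earliest, provided_latest
    if strategy not in ("Managed from Beginning", "Managed from Current"):
        raise ValueError(f"unknown strategy: {strategy}")
    if previous_latest is not None:
        # Later executions resume where the previous one stopped (stored state).
        return previous_latest, now
    if strategy == "Managed from Beginning":
        # First execution: no earliest bound, so all records are searched.
        return None, now
    # Managed from Current, first execution: start at the initial execution time.
    return now, now
```

In this sketch, `previous_latest` stands in for the value the processor keeps in its cluster-scoped state between executions.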
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | If using one of the managed Time Range Strategies, this processor stores the values of the latest and earliest times from the previous execution so that the next execution of the processor can pick up where the last execution left off. The state is cleared and starts over if the query is changed. |
## Relationships
| Name | Description |
| --- | --- |
| success | Results retrieved from Splunk are sent to this relationship. |
## Writes attributes
| Name | Description |
| --- | --- |
| splunk.query | The query that was performed to produce the FlowFile. |
| splunk.earliest.time | The value of the earliest time that was used when performing the query. |
| splunk.latest.time | The value of the latest time that was used when performing the query. |
---
title: GetSQS 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getsqs.md
section: Loading & Unloading Data
---

# GetSQS 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-aws-nar

## Description

Fetches messages from an Amazon Simple Queue Service (SQS) queue.

## Tags

AWS, Amazon, Fetch, Get, Poll, Queue, SQS

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| AWS Credentials Provider service | The Controller Service that is used to obtain AWS credentials provider |
| Auto Delete Messages | Specifies whether the messages should be automatically deleted by the processors once they have been received. |
| Batch Size | The maximum number of messages to send in a single network request |
| Character Set | The Character Set that should be used to encode the textual content of the SQS message |
| Communications Timeout | |
| Endpoint Override URL | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Queue URL | The URL of the queue to get messages from |
| Receive Message Wait Time | The maximum amount of time to wait on a long polling receive call. Setting this to a value of 1 second or greater will reduce the number of SQS requests and decrease fetch latency at the cost of a constantly active thread. |
| Region | |
| SSL Context Service | Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Visibility Timeout | The amount of time after a message is received but not deleted that the message is hidden from other consumers |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name | Description |
| --- | --- |
| success | FlowFiles are routed to success relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| hash.value | The MD5 sum of the message |
| hash.algorithm | MD5 |
| sqs.message.id | The unique identifier of the SQS message |
| sqs.receipt.handle | The SQS Receipt Handle that is to be used to delete the message from the queue |
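A downstream step may want to confirm that the FlowFile content still matches the `hash.value` attribute. The following is a minimal sketch of recomputing an MD5 sum with Python's standard library. The helper name `md5_hex` is hypothetical, and it assumes the attribute holds a lowercase hexadecimal digest of the message body encoded with the configured Character Set (UTF-8 here).

```python
import hashlib


def md5_hex(message_body: str, charset: str = "utf-8") -> str:
    """Return the MD5 digest of the message body as a lowercase hex string."""
    return hashlib.md5(message_body.encode(charset)).hexdigest()


def matches_hash(message_body: str, hash_value: str) -> bool:
    """Compare a recomputed digest with a previously recorded one."""
    return md5_hex(message_body) == hash_value.lower()
```

MD5 is suitable here only as an integrity check against accidental corruption, not as a security measure.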
## See also

- [org.apache.nifi.processors.aws.sqs.DeleteSQS](/user-guide/data-integration/openflow/processors/deletesqs)
- [org.apache.nifi.processors.aws.sqs.PutSQS](/user-guide/data-integration/openflow/processors/putsqs)

---
title: GetUnityCatalogFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getunitycatalogfile.md
section: Loading & Unloading Data
---

# GetUnityCatalogFile 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-databricks-processors-nar

## Description

Read a Unity Catalog file up to 5 GiB.

## Tags

databricks, openflow, unity catalog

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Databricks Client | Databricks Client Service. |
| Unity Catalog File Path | Unity Catalog file path, for example `/Volumes/catalog/schema/volume_name/file.txt` |
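The example path follows the shape `/Volumes/<catalog>/<schema>/<volume>/<file path>`. The following is a minimal sketch, assuming that shape, of splitting a path into its parts before configuring the processor. The names `VolumePath`, `parse_volume_path`, and `MAX_FILE_BYTES` are hypothetical helpers for illustration, not part of the processor.

```python
from typing import NamedTuple

# 5 GiB, the maximum file size stated in the processor description.
MAX_FILE_BYTES = 5 * 1024**3


class VolumePath(NamedTuple):
    catalog: str
    schema: str
    volume: str
    relative_path: str


def parse_volume_path(path: str) -> VolumePath:
    """Split '/Volumes/catalog/schema/volume/dir/file' into named parts."""
    parts = path.split("/")
    # Expected: ['', 'Volumes', catalog, schema, volume, one or more file path parts]
    if len(parts) < 6 or parts[0] != "" or parts[1] != "Volumes" or not all(parts[2:]):
        raise ValueError(f"not a Unity Catalog volume file path: {path!r}")
    return VolumePath(parts[2], parts[3], parts[4], "/".join(parts[5:]))
```

Validating the path shape up front gives a clearer error than waiting for a FlowFile to reach the failure relationship.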
## Relationships
| Name | Description |
| --- | --- |
| failure | Databricks failure relationship |
| success | Databricks success relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| error.code | The error code for the SQL statement if an error occurred. |
| error.message | The error message for the SQL statement if an error occurred. |
---
title: GetUnityCatalogFileMetadata 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getunitycatalogfilemetadata.md
section: Loading & Unloading Data
---

# GetUnityCatalogFileMetadata 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-databricks-processors-nar

## Description

Checks for Unity Catalog file metadata.

## Tags

databricks, openflow, unity catalog

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Databricks Client | Databricks Client Service. |
| Unity Catalog File Path | Unity Catalog file path, for example `/Volumes/catalog/schema/volume_name/file.txt` |
## Relationships
| Name | Description |
| --- | --- |
| failure | Databricks failure relationship |
| not.found | The original FlowFile is transferred to this relationship if no Unity Catalog file can be found at the specified path |
| success | Databricks success relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | The content type of the checked file. |
| uc.size | The size of the Unity Catalog file. |
| uc.lastModifiedTime | The last modified time of the Unity Catalog file in milliseconds since epoch in UTC time. |
| error.code | The error code for the SQL statement if an error occurred. |
| error.message | The error message for the SQL statement if an error occurred. |
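Because `uc.lastModifiedTime` is expressed in milliseconds since the epoch in UTC, it usually needs converting before it can be compared with other timestamps. The following is a minimal sketch of that conversion. The helper name `uc_last_modified_to_datetime` is hypothetical, and it assumes the attribute value arrives as a string of decimal digits, as FlowFile attributes are strings.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uc_last_modified_to_datetime(attribute_value: str) -> datetime:
    """Convert milliseconds since the Unix epoch into a timezone-aware UTC datetime."""
    # timedelta keeps millisecond precision without floating-point rounding.
    return EPOCH + timedelta(milliseconds=int(attribute_value))
```

Using `timedelta` rather than dividing by 1000 avoids floating-point rounding on the millisecond component.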
---
title: GetWorkdayReport 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getworkdayreport.md
section: Loading & Unloading Data
---

# GetWorkdayReport 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-workday-processors-nar

## Description

A processor which can interact with a configurable Workday Report. The processor can forward the content without modification, or you can transform it by providing the specific Record Reader and Record Writer services based on your needs. You can also remove fields by defining schema in the Record Writer. Supported Workday report formats are: csv, simplexml, json

## Tags

Workday, report

## Input Requirement

ALLOWED

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Access Token Provider | Enables managed retrieval of OAuth2 Bearer Token. |
| Authorization Type | The type of authorization for retrieving data from Workday resources. |
| Web Client Service Provider | Web client which is used to communicate with the Workday API. |
| Workday Password | The password provided for authentication of Workday requests. Encoded using Base64 for HTTP Basic Authentication as described in RFC 7617. |
| Workday Report URL | HTTP remote URL of Workday report including a scheme of http or https, as well as a hostname or IP address with optional port and path elements. |
| Workday Username | The username provided for authentication of Workday requests. Encoded using Base64 for HTTP Basic Authentication as described in RFC 7617. |
| record-reader | Specifies the Controller Service to use for parsing incoming data and determining the data's schema. |
| record-writer | The Record Writer to use for serializing Records to an output FlowFile. |
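The username and password descriptions refer to HTTP Basic Authentication as defined in RFC 7617. The following is a minimal sketch of how that scheme builds the `Authorization` header value, which can help when reproducing a request outside the processor for troubleshooting. The helper name `basic_auth_header` is hypothetical, and UTF-8 is assumed for the credential encoding.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic Authentication (RFC 7617)."""
    # The credentials are joined with a colon, then Base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

Base64 is an encoding, not encryption, so Basic Authentication should only be sent over https.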
## Relationships
| Name | Description |
| --- | --- |
| failure | Request FlowFiles transferred when receiving socket communication errors. |
| original | Request FlowFiles transferred when receiving HTTP responses with a status code between 200 and 299. |
| success | Response FlowFiles transferred when receiving HTTP responses with a status code between 200 and 299. |
## Writes attributes
| Name | Description |
| --- | --- |
| getworkdayreport.java.exception.class | The Java exception class raised when the processor fails |
| getworkdayreport.java.exception.message | The Java exception message raised when the processor fails |
| mime.type | Sets the mime.type attribute to the MIME Type specified by the Source / Record Writer |
| record.count | The number of records in an outgoing FlowFile. This is only populated on the 'success' relationship when both the Record Reader and Record Writer are set. |
---
title: GetZendesk 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/getzendesk.md
section: Loading & Unloading Data
---

# GetZendesk 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-zendesk-nar

## Description

Incrementally fetches data from Zendesk API.

## Tags

zendesk

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| web-client-service-provider | Controller service for HTTP client operations. |
| zendesk-authentication-type-name | Type of authentication to Zendesk API. |
| zendesk-authentication-value-name | Password or authentication token for Zendesk login user. |
| zendesk-export-method | Method for incremental export. |
| zendesk-query-start-timestamp | Initial timestamp to query Zendesk API from in Unix timestamp seconds format. |
| zendesk-resource | The particular Zendesk resource which is meant to be exported. |
| zendesk-subdomain | Name of the Zendesk subdomain. |
| zendesk-user | Login user to Zendesk subdomain. |
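The `zendesk-query-start-timestamp` property expects Unix timestamp seconds rather than a formatted date. The following is a minimal sketch of deriving that value from a calendar date. The helper name `to_unix_seconds` is hypothetical, and the sketch deliberately rejects naive datetimes so that the local time zone cannot silently shift the start point.

```python
from datetime import datetime, timezone


def to_unix_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime, for example tzinfo=timezone.utc")
    return int(moment.timestamp())
```

For example, `to_unix_seconds(datetime(2024, 1, 1, tzinfo=timezone.utc))` yields the value to enter for an export that starts at the beginning of 2024 in UTC.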
## State management
| Scopes | Description |
| --- | --- |
| CLUSTER | The paging cursor for the Zendesk API is stored. The cursor is updated after each successful request. |
## Relationships
| Name | Description |
| --- | --- |
| success | For FlowFiles created as a result of a successful HTTP request. |
## Writes attributes
| Name | Description |
| --- | --- |
| record.count | The number of records fetched by the processor. |
---
title: GrokReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/grokreader.md
section: Loading & Unloading Data
---

# GrokReader

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides a mechanism for reading unstructured text data, such as log files, and structuring the data so that it can be processed. The service is configured using Grok patterns. The service reads from a stream of data and splits each message that it finds into a separate Record, each containing the fields that are configured. If a line in the input does not match the expected message pattern, the line of text is either considered to be part of the previous message or is skipped, depending on the configuration, with the exception of stack traces. A stack trace that is found at the end of a log message is considered to be part of the previous message but is added to the 'stackTrace' field of the Record. If a record has no stack trace, it will have a NULL value for the stackTrace field (assuming that the schema does in fact include a stackTrace field of type String). Assuming that the schema includes a '_raw' field of type String, the raw message will be included in the Record.

## Tags

grok, logfiles, logs, logstash, parse, pattern, reader, record, regex, text, unstructured

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Grok Expressions * | Grok Expression | | | Specifies the format of a log line in Grok format. This allows the Record Reader to understand how to parse each log line. The property supports one or more Grok expressions. The Reader attempts to parse input lines according to the configured order of the expressions. If a line in the log file does not match any expressions, the line will be assumed to belong to the previous log message. If other Grok patterns are referenced by this expression, they need to be supplied in the Grok Pattern File property. |
| Grok Patterns | Grok Pattern File | | | Grok Patterns to use for parsing logs. If not specified, a built-in default Pattern file will be used. If specified, all patterns specified will override the default patterns. See the Controller Service's Additional Details for a list of pre-defined patterns. |
| Schema Access Strategy * | Schema Access Strategy | string-fields-from-grok-expression | Use String Fields From Grok Expression; Use 'Schema Name' Property; Use 'Schema Text' Property; Schema Reference Reader | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name | Schema Name | `${schema.name}` | | Specifies the name of the schema to lookup in the Schema Registry property |
| Schema Reference Reader * | Schema Reference Reader | | | Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry |
| Schema Text | Schema Text | `${avro.schema}` | | The text of an Avro-formatted Schema |
| Schema Version | Schema Version | | | Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| No Match Behavior * | no-match-behavior | append-to-previous-message | Append to Previous Message; Skip Line; Raw Line | If a line of text is encountered and it does not match the given Grok Expression, and it is not part of a stack trace, this property specifies how the text should be processed. |
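The effect of No Match Behavior is easier to see on a small input. The following is a rough analogy in plain Python, not the GrokReader implementation: a regular expression with named groups stands in for a Grok expression, and the function name `read_records`, the log format, and the `"append"` and `"skip"` mode labels are all hypothetical. Stack trace handling and the Raw Line option are left out of the sketch.

```python
import re
from typing import Dict, Iterable, List

# Stand-in for a Grok expression: timestamp, log level, then the rest of the line.
LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<message>.*)"
)


def read_records(lines: Iterable[str], no_match: str = "append") -> List[Dict[str, str]]:
    """Turn log lines into records; unmatched lines are appended or skipped."""
    records: List[Dict[str, str]] = []
    for line in lines:
        match = LINE.match(line)
        if match:
            records.append(match.groupdict())
        elif no_match == "append" and records:
            # Treat the line as a continuation of the previous message.
            records[-1]["message"] += "\n" + line
        # With no_match == "skip", the unmatched line is dropped.
    return records
```

With three input lines where the second does not match, both modes produce two records; only the append mode keeps the text of the unmatched line, folded into the first record's message.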
## State management

This component does not store state.

## Restricted

## Restrictions
| Required Permission | Explanation |
| --- | --- |
| reference remote resources | Patterns and Expressions can reference resources over HTTP |
## System Resource Considerations

This component does not specify system resource considerations.

---
title: Guidelines for using Python extensions in Openflow
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors-python-ext-guide.md
section: Loading & Unloading Data
---

# Guidelines for using Python extensions in Openflow

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

This topic describes the limitations, supported configurations, and best practices when using Python extensions in Openflow.

Python processors in Openflow use NiFi's Py4J bridge architecture, which has fundamentally different resource characteristics than native Java processors. Because Python processors run as external OS processes outside the JVM, they consume additional system memory, are not governed by NiFi's internal resource management, and have limited observability. These differences affect runtime sizing, capacity planning, and monitoring.

## Architecture differences

Python processors run as external OS processes rather than within the JVM. This architecture affects how resources are allocated, monitored, and managed:
| Processor type | Java processor | Python processor |
| --- | --- | --- |
| Runtime environment | JVM internal threads | External OS process |
| Memory management | Managed within JVM heap | Separate process memory |
| Lifecycle | NiFi-controlled | External process lifecycle |
| Monitoring | Full NiFi observability | Limited visibility |
## Runtime size constraints

Python extensions are only available on Medium and Large runtimes. Small runtimes do not support Python processors due to CPU and memory constraints. Snowflake Openflow blocks Python extensions on Small runtimes:
| Runtime size | Python support | Notes |
| --- | --- | --- |
| Small | Not supported | Python processors are blocked on Small runtimes due to CPU and memory constraints. |
| Medium | Limited (up to 2 Python processors) | The limit is for the entire runtime, not per connector or process group. This limit is currently a recommendation that will be an enforced maximum value for Openflow runtimes in the future. |
| Large | Limited (up to 4 Python processors) | The limit is for the entire runtime, not per connector or process group. This limit is currently a recommendation that will be an enforced maximum value for Openflow runtimes in the future. |
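When planning a flow, the limits in the table can be checked before deployment. The following is a minimal sketch of such a check. The dictionary `PYTHON_PROCESSOR_LIMITS` and the function `within_python_limit` are hypothetical names, and the numbers are copied from the table above; Openflow itself does not expose this function.

```python
# Maximum Python processors per runtime, for the entire runtime (from the table above).
PYTHON_PROCESSOR_LIMITS = {"Small": 0, "Medium": 2, "Large": 4}


def within_python_limit(runtime_size: str, python_processor_count: int) -> bool:
    """Return True if the planned number of Python processors fits the runtime size."""
    if runtime_size not in PYTHON_PROCESSOR_LIMITS:
        raise ValueError(f"unknown runtime size: {runtime_size}")
    if python_processor_count < 0:
        raise ValueError("processor count cannot be negative")
    return python_processor_count <= PYTHON_PROCESSOR_LIMITS[runtime_size]
```

Because the limit applies to the whole runtime, the count passed in should be the total across all connectors and process groups on that runtime.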
## Best practices

Follow these guidelines for working with Python processors in Openflow:

- Use Java for CPU-heavy operations. Java provides more efficient thread management within the JVM. Groovy scripting is a Java-based alternative.
- Use Medium or Large runtimes. Python is not available on Small runtimes.
- Limit the number of Python processors. Stay within the documented limits per runtime size.
- Monitor resource usage. Watch for memory pressure and CPU contention.
- Plan for upgrades. Custom Python processors might require a virtual environment (venv) reset after runtime upgrades. For more information, see [Restore Python processors following runtime upgrades](#label-openflow-python-ext-restore).
- Use single-threaded Python processors. Openflow does not support Python processors spawning subprocesses or using multithreading.

## Limitations on using Python processors

The following limitations apply when using Python processors in Openflow.
Runtime constraints
Python extensions can only be used with Medium or Large runtimes. The platform blocks Python extensions on Small runtimes.
Memory overhead
Each Python processor spawns an external OS process with its own memory footprint. Python processes can collectively compete with the JVM for resources.
No NiFi resource management
Python processors are not observed or limited by NiFi's internal resource management. CPU-heavy Python operations can consume approximately 50% of total server CPU time.
Monitoring gaps
The platform lacks visibility into external Python process health and resource consumption.
Upgrade handling
After runtime upgrades, custom Python processors might fail to load or exhibit unexpected behavior until virtual environments are recreated.
## Restore Python processors following runtime upgrades

If Python processors fail after upgrading the runtime, do the following:

1. Increment the processor version in the `ProcessorDetails.version` field.
2. Rebuild and re-upload the NiFi Archive (NAR) binary. This triggers the Python virtual environment cache to reset.
3. Remove and re-add the processor on the canvas. This triggers reinitialization of the Py4J bridge.

---
title: HandleHttpRequest 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/handlehttprequest.md
section: Loading & Unloading Data
---

# HandleHttpRequest 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Starts an HTTP Server and listens for HTTP Requests. For each request, creates a FlowFile and transfers to 'success'. This Processor is designed to be used in conjunction with the HandleHttpResponse Processor in order to create a Web Service. In case of a multipart request, one FlowFile is generated for each part.

## Tags

http, https, ingress, listen, request, web service

## Input Requirement

FORBIDDEN

## Supports Sensitive Dynamic Properties

false

## Properties
| Property | Description |
| --- | --- |
| Additional HTTP Methods | A comma-separated list of non-standard HTTP Methods that should be allowed |
| Allow DELETE | Allow HTTP DELETE Method |
| Allow GET | Allow HTTP GET Method |
| Allow HEAD | Allow HTTP HEAD Method |
| Allow OPTIONS | Allow HTTP OPTIONS Method |
| Allow POST | Allow HTTP POST Method |
| Allow PUT | Allow HTTP PUT Method |
| Allowed Paths | A Regular Expression that specifies the valid HTTP Paths that are allowed in the incoming URL Requests. If this value is specified and the path of the HTTP Requests does not match this Regular Expression, the Processor will respond with a 404: NotFound |
| Client Authentication | Specifies whether or not the Processor should authenticate clients. This value is ignored if the SSL Context Service property is not specified or the SSL Context provided uses only a KeyStore and not a TrustStore. |
| Default URL Character Set | The character set to use for decoding URL parameters if the HTTP Request does not supply one |
| HTTP Context Map | The HTTP Context Map Controller Service to use for caching the HTTP Request Information |
| HTTP Protocols | HTTP Protocols supported for Application Layer Protocol Negotiation with TLS |
| Hostname | The Hostname to bind to. If not specified, will bind to all hosts |
| Listening Port | The Port to listen on for incoming HTTP requests |
| Maximum Threads | The maximum number of threads that the embedded HTTP server will use for handling requests. |
| Request Header Maximum Size | The maximum supported size of HTTP headers in requests sent to this processor |
| SSL Context Service | The SSL Context Service to use in order to secure the server. If specified, the server will accept only HTTPS requests; otherwise, the server will accept only HTTP requests |
| container-queue-size | The size of the queue for Http Request Containers |
| multipart-read-buffer-size | The threshold size, at which the contents of an incoming file would be written to disk. Only applies for requests with Content-Type: multipart/form-data. It is used to prevent denial of service type of attacks, to prevent filling up the heap or disk space. |
| multipart-request-max-size | The max size of the request. Only applies for requests with Content-Type: multipart/form-data, and is used to prevent denial of service type of attacks, to prevent filling up the heap or disk space |
| parameters-to-attributes | A comma-separated list of HTTP parameters or form data to output as attributes |
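An Allowed Paths expression can be tried against sample paths before it is configured. The following is a minimal sketch using Python's `re` module. The function name `is_path_allowed` is hypothetical, and the sketch assumes the whole path must match the expression; the processor runs on the JVM, so regular expression syntax and matching details may differ from Python's.

```python
import re
from typing import Optional


def is_path_allowed(request_path: str, allowed_paths: Optional[str]) -> bool:
    """Return True if the path passes the Allowed Paths check (assumed full match)."""
    if not allowed_paths:
        # No expression configured: every path is accepted.
        return True
    return re.fullmatch(allowed_paths, request_path) is not None
```

A path for which this check returns False corresponds to the case where the processor responds with 404.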
## Relationships
| Name | Description |
| --- | --- |
| success | All content that is received is routed to the 'success' relationship |
## Writes attributes
| Name | Description |
| --- | --- |
| http.context.identifier | An identifier that allows the HandleHttpRequest and HandleHttpResponse to coordinate which FlowFile belongs to which HTTP Request/Response. |
| mime.type | The MIME Type of the data, according to the HTTP Header "Content-Type" |
| http.servlet.path | The part of the request URL that is considered the Servlet Path |
| http.context.path | The part of the request URL that is considered to be the Context Path |
| http.method | The HTTP Method that was used for the request, such as GET or POST |
| http.local.name | IP address/hostname of the server |
| http.server.port | Listening port of the server |
| http.query.string | The query string portion of the Request URL |
| http.remote.host | The hostname of the requestor |
| http.remote.addr | The hostname:port combination of the requestor |
| http.remote.user | The username of the requestor |
| http.protocol | The protocol used to communicate |
| http.request.uri | The full Request URL |
| http.auth.type | The type of HTTP Authorization used |
| http.principal.name | The name of the authenticated user making the request |
| http.query.param.XXX | Each query parameter in the request is added as an attribute, prefixed with "http.query.param." |
| http.param.XXX | Form parameters in the request that are configured by "Parameters to Attributes List" are added as attributes, prefixed with "http.param.". Passing large form parameters this way is not recommended. |
| http.subject.dn | The Distinguished Name of the requestor. This value will not be populated unless the Processor is configured to use an SSLContext Service |
| http.issuer.dn | The Distinguished Name of the entity that issued the Subject's certificate. This value will not be populated unless the Processor is configured to use an SSLContext Service |
| http.certificate.sans.N.name | X.509 Client Certificate Subject Alternative Name value from mutual TLS authentication. The attribute name has a zero-based index ordered according to the content of Client Certificate |
| http.certificate.sans.N.nameType | X.509 Client Certificate Subject Alternative Name type from mutual TLS authentication. The attribute name has a zero-based index ordered according to the content of Client Certificate. The attribute value is one of the General Names from RFC 3280 Section 4.1.2.7 |
| http.headers.XXX | Each of the HTTP Headers that is received in the request will be added as an attribute, prefixed with "http.headers." For example, if the request contains an HTTP Header named "x-my-header", then the value will be added to an attribute named "http.headers.x-my-header" |
| http.headers.multipart.XXX | Each of the HTTP Headers that is received in the multipart request will be added as an attribute, prefixed with "http.headers.multipart." For example, if the multipart request contains an HTTP Header named "content-disposition", then the value will be added to an attribute named "http.headers.multipart.content-disposition" |
| http.multipart.size | For requests with Content-Type "multipart/form-data", the part's content size is recorded into this attribute |
| http.multipart.content.type | For requests with Content-Type "multipart/form-data", the part's content type is recorded into this attribute |
| http.multipart.name | For requests with Content-Type "multipart/form-data", the part's name is recorded into this attribute |
| http.multipart.filename | For requests with Content-Type "multipart/form-data", when the part contains an uploaded file, the name of the file is recorded into this attribute. Files are stored temporarily at the default temporary-file directory specified in the "java.io.File" Java Docs. |
| http.multipart.fragments.sequence.number | For requests with Content-Type "multipart/form-data", the part's index is recorded into this attribute. The index starts with 1. |
| http.multipart.fragments.total.number | For requests with Content-Type "multipart/form-data", the count of all parts is recorded into this attribute. |
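The prefix-based attribute names are easiest to understand from a worked case. The following is a minimal sketch that builds the header and query parameter attributes for one request, following the naming rules in the table. The function name `request_attributes` is hypothetical, only a handful of the attributes are modeled, and repeated query parameters are simply overwritten by the last value, which is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict
from urllib.parse import parse_qsl


def request_attributes(method: str, headers: Dict[str, str], query_string: str) -> Dict[str, str]:
    """Build a subset of the FlowFile attributes described for an incoming request."""
    attributes = {"http.method": method, "http.query.string": query_string}
    for name, value in headers.items():
        # Each request header becomes http.headers.<header name>.
        attributes[f"http.headers.{name}"] = value
    for name, value in parse_qsl(query_string):
        # Each query parameter becomes http.query.param.<parameter name>.
        attributes[f"http.query.param.{name}"] = value
    return attributes
```

For a GET request with the header `x-my-header: abc` and the query string `id=7`, the sketch yields `http.headers.x-my-header` set to `abc` and `http.query.param.id` set to `7`.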
## See also

- [org.apache.nifi.processors.standard.HandleHttpResponse](/user-guide/data-integration/openflow/processors/handlehttpresponse)

---
title: HandleHttpResponse 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/handlehttpresponse.md
section: Loading & Unloading Data
---

# HandleHttpResponse 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Sends an HTTP Response to the Requestor that generated a FlowFile. This Processor is designed to be used in conjunction with the HandleHttpRequest in order to create a web service. ## Tags egress, http, https, response, web service ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Attributes to add to the HTTP Response (Regex) | Specifies the Regular Expression that determines the names of FlowFile attributes that should be added to the HTTP response |
| HTTP Context Map | The HTTP Context Map Controller Service to use for caching the HTTP Request Information |
| HTTP Status Code | The HTTP Status Code to use when responding to the HTTP Request. See Section 10 of RFC 2616 for more information. |
## Relationships
| Name | Description |
| --- | --- |
| failure | FlowFiles will be routed to this Relationship if the Processor is unable to respond to the requestor. This may happen, for instance, if the connection times out or if NiFi is restarted before responding to the HTTP Request. |
| success | FlowFiles will be routed to this Relationship after the response has been successfully sent to the requestor |
## See also - [org.apache.nifi.processors.standard.HandleHttpRequest](/user-guide/data-integration/openflow/processors/handlehttprequest) --- title: HazelcastMapCacheClient source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/hazelcastmapcacheclient.md section: Loading & Unloading Data --- # HazelcastMapCacheClient This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description An implementation of DistributedMapCacheClient that uses Hazelcast as the backing cache. This service relies on another controller service, set in Hazelcast Cache Manager, that manages the actual Hazelcast calls. ## Tags cache, hazelcast, map ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Hazelcast Cache Manager * | hazelcast-cache-manager | | | A Hazelcast Cache Manager which manages connections to Hazelcast and provides cache instances. |
| Hazelcast Cache Name * | hazelcast-cache-name | | | The name of a given cache. A Hazelcast cluster may handle multiple independent caches, each identified by a name. Clients using caches with the same name are working on the same data structure within Hazelcast. |
| Hazelcast Entry Lifetime * | hazelcast-entry-ttl | 5 min | | Indicates how long the written entries should exist in Hazelcast. Setting it to '0 secs' means that the data will exist until its deletion or until the Hazelcast server is shut down. Using _EmbeddedHazelcastCacheManager_ as cache manager will not provide policies to limit the size of the cache. |
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: HikariCPConnectionPool source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/hikaricpconnectionpool.md section: Loading & Unloading Data --- # HikariCPConnectionPool This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides Database Connection Pooling Service based on HikariCP. Connections can be requested from the pool and returned after use. ## Tags connection, database, dbcp, hikari, jdbc, pooling, store ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Database Connection URL * | hikaricp-connection-url | | | A database connection URL used to connect to a database. May contain database system name, host, port, database name and some parameters. The exact syntax of a database connection URL is specified by your DBMS. |
| Database Driver Class Name * | hikaricp-driver-classname | | | The fully-qualified class name of the JDBC driver. Example: com.mysql.jdbc.Driver |
| Database Driver Location(s) | hikaricp-driver-locations | | | Comma-separated list of files/folders and/or URLs containing the driver JAR and its dependencies (if any). For example '/var/tmp/mariadb-java-client-1.1.7.jar' |
| Kerberos User Service | hikaricp-kerberos-user-service | | | Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos |
| Max Connection Lifetime | hikaricp-max-conn-lifetime | -1 | | The maximum lifetime of a connection. After this time is exceeded the connection will fail the next activation, passivation or validation test. A value of zero or less means the connection has an infinite lifetime. |
| Max Total Connections * | hikaricp-max-total-conns | 10 | | This property controls the maximum size that the pool is allowed to reach, including both idle and in-use connections. Basically this value will determine the maximum number of actual connections to the database backend. A reasonable value for this is best determined by your execution environment. When the pool reaches this size, and no idle connections are available, the service will block for up to connectionTimeout milliseconds before timing out. |
| Max Wait Time * | hikaricp-max-wait-time | 500 millis | | The maximum amount of time that the pool will wait (when there are no available connections) for a connection to be returned before failing, or `0 <time units>` to wait indefinitely. |
| Minimum Idle Connections * | hikaricp-min-idle-conns | 10 | | This property controls the minimum number of idle connections that HikariCP tries to maintain in the pool. If the idle connections dip below this value and total connections in the pool are less than 'Max Total Connections', HikariCP will make a best effort to add additional connections quickly and efficiently. It is recommended that this property be set equal to 'Max Total Connections'. |
| Password | hikaricp-password | | | The password for the database user |
| Database User | hikaricp-username | | | Database user name |
| Validation Query | hikaricp-validation-query | | | Validation Query used to validate connections before returning them. When a connection is invalid, it gets dropped and a new valid connection will be returned. NOTE: Using validation might have some performance penalty. |
## State management This component does not store state. ## Restricted ## Restrictions
| Required Permission | Explanation |
| --- | --- |
| reference remote resources | Database Driver Location can reference resources over HTTP |
## System Resource Considerations This component does not specify system resource considerations. --- title: HttpRecordSink source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/httprecordsink.md section: Loading & Unloading Data --- # HttpRecordSink This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Formats and sends records to a configured URI using HTTP POST. The Record Writer formats the records, which are sent as the body of the HTTP POST request. JsonRecordSetWriter is often used with this controller service because many HTTP POST requests require a JSON body. ## Tags http, post, record, sink ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| API URL * | API URL | | | The URL which receives the HTTP requests. |
| Maximum Batch Size * | Maximum Batch Size | 0 | | Specifies the maximum number of records to send in the body of each HTTP request. Zero means the batch size is not limited, and all records are sent together in a single HTTP request. |
| OAuth2 Access Token Provider | OAuth2 Access Token Provider | | | OAuth2 service that provides the access tokens for the HTTP requests. |
| Web Service Client Provider * | Web Service Client Provider | | | Controller service to provide the HTTP client for sending the HTTP requests. |
| Record Writer * | record-sink-record-writer | | | Specifies the Controller Service to use for writing out the records. |
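The effect of **Maximum Batch Size** can be sketched in a few lines of Python. This is an illustration of the documented rule only (zero means one unlimited batch), not the service's implementation; the function name `split_into_batches` is hypothetical, and the behavior for an empty record set is an assumption.

```python
from typing import Iterator, Sequence


def split_into_batches(records: Sequence, maximum_batch_size: int) -> Iterator[Sequence]:
    """Yield the groups of records that would each form one HTTP request body."""
    if maximum_batch_size < 0:
        raise ValueError("maximum_batch_size must be zero or greater")
    if not records:
        return  # Assumption: nothing is sent when there are no records.
    if maximum_batch_size == 0:
        # Zero means the batch size is not limited: one request carries everything.
        yield records
        return
    for start in range(0, len(records), maximum_batch_size):
        yield records[start:start + maximum_batch_size]


records = [{"id": n} for n in range(5)]
print([len(batch) for batch in split_into_batches(records, 2)])
print([len(batch) for batch in split_into_batches(records, 0)])
```

A smaller batch size produces more HTTP requests with smaller bodies, which can help when the receiving endpoint limits request size.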
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: IdentifyMimeType 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/identifymimetype.md section: Loading & Unloading Data --- # IdentifyMimeType 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Attempts to identify the MIME Type used for a FlowFile. If the MIME Type can be identified, an attribute with the name 'mime.type' is added with the value being the MIME Type. If the MIME Type cannot be determined, the value will be set to 'application/octet-stream'. In addition, the attribute 'mime.extension' will be set if a common file extension for the MIME Type is known. If the MIME Type detected is of type text/*, attempts to identify the charset used and an attribute with the name 'mime.charset' is added with the value being the charset. ## Tags MIME, bzip2, compression, file, gzip, identify, mime.type, zip ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
| Property | Description |
| --- | --- |
| Custom MIME Configuration | A URL or file path to a custom Tika Mime type configuration or the actual content of a custom Tika Mime type configuration. |
| config-strategy | Select the loading strategy for MIME Type configuration to be used. |
| use-filename-in-detection | If true will pass the filename to Tika to aid in detection. |
## Relationships
| Name | Description |
| --- | --- |
| success | All FlowFiles are routed to success |
## Writes attributes
| Name | Description |
| --- | --- |
| mime.type | This Processor sets the FlowFile's mime.type attribute to the detected MIME Type. If unable to detect the MIME Type, the attribute's value will be set to application/octet-stream |
| mime.extension | This Processor sets the FlowFile's mime.extension attribute to the file extension associated with the detected MIME Type. If there is no correlated extension, the attribute's value will be empty |
| mime.charset | This Processor sets the FlowFile's mime.charset attribute to the detected charset. If unable to detect the charset or the detected MIME type is not of type text/*, the attribute will not be set |
--- title: Install and configure the Openflow Connector for Oracle source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/setup-connector.md section: Loading & Unloading Data --- # Install and configure the %oracleofc% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/about) - [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-snowflake) - [](/user-guide/data-integration/openflow/connectors/oracle/incremental-replication) This topic describes the steps to install and configure the %oracleofc% connector. As a data engineer, perform the following tasks to install and configure the connector: ## Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. 
On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.

   Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account.

   The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

## Runtime sizing

The runtime size determines the CPU and memory available to the connector. The available sizes are Small, Medium, and Large. The connector requires Medium or Large.

Choose the size when you create the runtime: you can't change the size of an existing runtime in place. Choose Large if you expect high replication throughput or if source tables contain wide rows.

## Resize a runtime

Runtime size is fixed at creation, so to change size you run the connector on a different runtime. You have two options depending on whether you want to preserve the current replication progress.

If you don't need to keep the progress of the current connector, the simplest path is to create a new runtime at the size you need and install a new connector instance on it. The new connector starts from scratch: it snapshots all configured tables and then captures ongoing changes from that point. The replication progress of the existing connector is discarded.

To keep the progress of the current connector, for example to avoid re-snapshotting tables that took a long time to snapshot initially, migrate the connector to the new runtime. This reuses the existing destination tables and resumes incremental replication from where it left off.
For migration instructions, see [Reinstall the connector](#label-oracle-reinstall-connector). ## Configure the connector To configure the connector, do the following as a data engineer: 1. Right-click on the added runtime and select **Parameters**. 2. Populate the required parameter values. For more information on the required parameter values, see the following sections: - [](#label-oracle-snowflake-destination-parameters): Used to establish connection with Snowflake. - [](#label-oracle-ingestion-parameters): Used to specify the tables to replicate. - [](#label-oracle-source-parameters): Used to define the configuration of data downloaded from Oracle. ### Snowflake Destination Parameters
Parameter Description Required Destination Database The database where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes Destination Schema Pattern A pattern for the names of destination schemas where data is persisted. The connector creates the schemas if they don't exist. You can customize the pattern per ingested table using these optional variables: - `${source.database.name}`: a source table's database. - `${source.schema.name}`: a source table's schema. - `${source.table.name}`: a source table's name. For example, for a table with the qualified name `source_db.tenant_a.data`, the pattern `prefix_${source.database.name}_${source.schema.name}` evaluates to `prefix_source_db_tenant_a`. To ingest all tables into a single schema, provide a schema name without any variables, like `destination_schema`. Don't change this setting after the connector has begun ingesting data. Changing this setting after ingestion has begun breaks the existing ingestion. If you must change this setting, create a new connector instance. Yes Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data is persisted. Yes Snowflake Connection Strategy When using KEY_PAIR, specify the strategy for connecting to Snowflake: - **STANDARD** (default): Connect using standard public routing to Snowflake services. 
- **PRIVATE_CONNECTIVITY**: Connect using private addresses associated with the supporting cloud platform such as AWS PrivateLink. Required for BYOC with KEY_PAIR only, otherwise ignored. Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. -
**KEY_PAIR**: Must be the RSA private key used for authentication.
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
No Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No Snowflake Private Key Password When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake Private Key File. No Snowflake Role When using - **Session Token Authentication Strategy**: Use Snowflake Role assigned to the runtime or child role granted to this Snowflake Role. You can find your runtime Snowflake Role in the Openflow UI, by expanding the **More Options [⋮]** button for your runtime and selecting **Set Snowflake role**. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes Snowflake Username When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
No Snowflake Warehouse Snowflake warehouse used to run queries. Yes
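The **Destination Schema Pattern** substitution described in the table can be illustrated with a short Python sketch. It reproduces the documented example only; the function name `resolve_schema_pattern` is hypothetical, and the sketch does not model how the connector handles identifier case or unknown variables.

```python
import re

# Matches only the three variables documented for the pattern.
_VARIABLE = re.compile(r"\$\{(source\.(?:database|schema|table)\.name)\}")


def resolve_schema_pattern(pattern: str, database: str, schema: str, table: str) -> str:
    """Replace the documented ${source.*.name} variables in a schema pattern."""
    values = {
        "source.database.name": database,
        "source.schema.name": schema,
        "source.table.name": table,
    }
    return _VARIABLE.sub(lambda match: values[match.group(1)], pattern)


# The example from the table: table source_db.tenant_a.data.
print(resolve_schema_pattern(
    "prefix_${source.database.name}_${source.schema.name}",
    "source_db", "tenant_a", "data",
))  # prints: prefix_source_db_tenant_a

# A pattern without variables sends every table to the same schema.
print(resolve_schema_pattern("destination_schema", "source_db", "tenant_a", "data"))
```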
### Oracle Ingestion Parameters
Parameter Description Included Table Names Comma-separated list of fully-qualified table paths. Tables must be specified using fully qualified database, schema and table name format: DATABASE_NAME.SCHEMA_NAME.TABLE_NAME. For example: `MYPDB.SALES.CUSTOMERS, MYPDB.SALES.ORDERS` Included Table Regex A regular expression to match table paths for automatic inclusion of existing and new tables. The regex pattern must match the three-part naming convention: DATABASE_NAME.SCHEMA_NAME.TABLE_NAME. For example: `MYPDB\.SALES\..*` to match all tables in the SALES schema within the MYPDB database. Column Filter JSON Optional. A JSON array of filter objects specifying which columns to include or exclude per table. For syntax details and examples, see [Replicate a subset of columns in a table](#replicate-a-subset-of-columns-in-a-table). Table Key Configuration Service Optional. A `MultiDatabaseJsonTableKeyConfigService` controller service that supplies a user-declared logical key for one or more tables. The service exposes a **Table Key Configuration JSON** property where you define the key mappings. When configured, the logical key takes the highest priority and overrides any primary key, unique constraint, or unique index that the connector would otherwise auto-detect. For more information on when to use this and how to configure it, see [](#label-oracle-logical-key). Merge Task Schedule CRON A CRON expression to define when merge operations from the Journal to the Destination Table are triggered. For example, * * * * * ? for continuous merge. Object Identifier Resolution Specifies how source object identifiers such as schemas, tables, and column names are stored and queried in Snowflake. This setting determines if you must use double quotes in SQL queries.

Option 1: Default, case-insensitive (recommended).

- **Transformation**: All identifiers are converted to uppercase. For example, `My_Table` becomes `MY_TABLE`. - **Queries**: SQL queries are case-insensitive and don't require SQL double quotes. For example `SELECT * FROM my_table;` returns the same results as `SELECT * FROM MY_TABLE;`. Snowflake recommends using this option if database objects aren't expected to have mixed case names.

Option 2: case-sensitive.

- **Transformation**: Case is preserved. For example, `My_Table` remains `My_Table`. - **Queries**: SQL queries must use double quotes to match the exact case for database objects. For example, `SELECT * FROM "My_Table";`. Do not change this setting after connector ingestion has begun. Changing this setting after ingestion has begun breaks the existing ingestion. If you must change this setting, create a new connector instance. Snapshot Fetching Strategy Determines the snapshot load fetching strategy: - **CONCURRENT_BY_ROWID** (default): Splits tables into chunks bound by ranges of physical row ids, and retrieves each chunk in parallel. - **SEQUENTIAL_BY_PRIMARY_KEY**: Uses fixed-size batches retrieved sequentially by the table's replication key (primary key, unique constraint, unique index, or logical key). Despite the name, this strategy uses whatever key the connector resolved for the table, not specifically the primary key. Concurrent Snapshot Queries Maximum number of concurrent queries to the source database to run in the Snapshot flow. Increasing this can speed up snapshotting large numbers of tables, but will also increase the load on the source database.
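Before setting **Included Table Regex**, it can help to check a candidate expression against a list of known table paths. The following Python sketch does that under two assumptions that are not confirmed by this topic: that the expression must match the whole three-part path, and that Python's `re` syntax behaves like the connector's regex engine for simple patterns such as this one. The function name `select_tables` is hypothetical.

```python
import re


def select_tables(table_paths: list, included_regex: str) -> list:
    """Return the DATABASE.SCHEMA.TABLE paths fully matched by the expression."""
    pattern = re.compile(included_regex)
    # fullmatch requires the expression to cover the entire path (an assumption).
    return [path for path in table_paths if pattern.fullmatch(path)]


tables = [
    "MYPDB.SALES.CUSTOMERS",
    "MYPDB.SALES.ORDERS",
    "MYPDB.HR.EMPLOYEES",
]
print(select_tables(tables, r"MYPDB\.SALES\..*"))
```

Escaping the dots, as in `MYPDB\.SALES\..*`, keeps the separator from matching arbitrary characters.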
### Oracle Source Parameters
Parameter Description Required Oracle Connection URL JDBC URL of the database connection to the DB. The URL must specify the target container (PDB or CDB) that contains the data to be replicated. For example `jdbc:oracle:thin:@:/YOUR_DB_NAME` where YOUR_DB_NAME is the name of your PDB or CDB. When SSL is enabled, use the TCPS protocol, for example `jdbc:oracle:thin:@tcps://:/YOUR_DB_NAME`. The connector works within a single database/container. Ensure the JDBC URL points directly to the container that holds the tables to be replicated. Yes Oracle Username Username of the connect user that has access to the XStream Server. Yes Oracle Password Password of the connect user that has access to the XStream Server. Yes Oracle SSL Mode Controls SSL encryption for connections to the Oracle database. - **DISABLED**, which is the default: Connect without SSL. - **VERIFY_CA**: Connect with SSL. Verifies that a trusted Certificate Authority issued the server certificate. - **VERIFY_IDENTITY**: Connect with SSL. Verifies the CA certificate and that the server hostname matches the certificate's subject. When set to VERIFY_CA or VERIFY_IDENTITY, you must also provide the Oracle Wallet Filename parameter. Yes Oracle Wallet Filename Upload the file that contains the Oracle auto-login wallet file (`cwallet.sso`). The wallet must contain the trusted server certificate for SSL connections. For information about creating the wallet, see [](#label-configure-ssl-connections). Required when SSL Mode is not DISABLED Oracle Database Processor Multiplier Core Processor Licensing Factor as described in [Oracle Processor Core Factor Table](https://www.oracle.com/contracts/docs/processor-core-factor-table-070634.pdf) Required for Embedded License only Oracle Database Processor Cores The number of processor cores in your Oracle database.
Required for Embedded License only XStream Billing Acknowledgement A confirmation of the licensing agreement Required for Embedded License only XStream Out Server Name The name of the XStream Server that must already exist in Oracle. Yes XStream Out Server URL JDBC URL of the database connection for XStream, must use OCI driver. For example `jdbc:oracle:oci:@:/SID`. When SSL is enabled, use the TCPS protocol, for example `jdbc:oracle:oci:@tcps://:/SID`. When SSL Mode is enabled, the connector automatically adds `SSL_SERVER_DN_MATCH` and `MY_WALLET_DIRECTORY` to the XStream URL. You don't need to include these manually. Yes
## Restart table replication A table in FAILED state — for example, due to a missing primary key or unsupported schema change — does not restart automatically. If a table enters a FAILED state or you need to restart replication from scratch, use the following procedure to remove and re-add the table to replication. If the failure was caused by an issue in the source table such as a missing primary key, resolve that issue in the source database before continuing. 1. Remove the table from replication, using one of the following methods: - Add the table to the **Re-snapshot Table Exclusions** parameter to temporarily exclude it from replication. This is convenient when the table is matched by an **Included Table Regex** that you don't want to change. - In the Ingestion Parameters context, either remove the table from **Included Table Names** or modify the **Included Table Regex** so the table is no longer matched. 2. Verify the table has been removed: 1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. 2. In the table listing controller services, locate the **Table State Store** row, click the three vertical dots on the right side of the row, then choose **View State**. You must wait until the table's state is fully removed from this list before proceeding. Do not continue until this configuration change has completed. 3. Clean up the destination: Once the table's state shows as fully removed, manually [DROP](/sql-reference/sql/drop-table) the destination table in Snowflake. Note that the connector will not overwrite an existing destination table during the snapshot phase; if the table still exists, replication will fail again. Optionally, the journal table and stream can also be removed if they are no longer needed. 4. Re-add the table by reversing the change you made in the first step: either remove the table from **Re-snapshot Table Exclusions**, or add it back to **Included Table Names** or **Included Table Regex**. 
The connector then re-snapshots the table.
5. Verify the restart: Check the **Table State Store** using the instructions given previously. The state of the table should appear with the status NEW, then transition to SNAPSHOT_REPLICATION, and finally INCREMENTAL_REPLICATION.

## Replicate a subset of columns in a table

The connector can filter the data replicated per table to a subset of configured columns. Primary key columns are always included regardless of exclusions.

To apply column filters, set the **Column Filter JSON** parameter in the Ingestion Parameters context to a JSON array of filter objects, one per table you want to filter. Columns can be included or excluded by name or by regular expression pattern. You can apply a single condition per table, or combine multiple conditions, with exclusions always taking precedence over inclusions.

### Syntax

Each object in the array identifies a table and specifies which columns to include or exclude. Because this connector uses three-part fully qualified names (database, schema, and table), each object can include a `database` or `databasePattern` field in addition to the schema and table fields.

```javascript
[
  {
    "database": "" | "databasePattern": "",
    "schema": "" | "schemaPattern": "",
    "table": "" | "tablePattern": "",
    "included": ["", ""],
    "excluded": ["", ""],
    "includedPattern": "",
    "excludedPattern": ""
  }
]
```

The following rules apply:

- Use `database`, `schema`, and `table` for exact name matching, or `databasePattern`, `schemaPattern`, and `tablePattern` for regex matching. You can't use both a field and its pattern variant in the same object (for example, `schema` and `schemaPattern` can't both appear).
- At least one of `included`, `excluded`, `includedPattern`, or `excludedPattern` must be provided.
- When both included and excluded filters are specified, exclusions take precedence.
- When multiple filters match the same table, the last matching filter is used, with exact matches taking precedence over pattern-based filters.
- The value can be an array of objects to apply different filters to different tables.

### Examples

Include specific columns by name:

```javascript
[
  {
    "database": "my_db",
    "schema": "dbo",
    "table": "orders",
    "included": ["account_id", "status", "created_at"]
  }
]
```

Exclude specific columns by name:

```javascript
[
  {
    "database": "my_db",
    "schema": "dbo",
    "table": "orders",
    "excluded": ["internal_note", "debug_flag"]
  }
]
```

Combine an include pattern with a specific exclusion (for example, include all email columns except `admin_email`):

```javascript
[
  {
    "database": "my_db",
    "schema": "dbo",
    "table": "contacts",
    "includedPattern": ".*_email",
    "excluded": ["admin_email"]
  }
]
```

Mix a database pattern with an exact schema and table name to apply a filter across databases:

```javascript
[
  {
    "databasePattern": "prod_.*",
    "schema": "dbo",
    "table": "customers",
    "excluded": ["internal_note"]
  }
]
```

Pass multiple filter objects to apply different rules to different tables:

```javascript
[
  {"database": "my_db", "schema": "dbo", "table": "orders", "included": ["account_id", "status"]},
  {"database": "my_db", "schema": "dbo", "table": "customers", "excludedPattern": ".*_internal"}
]
```

### Including and excluding the same column

Removing a column from a table's replicated set (by excluding it or by removing it from the included list) has the same effect on the destination as dropping the column at the source: the connector soft-deletes the column on the destination by renaming it with a suffix (by default, `__SNOWFLAKE_DELETED`).

If you then add the column back to the replicated set and later remove it a second time, replication for the affected table fails because the soft-deleted column name is already taken. To recover, restart replication for the affected table.
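To reason about which columns a single filter object selects, the include and exclude rules for column filters can be approximated in Python. This is a sketch, not the connector's implementation. It assumes exact, case-sensitive name comparison, full-match regex semantics, and that all columns are candidates when no include rule is given; the function name `select_columns` is hypothetical.

```python
import re


def select_columns(columns: list, primary_key: set, column_filter: dict) -> list:
    """Apply one filter object to a table's columns (illustrative only)."""
    included = set(column_filter.get("included", []))
    excluded = set(column_filter.get("excluded", []))
    included_pattern = column_filter.get("includedPattern")
    excluded_pattern = column_filter.get("excludedPattern")
    has_include_rule = bool(included) or included_pattern is not None

    def is_included(name: str) -> bool:
        if not has_include_rule:
            return True  # Assumption: without include rules every column is a candidate.
        if name in included:
            return True
        return included_pattern is not None and re.fullmatch(included_pattern, name) is not None

    def is_excluded(name: str) -> bool:
        if name in excluded:
            return True
        return excluded_pattern is not None and re.fullmatch(excluded_pattern, name) is not None

    # Primary key columns are always kept; otherwise exclusions win over inclusions.
    return [c for c in columns if c in primary_key or (is_included(c) and not is_excluded(c))]


contacts = ["id", "work_email", "admin_email", "phone"]
rule = {"includedPattern": ".*_email", "excluded": ["admin_email"]}
print(select_columns(contacts, {"id"}, rule))
```

Running a candidate filter through a check like this before changing **Column Filter JSON** can help avoid repeatedly removing and re-adding the same column.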
## Specify a logical key for a table

The connector requires a replication key for every table it replicates. By default, the connector picks the replication key automatically, in this order: a primary key, then a qualifying unique constraint, then a qualifying unique index. For the full selection rules, see [](#label-oracle-replication-key-selection).

A *logical key* is a user-declared replacement for the auto-detected key. Configure a logical key when:

- A table has no primary key and no qualifying unique constraint or unique index, but one or more columns are unique in the data.
- A specific column or set of columns should be used as the replication key, regardless of what the connector would auto-detect (for example, to override a synthetic primary key).

A logical key takes the highest priority. When the connector finds a logical key for a table, it uses that key and ignores any primary key, unique constraint, or unique index on the table.

### JSON syntax

The **Table Key Configuration JSON** value is a JSON array. Each entry maps one table to its logical key columns:

```json
[
  {
    "database": "",
    "schema": "",
    "table": "",
    "logicalKey": ["", ""]
  }
]
```

The fields are:
The following rules apply: - `database`, `schema`, and `table` matching is **case-sensitive**. Oracle stores identifiers in uppercase by default, so use uppercase names unless the identifiers were created with double-quoted mixed-case or lowercase names. - `logicalKey` column matching is case-insensitive. The connector matches column names against the source table schema regardless of case. - An entry whose `database`, `schema`, and `table` don't match any replicated table is silently ignored. ### Logical key configuration examples A single-column logical key on a table without a primary key: ```json [ { "database": "MYPDB", "schema": "SALES", "table": "AUDIT_LOG", "logicalKey": ["EVENT_ID"] } ] ``` A composite logical key: ```json [ { "database": "MYPDB", "schema": "SALES", "table": "ORDER_LINES", "logicalKey": ["ORDER_ID", "LINE_ITEM_ID"] } ] ``` Logical keys for several tables in one JSON value: ```json [ { "database": "MYPDB", "schema": "SALES", "table": "AUDIT_LOG", "logicalKey": ["EVENT_ID"] }, { "database": "MYPDB", "schema": "SALES", "table": "ORDER_LINES", "logicalKey": ["ORDER_ID", "LINE_ITEM_ID"] } ] ``` ### Restrictions The connector rejects the configuration when any of the following is true: - `logicalKey` is missing, empty, or not an array. - `logicalKey` contains duplicate column names (compared case-insensitively). - `logicalKey` contains the pseudo-column `ROWID`. `ROWID` isn't a reliable replication key because it can change when a row is moved (for example, after a table rebuild or partition operation). - `logicalKey` contains a column name that doesn't exist in the source table. When the configuration is rejected, the connector either fails to enable the controller service (for structural issues detected at enablement time) or holds the table in the `NEW` state (for issues detected when the table is initialized). After you fix the configuration, replication for the table resumes without resetting state. 
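The restrictions above can be checked before you paste a configuration into the parameter context. The following Python sketch is a hypothetical pre-flight check, not part of the connector: the function name `validate_logical_key` and the message strings are made up for illustration, and the list of source columns is assumed to come from your own inspection of the source table.

```python
def validate_logical_key(entry, source_columns):
    """Return a list of problems for one Table Key Configuration entry."""
    key = entry.get("logicalKey")
    if not isinstance(key, list) or not key:
        return ["logicalKey is missing, empty, or not an array"]

    problems = []
    # Column matching is case-insensitive, so compare upper-cased names.
    folded = [str(col).upper() for col in key]
    if len(set(folded)) != len(folded):
        problems.append("logicalKey contains duplicate column names")
    if "ROWID" in folded:
        problems.append("logicalKey contains the pseudo-column ROWID")

    known = {col.upper() for col in source_columns}
    unknown = [col for col, name in zip(key, folded)
               if name != "ROWID" and name not in known]
    if unknown:
        problems.append(f"logicalKey references unknown columns: {unknown}")
    return problems


entry = {
    "database": "MYPDB",
    "schema": "SALES",
    "table": "ORDER_LINES",
    "logicalKey": ["ORDER_ID", "LINE_ITEM_ID"],
}
print(validate_logical_key(entry, ["ORDER_ID", "LINE_ITEM_ID", "QUANTITY"]))
# []
```

An empty list means none of the four documented rejection conditions applies. The check doesn't cover the warning-level cases, such as large-object or floating-point key columns.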
### Warnings logged for risky configurations The connector accepts the following configurations but logs a warning at table initialization. Verify the data carefully or arrange a periodic full reload to correct drift. When choosing logical-key columns, prefer columns with high cardinality and, where possible, monotonically increasing values. Low-cardinality or non-monotonic keys can degrade snapshot performance if you use the `SEQUENTIAL_BY_PRIMARY_KEY` strategy, which orders rows by the replication key. - A logical-key column is a large-object type (`BLOB`, `CLOB`, `NCLOB`, `LONG`, `LONG RAW`). Using large objects as keys severely degrades MERGE performance. - A logical-key column is a floating-point type (`FLOAT`, `DOUBLE`, `REAL`, `BINARY_FLOAT`, `BINARY_DOUBLE`). Floating-point comparisons can produce inconsistent results because of precision differences. - The composite logical key has more than five columns. Long composite keys often indicate a design issue and might degrade MERGE performance. - The logical key replaces an existing primary key on the table. ### Limitation: Changes to a logical-key value When a source `UPDATE` changes the value of a logical-key column, the connector does **not** soft-delete the row keyed by the old value before inserting the row keyed by the new value. The destination table ends up with two active rows for what's a single row in the source: the original row, still active under its old key value, and a new row under the new key value. This differs from how the connector handles a primary-key value change on tables that don't use a logical key. For more information on that behavior, see [](#label-oracle-replication-key-value-change). To avoid this limitation, choose logical-key columns whose values don't change in the source. If logical-key values do change, periodically run a full reload for the affected tables to reconcile the destination with the source. 
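To see why a change to a logical-key value leaves two active rows, consider a toy model in which the destination is a dictionary keyed by the logical-key values. This is a simplified sketch for intuition only. The helper `apply_change` is hypothetical and doesn't represent the connector's actual merge logic.

```python
def apply_change(destination, row, logical_key):
    """Upsert one source row into a dict keyed by its logical-key values."""
    key = tuple(row[col] for col in logical_key)
    destination[key] = dict(row)


destination = {}
logical_key = ["EVENT_ID"]

# Initial insert on the source.
apply_change(destination, {"EVENT_ID": 100, "STATUS": "open"}, logical_key)

# A source UPDATE changes the logical-key value of the same row from 100 to 200.
# Nothing removes the entry stored under the old key value.
apply_change(destination, {"EVENT_ID": 200, "STATUS": "open"}, logical_key)

print(sorted(destination))
# [(100,), (200,)]
```

The source holds one row, but the toy destination now holds two, which is the drift that a periodic full reload reconciles.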
### Schema changes that affect a logical key Logical keys reference column names. The connector doesn't follow renames or drops of those columns: - If a logical-key column is dropped on the source, replication for the affected table fails. The table is marked `FAILED`. For recovery steps, see [](#label-oracle-logical-key-invalidated) in the troubleshooting topic. - If a logical-key column is renamed on the source, the configuration still references the old name and replication fails. Update the JSON to use the new name and restart table replication. ## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. ## Next steps - (Optional) [Set up incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/oracle/incremental-replication). - [Monitor the flow](/user-guide/data-integration/openflow/monitor). --- title: InvokeHTTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/invokehttp.md section: Loading & Unloading Data --- # InvokeHTTP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description An HTTP client processor which can interact with a configurable HTTP Endpoint. The destination URL and HTTP Method are configurable. When the HTTP Method is PUT, POST or PATCH, the FlowFile contents are included as the body of the request and FlowFile attributes are converted to HTTP headers, optionally, based on configuration properties. ## Tags client, http, https, rest ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties true ## Properties
## Relationships
## Writes attributes
--- title: InvokeScriptedProcessor 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/invokescriptedprocessor.md section: Loading & Unloading Data --- # InvokeScriptedProcessor 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-scripting-nar ## Description Experimental - Invokes a script engine for a Processor defined in the given script. The script must define a valid class that implements the Processor interface, and it must set a variable 'processor' to an instance of the class. Processor methods such as onTrigger() will be delegated to the scripted Processor instance. Also any Relationships or PropertyDescriptors defined by the scripted processor will be added to the configuration dialog. The scripted processor can implement public void setLogger(ComponentLog logger) to get access to the parent logger, as well as public void onScheduled(ProcessContext context) and public void onStopped(ProcessContext context) methods to be invoked when the parent InvokeScriptedProcessor is scheduled or stopped, respectively. NOTE: The script will be loaded when the processor is populated with property values, see the Restrictions section for more security implications. Experimental: Impact of sustained usage not yet verified. ## Tags groovy, invoke, script ## Input Requirement ## Supports Sensitive Dynamic Properties true ## Properties
## State management
## Restrictions
## See also - [org.apache.nifi.processors.script.ExecuteScript](/user-guide/data-integration/openflow/processors/executescript) --- title: IPLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/iplookupservice.md section: Loading & Unloading Data --- # IPLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A lookup service that provides several types of enrichment information for IP addresses. The service is configured by providing a MaxMind Database file and specifying which types of enrichment should be provided for an IP Address or Hostname. Each type of enrichment is a separate lookup, so configuring the service to provide all of the available enrichment data may be slower than returning only a portion of the available enrichments. To use this service, a lookup must be performed using a key of 'ip' and a value that is a valid IP address or hostname. View the Usage of this component and choose to view Additional Details for more information, such as the Schema that pertains to the information that is returned. ## Tags anonymous, cellular, domain, enrich, geo, ip, ipgeo, isp, lookup, maxmind, tor ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ISPEnrichIP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/ispenrichip.md section: Loading & Unloading Data --- # ISPEnrichIP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-enrich-nar ## Description Looks up ISP information for an IP address and adds the information to FlowFile attributes. The ISP data is provided as a MaxMind ISP database. (Note that this is NOT the same as the GeoLite database utilized by some geo enrichment tools.) The attribute that contains the IP address to look up is provided by the 'IP Address Attribute' property. If the name of the attribute provided is 'X', then the attributes added by enrichment will take the form X.isp.<fieldName>. ## Tags ISP, enrich, ip, maxmind ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: JettyWebSocketClient source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jettywebsocketclient.md section: Loading & Unloading Data --- # JettyWebSocketClient This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Implementation of WebSocketClientService. This service uses Jetty WebSocket client module to provide WebSocket session management throughout the application. ## Tags Jetty, WebSocket, client ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: JettyWebSocketServer source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jettywebsocketserver.md section: Loading & Unloading Data --- # JettyWebSocketServer This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Implementation of WebSocketServerService. This service uses Jetty WebSocket server module to provide WebSocket session management throughout the application. ## Tags Jetty, WebSocket, server ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: JMSConnectionFactoryProvider source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jmsconnectionfactoryprovider.md section: Loading & Unloading Data --- # JMSConnectionFactoryProvider This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a generic service to create vendor-specific javax.jms.ConnectionFactory implementations. The Connection Factory can be served once this service is configured successfully. ## Tags integration, jms, messaging, publish, queue, subscribe, topic ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: JndiJmsConnectionFactoryProvider source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jndijmsconnectionfactoryprovider.md section: Loading & Unloading Data --- # JndiJmsConnectionFactoryProvider This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a service to look up an existing JMS ConnectionFactory using the Java Naming and Directory Interface (JNDI). ## Tags integration, jms, jndi, messaging, publish, queue, subscribe, topic ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: JoinEnrichment 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/joinenrichment.md section: Loading & Unloading Data --- # JoinEnrichment 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Joins together Records from two different FlowFiles where one FlowFile, the 'original', contains arbitrary records and the second FlowFile, the 'enrichment', contains additional data that should be used to enrich the first. See Additional Details for more information on how to configure this processor and the different use cases that it aims to accomplish. ## Tags combine, enrichment, fork, join, merge, record, recordpath, sql, streams, wrap ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.ForkEnrichment](/user-guide/data-integration/openflow/processors/forkenrichment) --- title: JoltTransformJSON 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/jolttransformjson.md section: Loading & Unloading Data --- # JoltTransformJSON 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-jolt-nar ## Description Applies a list of Jolt specifications to either the FlowFile JSON content or a specified FlowFile JSON attribute. If the JSON transform fails, the original FlowFile is routed to the 'failure' relationship. ## Tags cardinality, chainr, default, jolt, json, remove, shift, sort, transform ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: JoltTransformRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/jolttransformrecord.md section: Loading & Unloading Data --- # JoltTransformRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-jolt-nar ## Description Applies a JOLT specification to each record in the FlowFile payload. A new FlowFile is created with transformed content and is routed to the 'success' relationship. If the transform fails, the original FlowFile is routed to the 'failure' relationship. ## Tags cardinality, chainr, defaultr, jolt, record, removr, shiftr, sort, transform ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: JSLTTransformJSON 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/jslttransformjson.md section: Loading & Unloading Data --- # JSLTTransformJSON 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-jslt-nar ## Description Applies a JSLT transformation to the FlowFile JSON payload. A new FlowFile is created with transformed content and is routed to the 'success' relationship. If the JSLT transform fails, the original FlowFile is routed to the 'failure' relationship. ## Tags jslt, json, transform ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: JsonConfigBasedBoxClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jsonconfigbasedboxclientservice.md section: Loading & Unloading Data --- # JsonConfigBasedBoxClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides Box client objects through which Box API calls can be used. ## Tags box, client, provider ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: JsonPathReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jsonpathreader.md section: Loading & Unloading Data --- # JsonPathReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Parses JSON records and evaluates user-defined JSONPath expressions against each JSON object. While the reader expects each record to be well-formed JSON, the content of a FlowFile may consist of many records, each as a well-formed JSON array or JSON object with optional whitespace between them, such as the common 'JSON-per-line' format. If an array is encountered, each element in that array will be treated as a separate record. User-defined properties define the fields that should be extracted from the JSON in order to form the fields of a Record. Any JSON field that is not extracted via a JSONPath will not be returned in the JSON Records. ## Tags json, jsonpath, parser, reader, record ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: JsonQueryElasticsearch 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/jsonqueryelasticsearch.md section: Loading & Unloading Data --- # JsonQueryElasticsearch 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-elasticsearch-restapi-nar ## Description A processor that allows the user to run a query (with aggregations) written with the Elasticsearch JSON DSL. It does not automatically paginate queries for the user. If an incoming relationship is added to this processor, it will use the flowfile's content for the query. Care should be taken on the size of the query because the entire response from Elasticsearch will be loaded into memory all at once and converted into the resulting flowfiles. ## Tags elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, get, json, query, read ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.elasticsearch.PaginatedJsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/paginatedjsonqueryelasticsearch) --- title: JsonRecordSetWriter source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jsonrecordsetwriter.md section: Loading & Unloading Data --- # JsonRecordSetWriter This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Writes the results of a RecordSet as either a JSON Array or one JSON object per line. If using Array output, then even if the RecordSet consists of a single row, it will be written as an array with a single element. If using One Line Per Object output, the JSON objects cannot be pretty-printed. ## Tags json, record, recordset, resultset, row, serialize, writer ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: JsonTableColumnFilter source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jsontablecolumnfilter.md section: Loading & Unloading Data --- # JsonTableColumnFilter This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a table column filter based on a JSON configuration. The JSON configuration should be an array of objects, where each object represents a table and its column filter. The object should have the following properties: - schema: the schema name of the table - table: the table name - included: an array of column names to include - excluded: an array of column names to exclude - includedPattern: a regular expression pattern to include columns - excludedPattern: a regular expression pattern to exclude columns The schema and table must be provided for each object, and one or more of the *included*, *excluded*, *includedPattern*, or *excludedPattern* properties must be provided. If a column matches both an include and an exclude condition, the column is excluded. If only a single filter is provided, the JSON configuration may be a single JSON object, rather than an array. ## Tags column, database, filter, snowflake, table ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: JsonTreeReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jsontreereader.md section: Loading & Unloading Data --- # JsonTreeReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Parses JSON into individual Record objects. While the reader expects each record to be well-formed JSON, the content of a FlowFile may consist of many records, each as a well-formed JSON array or JSON object with optional whitespace between them, such as the common 'JSON-per-line' format. If an array is encountered, each element in that array will be treated as a separate record. If the schema that is configured contains a field that is not present in the JSON, a null value will be used. If the JSON contains a field that is not present in the schema, that field will be skipped. See the Usage of the Controller Service for more information and examples. ## Tags json, parser, reader, record, tree ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: JWTBearerOAuth2AccessTokenProvider source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/jwtbeareroauth2accesstokenprovider.md section: Loading & Unloading Data --- # JWTBearerOAuth2AccessTokenProvider This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides OAuth 2.0 access tokens that can be used as a Bearer authorization header in HTTP requests. This controller service implements the OAuth 2.0 JWT Bearer Flow. ## Tags access token, authorization, hjwt, oauth2, provider ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: Kafka3ConnectionService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/kafka3connectionservice.md section: Loading & Unloading Data --- # Kafka3ConnectionService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides and manages connections to Kafka Brokers for producer or consumer operations. ## Tags kafka, openflow ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ListArchivedHubSpotData 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listarchivedhubspotdata.md section: Loading & Unloading Data --- # ListArchivedHubSpotData 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-hubspot-processors-nar ## Description Lists archived data from HubSpot for the chosen object type and generates one FlowFile per listed object with the corresponding metadata as FlowFile attributes. The object type must be searchable, which means it supports access to the /search endpoint. For more information about searchable object types, see: [https://developers.hubspot.com/docs/reference/api/crm/objects/objects#search](https://developers.hubspot.com/docs/reference/api/crm/objects/objects#search) ## Tags Preview, hubspot ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## Use cases | This processor is typically used in conjunction with a GenerateFlowFile processor | | --------------------------------------------------------------------------------- | ## See also - [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotObject](/user-guide/data-integration/openflow/processors/gethubspotobject) - [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotSchema](/user-guide/data-integration/openflow/processors/gethubspotschema) - [com.snowflake.openflow.runtime.processors.hubspot.ListHubSpotObjects](/user-guide/data-integration/openflow/processors/listhubspotobjects) - [com.snowflake.openflow.runtime.processors.hubspot.PutHubSpot](/user-guide/data-integration/openflow/processors/puthubspot) --- title: ListAzureBlobStorage_v12 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listazureblobstorage_v12.md section: Loading & Unloading Data --- # ListAzureBlobStorage_v12 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Lists blobs in an Azure Blob Storage container. Listing details are attached to an empty FlowFile for use with FetchAzureBlobStorage. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. The processor uses Azure Blob Storage client library v12. ## Tags azure, blob, cloud, microsoft, storage ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.azure.storage.CopyAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/copyazureblobstorage_v12) - [org.apache.nifi.processors.azure.storage.DeleteAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/deleteazureblobstorage_v12) - [org.apache.nifi.processors.azure.storage.FetchAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/fetchazureblobstorage_v12) - [org.apache.nifi.processors.azure.storage.PutAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/putazureblobstorage_v12) --- title: ListAzureDataLakeStorage 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listazuredatalakestorage.md section: Loading & Unloading Data --- # ListAzureDataLakeStorage 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Lists the contents of a directory in an Azure Data Lake Storage Gen 2 filesystem. ## Tags adlsgen2, azure, cloud, datalake, microsoft, storage ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.azure.storage.DeleteAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/deleteazuredatalakestorage) - [org.apache.nifi.processors.azure.storage.FetchAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/fetchazuredatalakestorage) - [org.apache.nifi.processors.azure.storage.PutAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/putazuredatalakestorage) --- title: ListBoxFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listboxfile.md section: Loading & Unloading Data --- # ListBoxFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-box-nar ## Description Lists files in a Box folder. Each listed file may result in one FlowFile, the metadata being written as FlowFile attributes. When the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. ## Tags box, storage ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile) - [org.apache.nifi.processors.box.PutBoxFile](/user-guide/data-integration/openflow/processors/putboxfile) --- title: ListBoxFileInfo 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listboxfileinfo.md section: Loading & Unloading Data --- # ListBoxFileInfo 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-box-nar ## Description Fetches file metadata for each file in a Box folder. Takes a FlowFile with a folder ID attribute and outputs FlowFiles with records containing all file metadata. ## Tags box, fetch, files, folder, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile) - [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile) - [org.apache.nifi.processors.box.PutBoxFile](/user-guide/data-integration/openflow/processors/putboxfile) --- title: ListBoxFileMetadataInstances 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listboxfilemetadatainstances.md section: Loading & Unloading Data --- # ListBoxFileMetadataInstances 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-box-nar ## Description Retrieves all metadata instances associated with a Box file. ## Tags box, instances, metadata, storage, templates ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile) - [org.apache.nifi.processors.box.FetchBoxFileInfo](/user-guide/data-integration/openflow/processors/fetchboxfileinfo) - [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile) --- title: ListBoxFileMetadataTemplates 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listboxfilemetadatatemplates.md section: Loading & Unloading Data --- # ListBoxFileMetadataTemplates 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-box-nar ## Description Retrieves all metadata templates associated with a Box file. ## Tags box, metadata, storage, templates ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile) - [org.apache.nifi.processors.box.FetchBoxFileInfo](/user-guide/data-integration/openflow/processors/fetchboxfileinfo) - [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile) --- title: ListConfluenceGroups 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listconfluencegroups.md section: Loading & Unloading Data --- # ListConfluenceGroups 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-atlassian-processors-nar ## Description Processor listing Confluence groups. ## Tags Preview, atlassian, confluence, groups ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ListDatabaseTables 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listdatabasetables.md section: Loading & Unloading Data --- # ListDatabaseTables 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Generates a set of flow files, each containing attributes corresponding to metadata about a table from a database connection. Once metadata about a table has been fetched, it will not be fetched again until the Refresh Interval (if set) has elapsed, or until state has been manually cleared. ## Tags database, jdbc, list, sql, table ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## Use Cases Involving Other Components | Perform a full load of a database, retrieving all rows from all tables, or a specific set of tables. | | ---------------------------------------------------------------------------------------------------- | --- title: ListDBFSDirectory 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listdbfsdirectory.md section: Loading & Unloading Data --- # ListDBFSDirectory 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-databricks-processors-nar ## Description Lists file names in a DBFS directory and outputs a new FlowFile with the filename. ## Tags databricks, dbfs, openflow ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ListDropbox 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listdropbox.md section: Loading & Unloading Data --- # ListDropbox 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-dropbox-processors-nar ## Description Retrieves a listing of files from Dropbox (shortcuts are ignored). Each listed file may result in one FlowFile, the metadata being written as FlowFile attributes. When the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. ## Tags dropbox, storage ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.dropbox.FetchDropbox](/user-guide/data-integration/openflow/processors/fetchdropbox) - [org.apache.nifi.processors.dropbox.PutDropbox](/user-guide/data-integration/openflow/processors/putdropbox) --- title: ListenFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listenftp.md section: Loading & Unloading Data --- # ListenFTP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Starts an FTP server that listens on the specified port and transforms incoming files into FlowFiles. The URI of the service will be ftp://\{hostname\}:\{port\}. The default port is 2221. ## Tags FTP, FTPS, ingest, listen ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ListenHTTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listenhttp.md section: Loading & Unloading Data --- # ListenHTTP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Starts an HTTP Server and listens on a given base path to transform incoming requests into FlowFiles. The default URI of the Service will be http://\{hostname\}:\{port\}/contentListener. Only HEAD and POST requests are supported. GET, PUT, DELETE, OPTIONS and TRACE will result in an error and the HTTP response status code 405; CONNECT will also result in an error and the HTTP response status code 400. GET is supported on <service_URI>/healthcheck. If the service is available, it returns "200 OK" with the content "OK". The health check functionality can be configured to be accessible via a different port. For details, see the documentation of the "Listening Port for health check requests" property. A Record Reader and Record Writer property can be enabled on the processor to process incoming requests as records. Record processing is not allowed for multipart requests or for requests in FlowFileV3 format (minifi). If the incoming request contains a FlowFileV3 package format, the data will be unpacked automatically into the individual FlowFile(s) contained within the package; the original FlowFile names are restored. ## Tags http, https, ingest, listen, rest ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Use cases | Unpack FlowFileV3 content received in a POST | | -------------------------------------------- | ## Use Cases Involving Other Components | Limit the data flow rate that is accepted | | ----------------------------------------- | --- title: ListenOTLP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listenotlp.md section: Loading & Unloading Data --- # ListenOTLP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-opentelemetry-nar ## Description Collect OpenTelemetry messages over HTTP or gRPC. Supports standard Export Service Request messages for logs, metrics, and traces. Implements OpenTelemetry OTLP Specification 1.0.0 with OTLP/gRPC and OTLP/HTTP. Provides protocol detection using the HTTP Content-Type header. ## Tags OTLP, OTel, OpenTelemetry, logs, metrics, telemetry, traces ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ListenSlack 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listenslack.md section: Loading & Unloading Data --- # ListenSlack 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-slack-nar ## Description Retrieves real-time messages or Slack commands from one or more Slack conversations. The messages are written out in JSON format. Note that this Processor should be used to obtain real-time messages and commands from Slack and does not provide a mechanism for obtaining historical messages. The ConsumeSlack Processor should be used for an initial load of messages from a channel. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages and commands from Slack. ## Tags command, event, listen, message, real-time, receive, slack, social media, team, text, unstructured ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.slack.ConsumeSlack](/user-guide/data-integration/openflow/processors/consumeslack) --- title: ListenSyslog 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listensyslog.md section: Loading & Unloading Data --- # ListenSyslog 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Listens for Syslog messages being sent to a given port over TCP or UDP. Incoming messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The format of each message is: (<PRIORITY>)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.SZ" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If an incoming message matches one of these patterns, the message will be parsed and the individual pieces will be placed in FlowFile attributes, with the original message in the content of the FlowFile. If an incoming message does not match one of these patterns, it will not be parsed and the syslog.valid attribute will be set to false with the original message in the content of the FlowFile. Valid messages will be transferred on the success relationship, and invalid messages will be transferred on the invalid relationship. ## Tags listen, logs, syslog, tcp, udp ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
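The matching step in the ListenSyslog description can be sketched in Python. The two regular expressions below are simplified stand-ins written for this example and are not the expressions the processor uses. The function name `parse_syslog` is hypothetical, and the attribute names other than `syslog.valid` are illustrative assumptions. The sketch tries the RFC5424 shape first, then the RFC3164 shape, and marks the message invalid if neither matches.

```python
import re

# Simplified pattern for "<PRIORITY>VERSION TIMESTAMP HOSTNAME BODY" (RFC5424 style).
RFC5424 = re.compile(
    r"^<(?P<priority>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})) "
    r"(?P<hostname>\S+) (?P<body>.*)$",
    re.DOTALL,
)

# Simplified pattern for "<PRIORITY>MMM d HH:mm:ss HOSTNAME BODY" (RFC3164 style).
RFC3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} {1,2}\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<body>.*)$",
    re.DOTALL,
)


def parse_syslog(message):
    """Return a dict of attributes for one message.

    Attribute names other than "syslog.valid" are illustrative assumptions.
    """
    for pattern in (RFC5424, RFC3164):
        match = pattern.match(message)
        if match:
            attributes = {"syslog." + key: value for key, value in match.groupdict().items()}
            attributes["syslog.valid"] = "true"
            return attributes
    # No pattern matched: the message is left unparsed and flagged invalid.
    return {"syslog.valid": "false"}


new_style = parse_syslog("<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 - msg")
old_style = parse_syslog("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
invalid = parse_syslog("not a syslog message")
print(new_style["syslog.hostname"], old_style["syslog.timestamp"], invalid["syslog.valid"])
```

In a flow, a message like `invalid` would correspond to one routed to the invalid relationship, while the other two would go to success.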
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.ParseSyslog](/user-guide/data-integration/openflow/processors/parsesyslog) - [org.apache.nifi.processors.standard.PutSyslog](/user-guide/data-integration/openflow/processors/putsyslog) --- title: ListenTCP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listentcp.md section: Loading & Unloading Data --- # ListenTCP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. The default behavior is for each message to produce a single FlowFile, however this can be controlled by increasing the Batch Size to a larger value for higher throughput. The Receive Buffer Size must be set as large as the largest messages expected to be received, meaning if every 100kb there is a line separator, then the Receive Buffer Size must be greater than 100kb. The processor can be configured to use an SSL Context Service to only allow secure connections. When connected clients present certificates for mutual TLS authentication, the Distinguished Names of the client certificate's issuer and subject are added to the outgoing FlowFiles as attributes. The processor does not perform authorization based on Distinguished Name values, but since these values are attached to the outgoing FlowFiles, authorization can be implemented based on these attributes. ## Tags listen, ssl, tcp, tls ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
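The relationship between the line separator, the Receive Buffer Size, and the Batch Size can be sketched in Python. This is a simplified model and not the processor's implementation: the function names `frame_messages` and `batches` are hypothetical, raising an error for an oversized message is an assumption chosen for illustration, and a batch is represented here as a plain list of messages.

```python
def frame_messages(chunks, separator=b"\n", receive_buffer_size=1024):
    """Split a stream of byte chunks into messages at each separator.

    Assumption: a message larger than the receive buffer raises ValueError.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while separator in buffer:
            message, buffer = buffer.split(separator, 1)
            if len(message) > receive_buffer_size:
                raise ValueError("message is larger than the receive buffer")
            yield message
        # Unterminated data must still fit in the buffer while waiting
        # for the next separator to arrive.
        if len(buffer) > receive_buffer_size:
            raise ValueError("message is larger than the receive buffer")


def batches(messages, batch_size=1):
    """Group messages so each output holds up to batch_size messages."""
    group = []
    for message in messages:
        group.append(message)
        if len(group) >= batch_size:
            yield group
            group = []
    if group:
        yield group


# Messages can be split across TCP reads; the trailing b"g" has no
# separator yet, so it is not emitted.
incoming = [b"ab", b"c\nde", b"f\ng"]
messages = list(frame_messages(incoming))
grouped = list(batches(messages, batch_size=2))
print(messages, grouped)
```

With a batch size of 1, each message maps to its own output, which mirrors the default behavior described above; a larger batch size trades per-message outputs for fewer, larger ones.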
## Relationships
## Writes attributes
--- title: ListenUDP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listenudp.md section: Loading & Unloading Data --- # ListenUDP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Listens for Datagram Packets on a given port. The default behavior produces a FlowFile per datagram, however for higher throughput the Max Batch Size property may be increased to specify the number of datagrams to batch together in a single FlowFile. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties, otherwise it will listen for datagrams from all hosts and ports. ## Tags ingest, listen, source, udp ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ListenUDPRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listenudprecord.md section: Loading & Unloading Data --- # ListenUDPRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Listens for Datagram Packets on a given port and reads the content of each datagram using the configured Record Reader. Each record will then be written to a flow file using the configured Record Writer. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties, otherwise it will listen for datagrams from all hosts and ports. ## Tags ingest, listen, record, source, udp ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ListenWebSocket 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listenwebsocket.md section: Loading & Unloading Data --- # ListenWebSocket 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-websocket-processors-nar ## Description Acts as a WebSocket server endpoint to accept client connections. As the WebSocket server configured with this processor receives client requests, FlowFiles are transferred to downstream relationships according to the received message types. ## Tags WebSocket, consume, listen, subscribe ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ListFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listfile.md section: Loading & Unloading Data --- # ListFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Retrieves a listing of files from the input directory. For each file listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster when 'Input Directory Location' is set to 'Remote'. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all the data. When 'Input Directory Location' is 'Local', the 'Execution' mode can be anything, and synchronization won't happen. Unlike GetFile, this Processor does not delete any data from the local filesystem. ## Tags file, filesystem, get, ingest, list, source ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
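The idea of listing files without re-emitting ones already seen, and without deleting anything, can be sketched in Python as follows. The source does not describe how ListFile tracks its progress, so the modification-time watermark used here is an assumption, and `list_new_files` is a hypothetical helper rather than part of any NiFi or Openflow API.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Tuple


def list_new_files(directory: Path, last_seen_mtime: float) -> Tuple[List[str], float]:
    """Return names of files modified after the watermark, plus the new watermark."""
    entries = [(p.stat().st_mtime, p.name) for p in directory.iterdir() if p.is_file()]
    new = sorted((mtime, name) for mtime, name in entries if mtime > last_seen_mtime)
    newest = max((mtime for mtime, _ in new), default=last_seen_mtime)
    return [name for _, name in new], newest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, mtime in [("a.csv", 1_000), ("b.csv", 2_000)]:
        (root / name).write_text("data")
        os.utime(root / name, (mtime, mtime))  # fixed timestamps keep the demo deterministic

    # First run: everything is new.
    first_listing, state = list_new_files(root, last_seen_mtime=0.0)

    # A file arrives later; the second run reports only that file.
    (root / "c.csv").write_text("data")
    os.utime(root / "c.csv", (3_000, 3_000))
    second_listing, state = list_new_files(root, state)

    # Listing never deletes anything from the directory.
    remaining = sorted(p.name for p in root.iterdir())
```

Persisting the watermark (here the `state` variable) is what would let another node continue from the same point after a primary node change.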
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.FetchFile](/user-guide/data-integration/openflow/processors/fetchfile) - [org.apache.nifi.processors.standard.GetFile](/user-guide/data-integration/openflow/processors/getfile) - [org.apache.nifi.processors.standard.PutFile](/user-guide/data-integration/openflow/processors/putfile) --- title: ListFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listftp.md section: Loading & Unloading Data --- # ListFTP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Performs a listing of the files residing on an FTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchFTP in order to fetch those files. ## Tags files, ftp, ingest, input, list, remote, source ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.FetchFTP](/user-guide/data-integration/openflow/processors/fetchftp) - [org.apache.nifi.processors.standard.GetFTP](/user-guide/data-integration/openflow/processors/getftp) - [org.apache.nifi.processors.standard.PutFTP](/user-guide/data-integration/openflow/processors/putftp) --- title: ListGCSBucket 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listgcsbucket.md section: Loading & Unloading Data --- # ListGCSBucket 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-gcp-nar ## Description Retrieves a listing of objects from a GCS bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchGCSObject. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. ## Tags gcs, google, google cloud, list, storage ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.gcp.storage.DeleteGCSObject](/user-guide/data-integration/openflow/processors/deletegcsobject) - [org.apache.nifi.processors.gcp.storage.FetchGCSObject](/user-guide/data-integration/openflow/processors/fetchgcsobject) - [org.apache.nifi.processors.gcp.storage.PutGCSObject](/user-guide/data-integration/openflow/processors/putgcsobject) --- title: ListGoogleDrive 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listgoogledrive.md section: Loading & Unloading Data --- # ListGoogleDrive 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-gcp-nar ## Description Performs a listing of concrete files (shortcuts are ignored) in a Google Drive folder. If the 'Record Writer' property is set, a single Output FlowFile is created, and each file in the listing is written as a single record to the output file. Otherwise, for each file in the listing, an individual FlowFile is created, the metadata being written as FlowFile attributes. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Please see Additional Details to set up access to Google Drive. ## Tags drive, google, storage ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.gcp.drive.FetchGoogleDrive](/user-guide/data-integration/openflow/processors/fetchgoogledrive) - [org.apache.nifi.processors.gcp.drive.PutGoogleDrive](/user-guide/data-integration/openflow/processors/putgoogledrive) --- title: ListGoogleDriveFileInfo 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listgoogledrivefileinfo.md section: Loading & Unloading Data --- # ListGoogleDriveFileInfo 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-google-drive-nar ## Description Lists all files and folders in a specified Google Drive. The processor requires a Drive ID and can optionally list files recursively through all folders within the drive. ## Tags cloud, drive, files, gcp, google, list, openflow, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [com.snowflake.openflow.runtime.processors.google.CaptureGoogleDriveChanges](/user-guide/data-integration/openflow/processors/capturegoogledrivechanges) - [com.snowflake.openflow.runtime.processors.google.FetchGoogleDriveMetadata](/user-guide/data-integration/openflow/processors/fetchgoogledrivemetadata) --- title: ListGoogleGroups 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listgooglegroups.md section: Loading & Unloading Data --- # ListGoogleGroups 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-google-drive-nar ## Description Lists all of the groups for a given domain in Google Workspace. It supports an optional 'Query' to filter the groups. The retrieved group metadata (id, etag, email, name, directMembersCount, description) are output to a Record Writer. ## Tags cloud, directory, domain, gcp, google, groups, list ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [com.snowflake.openflow.runtime.processors.google.GetGoogleGroupMembers](/user-guide/data-integration/openflow/processors/getgooglegroupmembers) --- title: ListHubSpotObjects 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listhubspotobjects.md section: Loading & Unloading Data --- # ListHubSpotObjects 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-hubspot-processors-nar ## Description Fetches data from HubSpot for specified object types, and generates one FlowFile per listed object with the corresponding metadata as FlowFile attributes. The object type must be searchable, which means it supports access to the /search endpoint. For more information about searchable object types, see [https://developers.hubspot.com/docs/reference/api/crm/objects/objects#search](https://developers.hubspot.com/docs/reference/api/crm/objects/objects#search). ## Tags Preview, hubspot ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## Use cases This processor is typically used in conjunction with a GenerateFlowFile processor. ## See also - [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotObject](/user-guide/data-integration/openflow/processors/gethubspotobject) - [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotSchema](/user-guide/data-integration/openflow/processors/gethubspotschema) - [com.snowflake.openflow.runtime.processors.hubspot.ListArchivedHubSpotData](/user-guide/data-integration/openflow/processors/listarchivedhubspotdata) - [com.snowflake.openflow.runtime.processors.hubspot.PutHubSpot](/user-guide/data-integration/openflow/processors/puthubspot) --- title: ListMicrosoftDataverseTables 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listmicrosoftdataversetables.md section: Loading & Unloading Data --- # ListMicrosoftDataverseTables 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-dataverse-processors-nar ## Description Lists tables from Microsoft Dataverse environments. ## Tags dataverse ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: ListS3 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/lists3.md section: Loading & Unloading Data --- # ListS3 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Retrieves a listing of objects from an S3 bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. ## Tags AWS, Amazon, S3, list ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.s3.CopyS3Object](/user-guide/data-integration/openflow/processors/copys3object) - [org.apache.nifi.processors.aws.s3.DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object) - [org.apache.nifi.processors.aws.s3.FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object) - [org.apache.nifi.processors.aws.s3.GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata) - [org.apache.nifi.processors.aws.s3.GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags) - [org.apache.nifi.processors.aws.s3.PutS3Object](/user-guide/data-integration/openflow/processors/puts3object) - [org.apache.nifi.processors.aws.s3.TagS3Object](/user-guide/data-integration/openflow/processors/tags3object) --- title: ListSFDCDataShares 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listsfdcdatashares.md section: Loading & Unloading Data --- # ListSFDCDataShares 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-salesforce-processors-nar ## Description Lists the data shares in the organization that are available to the identified user. ## Tags list, objects, preview, salesforce, sfdc ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [com.snowflake.openflow.runtime.processors.salesforce.DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob) - [com.snowflake.openflow.runtime.processors.salesforce.DescribeSFDCObject](/user-guide/data-integration/openflow/processors/describesfdcobject) - [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult) - [com.snowflake.openflow.runtime.processors.salesforce.SubmitQueryJob](/user-guide/data-integration/openflow/processors/submitqueryjob) --- title: ListSFDCObjects 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listsfdcobjects.md section: Loading & Unloading Data --- # ListSFDCObjects 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-salesforce-processors-nar ## Description Lists the objects in the organization that are available to the identified user. ## Tags list, objects, preview, salesforce, sfdc ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [com.snowflake.openflow.runtime.processors.salesforce.DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob) - [com.snowflake.openflow.runtime.processors.salesforce.DescribeSFDCObject](/user-guide/data-integration/openflow/processors/describesfdcobject) - [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult) - [com.snowflake.openflow.runtime.processors.salesforce.SubmitQueryJob](/user-guide/data-integration/openflow/processors/submitqueryjob) --- title: ListSFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listsftp.md section: Loading & Unloading Data --- # ListSFTP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Performs a listing of the files residing on an SFTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchSFTP in order to fetch those files. ## Tags files, ingest, input, list, remote, sftp, source ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.FetchSFTP](/user-guide/data-integration/openflow/processors/fetchsftp) - [org.apache.nifi.processors.standard.GetSFTP](/user-guide/data-integration/openflow/processors/getsftp) - [org.apache.nifi.processors.standard.PutSFTP](/user-guide/data-integration/openflow/processors/putsftp) --- title: ListSharepointDrives 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listsharepointdrives.md section: Loading & Unloading Data --- # ListSharepointDrives 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-msgraph-nar ## Description Emits a FlowFile for each Drive present in the specified Sharepoint Site. ## Tags document, graph, microsoft, openflow, sharepoint, unstructured ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [com.snowflake.openflow.runtime.processors.sharepoint.FetchSharepointFile](/user-guide/data-integration/openflow/processors/fetchsharepointfile) - [com.snowflake.openflow.runtime.processors.sharepoint.FindSharepointDriveItem](/user-guide/data-integration/openflow/processors/findsharepointdriveitem) --- title: ListSharepointSiteGroups 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listsharepointsitegroups.md section: Loading & Unloading Data --- # ListSharepointSiteGroups 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-sharepoint-rest-nar ## Description Lists all SharePoint site groups available on a specified SharePoint site. ## Tags groups, list, microsoft, openflow, sharepoint ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [com.snowflake.openflow.runtime.processors.sharepoint.rest.GetSharepointSiteGroupMembers](/user-guide/data-integration/openflow/processors/getsharepointsitegroupmembers) --- title: ListSmb 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listsmb.md section: Loading & Unloading Data --- # ListSmb 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-smb-nar ## Description Lists concrete files shared via the SMB protocol. Each listed file may result in one FlowFile, with the metadata written as FlowFile attributes. Alternatively, if the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. ## Tags list, samba, smb, cifs, files ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.smb.FetchSmb](/user-guide/data-integration/openflow/processors/fetchsmb) - [org.apache.nifi.processors.smb.GetSmbFile](/user-guide/data-integration/openflow/processors/getsmbfile) - [org.apache.nifi.processors.smb.PutSmbFile](/user-guide/data-integration/openflow/processors/putsmbfile) --- title: ListTableNames 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listtablenames.md section: Loading & Unloading Data --- # ListTableNames 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar ## Description Fetches all source table names and matches them against one of the possible configurations: - A regular expression, e.g. "(?i)customer.(orders|payments)", which matches names in a case-insensitive way. It would match both "CUSTOMER.ORDERS" and "customer.orders" source table names. - A comma-separated list of source table names, e.g. "customer.orders, customer.payments". It matches source table names in a case-sensitive way, i.e. the "customer.orders" source table will be forwarded to MATCH relationship but "customer.ORDERS" won't match. Matched source tables that cannot be replicated will be routed to FAILURE relationship, each table in a separate FlowFile, with the reason in its attributes. The configuration is passed as a FlowFile attribute. A source table name is represented as <schema_name>.<table_name>, so both kinds of input should take that into consideration. Matched source table names are forwarded to MATCHED relationship. The processor generates a single FlowFile with the matching tables. Disclaimers - PostgreSQL allows database object names to be defined in a case-sensitive or case-insensitive way. When a user creates a table using the query 'CREATE TABLE ORDERS(id int not null)', PostgreSQL stores the name internally in lowercase letters, i.e. orders. To enforce case sensitivity, the user has to wrap the table name in double quotes, i.e. 'CREATE TABLE "ORDERS"(id int not null)'. This is an important aspect when configuring the tables to replicate. ## Tags ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
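The two matching modes in the ListTableNames description can be compared with a short Python sketch. The processor itself runs on a Java runtime, so Python's `re` module only approximates its regular-expression dialect, and requiring the pattern to match the whole name is an assumption. The helpers `match_by_regex` and `match_by_list` and the table `customer.refunds` are hypothetical.

```python
import re
from typing import Iterable, List


def match_by_regex(pattern: str, table_names: Iterable[str]) -> List[str]:
    """Keep names that the whole pattern matches (assumed full-match semantics)."""
    compiled = re.compile(pattern)
    return [name for name in table_names if compiled.fullmatch(name)]


def match_by_list(csv_names: str, table_names: Iterable[str]) -> List[str]:
    """Keep names that appear verbatim (case-sensitively) in a comma-separated list."""
    wanted = {part.strip() for part in csv_names.split(",") if part.strip()}
    return [name for name in table_names if name in wanted]


# Names follow the <schema_name>.<table_name> convention from the description.
source_tables = [
    "customer.orders",
    "CUSTOMER.ORDERS",
    "customer.payments",
    "customer.refunds",
]

# The inline (?i) flag makes the expression case-insensitive.
regex_matches = match_by_regex(r"(?i)customer.(orders|payments)", source_tables)

# The list form compares exact strings, so "CUSTOMER.ORDERS" is not selected.
list_matches = match_by_list("customer.orders, customer.payments", source_tables)
```

Note that the unescaped `.` in the example expression matches any character, not only a literal dot. A stricter pattern would escape it as `\.`.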
## Relationships
## Writes attributes
--- title: ListUnityCatalogDirectory 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/listunitycatalogdirectory.md section: Loading & Unloading Data --- # ListUnityCatalogDirectory 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-databricks-processors-nar ## Description Lists file names in a Unity Catalog directory and outputs a new FlowFile with the filename. ## Tags databricks, openflow, unity catalog ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: LogAttribute 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/logattribute.md section: Loading & Unloading Data --- # LogAttribute 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Emits attributes of the FlowFile at the specified log level ## Tags attributes, logging ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: LoggingRecordSink source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/loggingrecordsink.md section: Loading & Unloading Data --- # LoggingRecordSink This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a RecordSinkService that can be used to log records to the application log (for example, nifi-app.log) using the specified writer for formatting. ## Tags log, record, sink ## Properties In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: LogMessage 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/logmessage.md section: Loading & Unloading Data --- # LogMessage 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Emits a log message at the specified log level ## Tags attributes, logging ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: LookupAttribute 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/lookupattribute.md section: Loading & Unloading Data --- # LookupAttribute 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Lookup attributes from a lookup service ## Tags Attribute Expression Language, attributes, cache, enrich, join, lookup ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
---
title: LookupRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/lookuprecord.md
section: Loading & Unloading Data
---

# LookupRecord 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Extracts one or more fields from a Record and looks up a value for those fields in a LookupService. If a result is returned by the LookupService, that result is optionally added to the Record. In this case, the processor functions as an Enrichment processor. Regardless, the Record is then routed to either the 'matched' relationship or 'unmatched' relationship (if the 'Routing Strategy' property is configured to do so), indicating whether or not a result was returned by the LookupService, allowing the processor to also function as a Routing processor.

The "coordinates" to use for looking up a value in the Lookup Service are defined by adding a user-defined property. Each property that is added will have an entry added to a Map, where the name of the property becomes the Map Key and the value returned by the RecordPath becomes the value for that key. If multiple values are returned by the RecordPath, then the Record will be routed to the 'unmatched' relationship (or 'success', depending on the 'Routing Strategy' property's configuration).

If one or more fields match the Result RecordPath, all fields that match will be updated. If there is no match in the configured LookupService, then no fields will be updated. I.e., it will not overwrite an existing value in the Record with a null value. Please note, however, that if the results returned by the LookupService are not accounted for in your schema (specifically, the schema that is configured for your Record Writer) then the fields will not be written out to the FlowFile.

## Tags

avro, convert, csv, database, db, enrichment, filter, json, logs, lookup, record, route

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## See also

- [org.apache.nifi.processors.standard.ConvertRecord](/user-guide/data-integration/openflow/processors/convertrecord)
- [org.apache.nifi.processors.standard.SplitRecord](/user-guide/data-integration/openflow/processors/splitrecord)

---
title: Maintain Openflow Connector for Amazon Kinesis Data Streams
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kinesis/maintenance.md
section: Loading & Unloading Data
---

# Maintain Openflow Connector for Amazon Kinesis Data Streams

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/kinesis/about)
- [](/user-guide/data-integration/openflow/connectors/kinesis/setup)
- [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot)
- [](/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning)

This topic describes how to maintain the Openflow Connector for Amazon Kinesis Data Streams, including how to manage and reset the connector state.

## Manage connector state

The Openflow Connector for Amazon Kinesis Data Streams uses DynamoDB to store the consumer application state.

### DynamoDB table created by the connector

The connector creates a DynamoDB table with the name specified in `AWS Kinesis Application Name`. The table stores the checkpointed sequence number for each shard in the stream. This tracks which records have been processed.
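The checkpoint table described above can be pictured as a per-shard map of sequence numbers. The following is a minimal in-memory sketch, not the connector's actual DynamoDB schema or code: the `CheckpointTable` class, its method names, the example application name, and the use of plain integers for sequence numbers are all assumptions made for illustration (real Kinesis sequence numbers are long numeric strings).

```python
from typing import Dict, Optional


class CheckpointTable:
    """Hypothetical in-memory stand-in for the connector's DynamoDB table."""

    def __init__(self, application_name: str) -> None:
        # The table is named after the `AWS Kinesis Application Name` value.
        self.name = application_name
        # Maps shard ID -> last checkpointed sequence number.
        self._checkpoints: Dict[str, int] = {}

    def checkpoint(self, shard_id: str, sequence_number: int) -> None:
        """Record progress for a shard (this model never moves it backwards)."""
        current = self._checkpoints.get(shard_id)
        if current is None or sequence_number > current:
            self._checkpoints[shard_id] = sequence_number

    def last_checkpoint(self, shard_id: str) -> Optional[int]:
        """Return the last checkpointed sequence number, or None if unseen."""
        return self._checkpoints.get(shard_id)

    def is_processed(self, shard_id: str, sequence_number: int) -> bool:
        """A record counts as processed if it is at or before the checkpoint."""
        last = self.last_checkpoint(shard_id)
        return last is not None and sequence_number <= last


# Illustrative usage with made-up names and sequence numbers.
table = CheckpointTable("my-kinesis-app")
table.checkpoint("shardId-000000000000", 100)
table.checkpoint("shardId-000000000000", 250)
table.checkpoint("shardId-000000000001", 40)

print(table.last_checkpoint("shardId-000000000000"))
print(table.is_processed("shardId-000000000001", 41))
```

In this model, a shard with no entry has no recorded progress, which is the situation a consumer is in when it starts against a new or empty table.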
If multiple processors use the same application name, they cooperate to consume data from the stream and share this table. If processors have different application names, each creates its own table to independently track consumed records.

## Reset the connector state

If the connector state in DynamoDB becomes corrupted or inconsistent, you may need to reset it. There are two approaches to reset the connector state.

### Reset by changing the application name

The simplest way to reset the connector state is to change the `AWS Kinesis Application Name` parameter:

1. Stop the connector.
2. Navigate to the connector's parameter context.
3. Change the `AWS Kinesis Application Name` parameter value to a new value.
4. Start the connector.

The connector creates a new DynamoDB table with the new application name and begins consuming records from the position specified by the [AWS Kinesis Initial Stream Position](#label-kinesis-json-source-parameters) parameter.

- When you change the application name, the connector doesn't delete the old DynamoDB table. You must manually delete it through the AWS Console or the AWS CLI.
- If your IAM policy restricts DynamoDB access to a specific table name, you must update the policy to allow access to the new table name. For more information on configuring IAM permissions, see [the connector setup topic](/user-guide/data-integration/openflow/connectors/kinesis/setup).

### Reset by deleting the DynamoDB table

Alternatively, you can delete the existing DynamoDB table to reset the state:

1. Stop the connector.
2. In the AWS Console or using the AWS CLI, delete the DynamoDB table associated with the application name.
3. Start the connector.

The connector recreates the table and begins consuming records from the position specified by the [AWS Kinesis Initial Stream Position](#label-kinesis-json-source-parameters) parameter.

Resetting the connector state causes the connector to reprocess records from the position specified by the initial stream position. Depending on your [AWS Kinesis Initial Stream Position](#label-kinesis-json-source-parameters) setting, this may result in duplicate data being ingested into Snowflake or data not being ingested at all.

---
title: Maintaining the Openflow Connector for Shopify
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/shopify/maintain.md
section: Loading & Unloading Data
---

# Maintaining the Openflow Connector for Shopify

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/connectors/shopify/about)
- [](/user-guide/data-integration/openflow/connectors/shopify/setup)

This topic describes maintenance tasks for the Openflow Connector for Shopify, including how to force a full reload of connector state.

## Force a full reload

The connector maintains an internal state to track which objects have been bulk-loaded and the incremental watermark for each object type. In some situations, you might need to force the connector to perform a fresh bulk load, for example, after resolving a data issue or after the connector has been stopped for an extended period.

### Reload all objects

To force a full reload for all objects:

1. Stop all processors in the flow by right-clicking on the connector process group and selecting **Stop**.
2. Ensure that no in-flight FlowFiles are being processed. You can verify this by checking that all queues in the flow are empty.
3. Right-click on the canvas and select **Disable all controller services**.
4. Go to **Controller services** and locate the **Shopify State Service**.
5. Select the menu for **Shopify State Service**, then select **View state** and select **Clear state**.
6. Right-click on the canvas and select **Enable all controller services**, then start all processors to resume the connector.

The connector treats a cleared state as a fresh start and performs a bulk load for all configured objects on the next execution.

### Reload a specific object

To re-ingest a single object type from scratch without affecting other objects:

1. Stop all processors in the flow.
2. Ensure all queues are empty.
3. Right-click on the canvas and select **Disable all controller services**.
4. Go to **Controller services** and locate the **Shopify State Service**.
5. Select the menu for **Shopify State Service**, then select **View state**.
6. Select the trash icon next to the specific object type (for example, `orders`) to delete its state entry.
7. Re-enable all controller services and start the flow.

The connector performs a bulk load for that object type and then resumes incremental updates, while other objects continue from their existing watermark.

---
title: Manage Openflow
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/manage.md
section: Loading & Unloading Data
---

# Manage Openflow

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/version-history)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)

This topic describes the steps to manage Openflow components.

## Back up flow definitions and protect runtime state

Flow definitions and runtime-local state (including processor configuration and Apache NiFi flow state held on the runtime) live on **Openflow runtime storage**, not in Snowflake tables. If you remove, replace, or manually tear down that infrastructure **without** exporting your flows first, that data can be **lost permanently**. Snowflake does not provide Time Travel or Fail-safe for this storage.

Before you delete a deployment, delete or recreate a runtime, or manually remove underlying Snowpark Container Services resources or compute tied to Openflow, **export** your flows from the canvas. Right-click the process group, then select **Version** » **Export** (or use the equivalent command your canvas shows).

**Routine upgrades** through the supported Openflow UI ([Upgrade a deployment](#label-update-a-deployment) and [Upgrade a runtime](#label-openflow-upgrading-a-runtime)) are different from destructive removal. You should still export flows regularly as a best practice.

Do not run [DROP ROLE](/sql-reference/sql/drop-role) for a role that provisions or owns Openflow objects until you transfer ownership and privileges to another role you intend to keep (for example with `GRANT OWNERSHIP`). Dropping a role revokes grants and can leave deployments in a broken state.

## Runtime availability and autoscaling behavior

Openflow runtime nodes are not strictly always-on, single-host processes. Each runtime is a Kubernetes workload that the cluster can reschedule onto a different compute host. When that happens, the runtime briefly restarts while a new pod becomes ready. Plan your flows to tolerate short interruptions rather than assuming the runtime stays on the same host indefinitely.

Snowflake doesn't automatically upgrade BYOC runtimes. Upgrades happen only when a deployment owner initiates them through the Openflow UI or the deployment agent. Restarts you observe outside of an upgrade window are typically caused by cluster rebalancing or by host-level events on the underlying compute.

For Openflow Snowflake deployments running on [Snowpark Container Services](/developer-guide/snowpark-container-services/overview) (SPCS), runtimes can also be affected briefly by the scheduled SPCS [maintenance window](/developer-guide/snowpark-container-services/working-with-compute-pool#label-spcs-working-with-compute-pool-maintenance-window).

### Causes of runtime restarts
Runtime or deployment upgrades
When the owner of a deployment runs an upgrade, the affected runtime restarts to pick up the new version. See [Upgrade a runtime](#label-openflow-upgrading-a-runtime) and [Upgrade a deployment](#label-update-a-deployment).
Cluster rebalancing and autoscaling
Openflow scales the underlying compute up and down based on demand. See the [BYOC cost topic](/user-guide/data-integration/openflow/cost-byoc) for details on how BYOC deployments scale the EC2 node group. During scale-in, node-drain, or rebalancing events, the cluster can reschedule a runtime pod from one node to another so that the cluster continues to run efficiently.
Cloud provider host events
The virtual machines that host BYOC runtimes are subject to events outside Snowflake's control, including instance retirement, unexpected reboots, and host-level maintenance performed by the cloud service provider. When a host becomes unavailable, the cluster reschedules the affected runtime onto a healthy node.
### What to expect during a restart

- Openflow runtimes and connectors maintain data integrity across restarts. In-flight data held in the runtime's persistent storage is preserved, and the flow resumes after the new pod is ready.
- Expect a short service interruption while the new pod starts and reattaches its storage.
- Diagnostic output may report `LAST_REQUESTED_RESTART_REASON: "nifi.properties changed"` after a reschedule, even when no NiFi configuration was modified. The runtime operator reconciles the underlying StatefulSet whenever the pod identity or node assignment changes, so this message can reflect a reschedule rather than an actual configuration change.

### Design flows for resilience

Because brief runtime interruptions are expected, design your flows to recover automatically:

- Configure source and destination connectors to checkpoint progress so that processing resumes from the last committed position after a restart.
- For streaming sources such as [Kafka](/user-guide/data-integration/openflow/connectors/kafka/about) or [Kinesis](/user-guide/data-integration/openflow/connectors/kinesis/about), rely on consumer-group offsets or sequence numbers rather than in-memory state on the runtime.
- [Monitor your runtimes](/user-guide/data-integration/openflow/monitor) so that you're notified if a restart doesn't recover on its own within the expected window.
- Choose your caching strategy with restarts in mind. A local, in-memory cache is cleared when a runtime node restarts, and Openflow's locally persisted caches are managed per runtime node rather than shared across the cluster. If your flow depends on cache state surviving restarts or being shared across nodes, use an external cache service such as Redis.

## Delete a deployment

Deleting a deployment removes the management compute pool and all deployment-level configuration. You must delete all runtimes first. Any data or objects already integrated into Snowflake aren't affected.
Deleting a deployment can't be undone. Before you delete, make sure all runtimes have been removed and you no longer need the deployment configuration.

From the AWS Console:

1. Navigate to EC2 Instances.
2. Select the `openflow-agent-{deployment-key}` instance with your deployment key.
3. Click **Connect** at the top of the page.
4. Switch from **EC2 Instance Connect** to **Connect using EC2 Instance Connect Endpoint**. Leave the default EC2 Instance Connect Endpoint in place.
5. Click **Connect**. A new browser tab or window will appear with a command-line interface.
6. Run `./destroy.sh` from the shell.
   - This may take 20-30 minutes. If your connection is interrupted, the process continues running in the background.
   - You can log back in and view its status with the command: `journalctl -u docker -f -n 250`
   - The `destroy` process is complete when you see output of `delete successful`.
7. Navigate to [CloudFormation](https://us-east-1.console.aws.amazon.com/cloudformation/home) in the AWS Console for your region.
8. Delete the CloudFormation stack for your deployment.

From Snowsight:

1. In the navigation menu, select **Ingestion** » **Openflow**.
2. Select **Launch Openflow**.
3. Select the **Deployments** tab.
4. In the row of the deployment you want to delete, select the More options icon.
5. Select **Delete**.
6. In the confirmation dialog, type `delete` to confirm deletion.
7. Click **Delete deployment**.

## Upgrade a deployment

A deployment includes several components: the agent, deployment service, deployment UI, runtime gateway, and runtime operator. You can upgrade via the UI or, for BYOC deployments, via the deployment agent script.

For details on what's included in each release, see [Openflow version history](/user-guide/data-integration/openflow/version-history).

Only the owner of a deployment can perform an upgrade.

### Upgrade from the UI

1. Sign in to Snowsight.
2. In the navigation menu, select **Ingestion** » **Openflow**.
3. Select **Launch Openflow**.
4. Select the **Deployments** tab.
5. Look for the upgrade arrow to the left of the deployment name. This indicates an upgrade is available.

   ![Deployments tab showing the upgrade arrow indicator next to a deployment name](/static/images/openflow/upgrade-available-deployment.png)

6. Select the More options icon next to the deployment, then select **Upgrade**.

### Upgrade via the deployment agent (BYOC)

For BYOC deployments, use the deployment agent script to upgrade the agent, deployment service, deployment UI, runtime gateway, and runtime operator.

#### Connect to the deployment agent

1. Navigate to Openflow.
2. Select the **Deployments** tab.
3. View your deployment details and note the deployment key.
4. In your AWS account, view the EC2 instances and filter using the deployment key.
5. Locate the deployment agent EC2 instance named `openflow-agent-{deployment-key}`.
6. Connect using EC2 Instance Connect Endpoint and accept all defaults.
7. Run the remaining commands from the new browser tab or window that appears with a command-line interface.

#### Check for available upgrades

```bash
cat ~/.upgrade
```

The file lists the latest available version of each deployment component. If no upgrades are available, you will see output similar to this:

```text
AGENT_IMAGE_VERSION_UPGRADE=
OPERATOR_CHART_VERSION_UPGRADE=
GATEWAY_IMAGE_VERSION_UPGRADE=
DPS_CHART_VERSION_UPGRADE=
DPUI_CHART_VERSION_UPGRADE=
```

Otherwise, you will see the version that upgraded components will use, such as:

```text
AGENT_IMAGE_VERSION_UPGRADE=0.17.0
OPERATOR_CHART_VERSION_UPGRADE=0.31.0
GATEWAY_IMAGE_VERSION_UPGRADE=
DPS_CHART_VERSION_UPGRADE=
DPUI_CHART_VERSION_UPGRADE=
```

#### Upgrading the AMI for the Openflow BYOC deployment

When you upgrade your Openflow BYOC deployment, Openflow will find and upgrade to the latest AMI for Amazon Linux 2023 recommended by [AWS Systems Manager](https://aws.amazon.com/systems-manager/). If a new AMI is found, Openflow restarts all Openflow services in your deployment, and runtimes are temporarily halted. Openflow runtimes and connectors maintain data integrity across restarts automatically.

Snowflake does not automatically upgrade deployments. You determine upgrade timing and frequency.

#### Initiate the upgrade

If the output indicates that upgrades are available, run the following script to initiate the upgrade. Older Openflow deployments may use the script `upgrade-data-plane.sh` instead.

```bash
./upgrade.sh
```

You will see output similar to this:

```text
openflow-data-plane-agent-aws is set to version 0.16.0
Upgrade set to version 0.17.0
openflow-dataplane-service-chart is set to version 0.47.0
No upgrade is available
openflow-dataplane-ui-chart is set to version 0.5.0
No upgrade is available
openflow-runtime-gateway is set to version 2025.6.8.2
No upgrade is available
runtime-operator-chart is set to version 0.30.0
Upgrade set to version 0.31.0
```

Then, you have two options:

- Wait for an automatic upgrade: The system will automatically initiate the upgrade process within approximately 10 minutes.
- Manual upgrade: To start the upgrade immediately, run the following command:

```bash
./create.sh
```

#### Monitor the upgrade process

To track the progress of the upgrade, use the `journalctl` command:

```bash
journalctl -u openflow-apply-infrastructure -f -n 250
```

#### Verify a successful upgrade

A successful upgrade will typically show output similar to this:

```text
All resources applied successfully and log uploaded to s3
openflow-apply-infrastructure.service: Deactivated successfully
```

## Upgrade a runtime

Snowflake periodically releases runtime updates that introduce new Openflow processors, newer versions of existing processors, or new runtime functionality. When updates are available, an indicator appears next to the runtime name in the UI.

For details on what's included in each release, see [Openflow version history](/user-guide/data-integration/openflow/version-history).

Only the owner of a deployment can perform an upgrade.

1. Sign in to Snowsight.
2. In the navigation menu, select **Ingestion** » **Openflow**.
3. Select **Launch Openflow**.
4. Select the **Runtimes** tab.
5. Look for the upgrade arrow to the left of the runtime name. This indicates an upgrade is available.

   ![Runtimes tab showing the upgrade arrow indicator next to a runtime name](/static/images/openflow/upgrade-available-runtime.png)

6. Select the More options icon next to the runtime, then select **Upgrade**.

## Upgrade a connector

Snowflake makes connector updates available when functionality is added, processing logic is improved, or new processor versions are used, for example to add support for a new source API version. When connector updates are available, you will see an **Upgrade** icon in your process group on the canvas.

You can only upgrade connectors after you have [upgraded their runtime](#label-openflow-upgrading-a-runtime).

To upgrade a connector, do the following:

1. In the navigation menu, select **Ingestion** » **Openflow**.
2. Select **Launch Openflow**.
3. Select the **Runtimes** tab.
4. Select the runtime name, or select **View Canvas** in the **More Options** menu to navigate to the canvas.
5. Find the process groups with a red upgrade arrow next to their names. For each of these groups, change the version:
   1. Recommended: Check whether the parameter context uses custom parameter values. If so, make a note of the custom values. You will need to reapply them after the upgrade.
      1. Right-click the process group and select **Parameters**.
      2. Select **Parameters** in the Parameter Contexts list.
      3. Select the **Inheritance** tab, and check if it uses custom values. If so, make a note of the custom values.
   2. Right-click the group and select **Version** » **Change Version**.
   3. Select the latest available version and select **Change**.
   4. Confirm that the connector was upgraded to the latest version. The upgraded version should show a green check mark.
   5. Confirm that all processors in the connector's process group are running. If not, start them. You can also validate the version by hovering over the speech bubble at the bottom right of the process group.
   6. If you noted custom parameter values earlier, reapply them.

For more information, see [the Openflow connectors overview](/user-guide/data-integration/openflow/connectors/about-openflow-connectors).

### Configure Snowflake Connector Flow Registry

Early preview releases of Openflow did not configure a runtime for connector upgrades. If you don't see the **Version** option when right-clicking a process group, you have to configure the Snowflake Connector Flow Registry and manually enable version control for existing connectors.

To configure the Snowflake Connector Flow Registry, do the following:

1. Navigate to the canvas.
2. Click on the menu in the top right corner and select **Controller Settings**.
3. Switch to the **Registry Clients** tab.
4. Click the **+** icon to add a new Registry Client.
5. Select the **ConnectorFlowRegistryClient** and select **Add**.
6. Click **More Options** for the **ConnectorFlowRegistryClient** row and select **Edit**.
7. Enter `/nifi/configuration_resources/connector_flow_registry` as the value for **Storage Location** and select **Apply**.

After configuring the Snowflake Connector Flow Registry, you can enable version control for your existing connectors.

To enable version control for existing connectors, do the following:

1. Navigate to the canvas and locate the process group where you want to add version control.
2. Right-click the process group and select **Version** » **Set Version**.
3. In the **Set Version** dialog, choose the flow that matches your process group. For example, choose **sqlserver** if you are using the SQL Server connector. Note that flow names do not exactly match the connector name.
4. Select the latest version and then select **Set version** to enable version control.
5. From the canvas, right-click the process group again and select **Version** » **Revert Local Changes** to apply the latest connector version.
6. Review the list of changes and select **Revert**.
7. Confirm that your connector was upgraded to the latest version, which should now show a green check mark. You can also validate the version by hovering over the speech bubble at the bottom right of the process group.

---
title: MapCacheClientService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/mapcacheclientservice.md
section: Loading & Unloading Data
---

# MapCacheClientService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Provides the ability to communicate with a MapCacheServer. This can be used in order to share a Map between nodes in a NiFi cluster

## Tags

cache, cluster, distributed, map, state

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: MapCacheServer
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/mapcacheserver.md
section: Loading & Unloading Data
---

# MapCacheServer

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)

## Description

Provides a map (key/value) cache that can be accessed over a socket. Interaction with this service is typically accomplished via a Map Cache Client Service.

## Tags

cache, cluster, distributed, key/value, map, server

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: MergeContent 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/mergecontent.md
section: Loading & Unloading Data
---

# MergeContent 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)

## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the Processor be configured with only a single incoming connection, as Group of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate.

NOTE: this processor should NOT be configured with Cron Driven for the Scheduling Strategy.

## Tags

archive, concatenation, content, correlation, flowfile-stream, flowfile-stream-v3, merge, stream, tar, zip

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## Use cases | Concatenate FlowFiles with textual content together in order to create fewer, larger FlowFiles. | | ----------------------------------------------------------------------------------------------------------------------------------------------- | | Concatenate FlowFiles with binary content together in order to create fewer, larger FlowFiles. | | Reassemble a FlowFile that was previously split apart into smaller FlowFiles by a processor such as SplitText, UnpackContext, SplitRecord, etc. | ## See also - [org.apache.nifi.processors.standard.MergeRecord](/user-guide/data-integration/openflow/processors/mergerecord) - [org.apache.nifi.processors.standard.SegmentContent](/user-guide/data-integration/openflow/processors/segmentcontent) --- title: MergeRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/mergerecord.md section: Loading & Unloading Data --- # MergeRecord 2025.10.9.21 This feature is not available in the People's Republic of China. This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

This Processor merges together multiple record-oriented FlowFiles into a single FlowFile that contains all of the Records of the input FlowFiles. This Processor works by creating 'bins' and then adding FlowFiles to these bins until they are full. Once a bin is full, all of the FlowFiles will be combined into a single output FlowFile, and that FlowFile will be routed to the 'merged' Relationship. A bin will consist of potentially many 'like FlowFiles'. In order for two FlowFiles to be considered 'like FlowFiles', they must have the same Schema (as identified by the Record Reader) and, if the <Correlation Attribute Name> property is set, the same value for the specified attribute. See Processor Usage and Additional Details for more information. NOTE: This processor should NOT be configured with Cron Driven for the Scheduling Strategy.

## Tags

content, correlation, event, merge, record, stream

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## Use cases

- Combine together many arbitrary Records in order to create a single, larger file

## Use Cases Involving Other Components

- Combine together many Records that have the same value for a particular field in the data, in order to create a single, larger file

## See also

- [org.apache.nifi.processors.standard.MergeContent](/user-guide/data-integration/openflow/processors/mergecontent)
- [org.apache.nifi.processors.standard.PartitionRecord](/user-guide/data-integration/openflow/processors/partitionrecord)
- [org.apache.nifi.processors.standard.SplitRecord](/user-guide/data-integration/openflow/processors/splitrecord)

---
title: MergeSnowflakeJournalTable 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/mergesnowflakejournaltable.md
section: Loading & Unloading Data
---

# MergeSnowflakeJournalTable 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar

## Description

Triggers a merge operation that applies changes from a journal table to a destination table in Snowflake. The merge operation is performed asynchronously, and the processor polls for the result of the operation. If the query is still in progress, the FlowFile is penalized.

## Tags

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
---
title: MicrosoftClientCertificateOAuth2TokenProvider
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/microsoftclientcertificateoauth2tokenprovider.md
section: Loading & Unloading Data
---

# MicrosoftClientCertificateOAuth2TokenProvider

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides OAuth2 access tokens for the Microsoft Graph API using client_credentials with a client certificate.

## Tags

access token, authorization, graph, http, microsoft, oauth2, provider

## Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: MicrosoftGraphAuthenticationProvider
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/microsoftgraphauthenticationprovider.md
section: Loading & Unloading Data
---

# MicrosoftGraphAuthenticationProvider

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides authentication for the Microsoft Graph API, which can be used for interacting with Microsoft 365 services.

## Tags

graph, microsoft, openflow

## Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: Migrate from the legacy Openflow Connector for Jira Cloud
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy.md
section: Loading & Unloading Data
---

# Migrate from the legacy Openflow Connector for Jira Cloud

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/connectors/jira-cloud/about)
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core)
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile)

This topic describes how to migrate from the legacy Openflow Connector for Jira Cloud to the new Openflow Connector for Jira Cloud.

## Overview

The new connector is a complete rewrite that changes how data is stored in Snowflake. It consists of two separate flows: a **core flow** (issues, projects, comments, changelogs, worklogs, users, votes, watchers, remote links, issue security schemes, and optionally deleted issues) and an **agile flow** (boards, sprints, board mappings). The core flow and agile flow can write to the same Snowflake destination schema, since they create tables with different names.
The legacy connector and the new connector can run side by side during migration — including on the same Openflow runtime — as long as they write to **separate** destination schemas, so you can validate the new output before decommissioning the legacy connector. ## Feature comparison
## Key differences

### Schema changes

The most significant difference is how data is stored in Snowflake:
Field Description
`database` Required. The exact source database (PDB or CDB) name, matching the database in the table's three-part fully qualified name.
`schema` Required. The exact source schema name.
`table` Required. The exact source table name.
`logicalKey` Required. A non-empty array of source column names that uniquely identify rows in the table.
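The four required fields above lend themselves to a mechanical check before a configuration is applied. The following is a minimal sketch, assuming each table entry is represented as a Python dictionary. The function name `validate_table_entry` and the sample values are hypothetical and are not part of any connector API.

```python
from typing import Any


def validate_table_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one table entry (empty if valid)."""
    problems = []
    # database, schema, and table must each be a non-empty string.
    for field in ("database", "schema", "table"):
        value = entry.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field} is required")
    # logicalKey must be a non-empty array of source column names.
    key = entry.get("logicalKey")
    if not isinstance(key, list) or not key:
        problems.append("logicalKey must be a non-empty array")
    elif not all(isinstance(col, str) and col for col in key):
        problems.append("logicalKey must contain only column names")
    return problems


# Hypothetical sample values, for illustration only.
entry = {
    "database": "PDB1",
    "schema": "SALES",
    "table": "ORDERS",
    "logicalKey": ["ORDER_ID"],
}
```

The check only covers presence and shape. Whether the key columns actually identify rows uniquely can only be verified against the source table.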
Property Description
Connection Timeout Maximum time to wait for initial socket connection to the HTTP URL.
HTTP Method HTTP request method (GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS). Arbitrary methods are also supported. Methods other than POST, PUT and PATCH will be sent without a message body.
HTTP URL HTTP remote URL including a scheme of http or https, as well as a hostname or IP address with optional port and path elements. Any encoding of the URL must be done by the user.
HTTP/2 Disabled Disable negotiation of HTTP/2 protocol. HTTP/2 requires TLS. HTTP/1.1 protocol support is required when HTTP/2 is disabled.
OAuth2 Access Token Refresh Strategy Specifies which strategy should be used to refresh the OAuth2 Access Token.
Request Body Enabled Enable sending HTTP request body for PATCH, POST, or PUT methods.
Request Chunked Transfer-Encoding Enabled Enable sending HTTP requests with the Transfer-Encoding Header set to chunked, and disable sending the Content-Length Header. Transfer-Encoding applies to the body in HTTP/1.1 requests as described in RFC 7230 Section 3.3.1
Request Content-Encoding HTTP Content-Encoding applied to request body during transmission. The receiving server must support the selected encoding to avoid request failures.
Request Content-Type HTTP Content-Type Header applied when sending an HTTP request body for PATCH, POST, or PUT methods. The Content-Type defaults to application/octet-stream when not configured.
Request Date Header Enabled Enable sending HTTP Date Header on HTTP requests as described in RFC 7231 Section 7.1.1.2.
Request Digest Authentication Enabled Enable Digest Authentication on HTTP requests with Username and Password credentials as described in RFC 7616.
Request Failure Penalization Enabled Enable penalization of request FlowFiles when receiving HTTP response with a status code between 400 and 499.
Request Header Attributes Pattern Regular expression that defines which FlowFile attributes to send as HTTP headers in the request. If not defined, no attributes are sent as headers. Dynamic properties will always be sent as headers. The dynamic property name will be the header key and the dynamic property value, interpreted as Expression Language, will be the header value. Attributes and their values are limited to ASCII characters due to the requirement of the HTTP protocol.
Request Multipart Form-Data Filename Enabled Enable sending the FlowFile filename attribute as the filename parameter in the Content-Disposition Header for multipart/form-data HTTP requests.
Request Multipart Form-Data Name Enable sending HTTP request body formatted using multipart/form-data and using the form name configured.
Request OAuth2 Access Token Provider Enables managed retrieval of OAuth2 Bearer Token applied to HTTP requests using the Authorization Header.
Request Password The password provided for authentication of HTTP requests. Encoded using Base64 for HTTP Basic Authentication as described in RFC 7617.
Request User-Agent HTTP User-Agent Header applied to requests. RFC 7231 Section 5.5.3 describes recommended formatting.
Request Username The username provided for authentication of HTTP requests. Encoded using Base64 for HTTP Basic Authentication as described in RFC 7617.
Response Body Attribute Name FlowFile attribute name used to write an HTTP response body for FlowFiles transferred to the Original relationship.
Response Body Attribute Size Maximum size in bytes applied when writing an HTTP response body to a FlowFile attribute. Attributes exceeding the maximum will be truncated.
Response Body Ignored Disable writing HTTP response FlowFiles to Response relationship
Response Cache Enabled Enable HTTP response caching described in RFC 7234. Caching responses considers ETag and other headers.
Response Cache Size Maximum size of HTTP response cache in bytes. Caching responses considers ETag and other headers.
Response Cookie Strategy Strategy for accepting and persisting HTTP cookies. Accepting cookies enables persistence across multiple requests.
Response FlowFile Naming Strategy Determines the strategy used for setting the filename attribute of FlowFiles transferred to the Response relationship.
Response Generation Required Enable generation and transfer of a FlowFile to the Response relationship regardless of HTTP response status code received.
Response Header Request Attributes Enabled Enable adding HTTP response headers as attributes to FlowFiles transferred to the Original, Retry or No Retry relationships.
Response Header Request Attributes Prefix Prefix to HTTP response headers when included as attributes to FlowFiles transferred to the Original, Retry or No Retry relationships. It is recommended to end with a separator character like '.' or '-'.
Response Redirects Enabled Enable following HTTP redirects sent with HTTP 300 series responses as described in RFC 7231 Section 6.4.
SSL Context Service SSL Context Service provides trusted certificates and client certificates for TLS communication.
Socket Idle Connections Maximum number of idle connections to the HTTP URL.
Socket Idle Timeout Maximum time to wait before closing idle connections to the HTTP URL.
Socket Read Timeout Maximum time to wait for receiving responses from a socket connection to the HTTP URL.
Socket Write Timeout Maximum time to wait for write operations while sending requests from a socket connection to the HTTP URL.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor.
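The Request Header Attributes Pattern property described above can be illustrated with a short sketch. This is not the processor's implementation. It assumes the regular expression must match the whole attribute name, and that attributes with non-ASCII names or values are skipped. Both behaviors are assumptions made for illustration, and the function name `select_request_headers` is hypothetical.

```python
import re


def select_request_headers(attributes, pattern, dynamic_properties=None):
    """Pick the FlowFile attributes to send as HTTP request headers."""
    headers = {}
    if pattern:  # with no pattern defined, no attributes are sent
        compiled = re.compile(pattern)
        for name, value in attributes.items():
            # Assumption: full-name match, and non-ASCII entries are skipped.
            if compiled.fullmatch(name) and name.isascii() and value.isascii():
                headers[name] = value
    # Dynamic properties are always sent, regardless of the pattern.
    headers.update(dynamic_properties or {})
    return headers
```

For example, with attributes `x-trace-id`, `x-user`, and `filename`, the pattern `x-.*` selects only the first two.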
Name Description
Failure Request FlowFiles transferred when receiving socket communication errors.
No Retry Request FlowFiles transferred when receiving HTTP responses with a status code between 400 and 499.
Original Request FlowFiles transferred when receiving HTTP responses with a status code between 200 and 299.
Response Response FlowFiles transferred when receiving HTTP responses with a status code between 200 and 299. Enabling [Response Generation Required] changes routing behavior: unsuccessful responses are sent to their corresponding relationships, and a FlowFile is also sent to the Response relationship regardless of the status code received.
Retry Request FlowFiles transferred when receiving HTTP responses with a status code between 500 and 599.
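The routing rules in the table above can be summarized as a small decision function. The sketch below covers only the status ranges the table describes and raises an error for anything else. It also ignores the Response Body Ignored property. The function name `route_request` and its parameters are hypothetical.

```python
def route_request(status_code=None, socket_error=False,
                  response_generation_required=False):
    """Return the set of relationship names that receive a FlowFile."""
    if socket_error or status_code is None:
        return {"Failure"}  # socket communication error, no HTTP response
    if 200 <= status_code <= 299:
        return {"Original", "Response"}
    if 400 <= status_code <= 499:
        routes = {"No Retry"}
    elif 500 <= status_code <= 599:
        routes = {"Retry"}
    else:
        # The table does not describe other status codes.
        raise ValueError(f"status code {status_code} is not covered by the table")
    if response_generation_required:
        routes.add("Response")  # response FlowFile generated regardless of status
    return routes
```

With Response Generation Required enabled, a 503 response yields FlowFiles on both Retry and Response.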
Name Description
invokehttp.status.code The status code that is returned
invokehttp.status.message The status message that is returned
invokehttp.response.body If the status code received is not a success (2xx), the response body is written to the 'invokehttp.response.body' attribute of the request FlowFile.
invokehttp.request.url The original request URL
invokehttp.request.duration Duration (in milliseconds) of the HTTP call to the external endpoint
invokehttp.response.url The URL that was ultimately requested after any redirects were followed
invokehttp.tx.id The transaction ID that is returned after reading the response
invokehttp.remote.dn The DN of the remote server
invokehttp.java.exception.class The Java exception class raised when the processor fails
invokehttp.java.exception.message The Java exception message raised when the processor fails
user-defined If the 'Put Response Body In Attribute' property is set, its value becomes the attribute key, and the attribute value is the body of the HTTP response.
Property Description
Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine Language Engine for executing scripts
Script File Path to script file to execute. Only one of Script File or Script Body may be used
Scopes Description
LOCAL Scripts can store and retrieve state using the State Management APIs. Consult the State Manager section of the Developer's Guide for more details.
CLUSTER Scripts can store and retrieve state using the State Management APIs. Consult the State Manager section of the Developer's Guide for more details.
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| MaxMind Database File * | database-file | | | Path to MaxMind IP Enrichment Database File |
| Lookup Anonymous IP Information * | lookup-anonymous-ip | false | true, false | Specifies whether information about the IP address belonging to an anonymous network should be returned. |
| Lookup Geo Enrichment * | lookup-city | true | true, false | Specifies whether geographic information, such as cities, corresponding to the IP address should be returned. |
| Lookup Connection Type * | lookup-connection-type | false | true, false | Specifies whether information about the Connection Type corresponding to the IP address should be returned. If true, the lookup will contain a 'connectionType' field that (if populated) will contain a value of 'Dialup', 'Cable/DSL', 'Corporate', or 'Cellular'. |
| Lookup Domain Name * | lookup-domain | false | true, false | Specifies whether information about the Domain Name corresponding to the IP address should be returned. If true, the lookup will contain second-level domain information, such as foo.com, but will not contain bar.foo.com. |
| Lookup ISP * | lookup-isp | false | true, false | Specifies whether information about the Internet Service Provider corresponding to the IP address should be returned. |
Property Description
IP Address Attribute The name of an attribute whose value is a dotted decimal IP address for which enrichment should occur
Log Level The Log Level to use when an IP is not found in the database. Accepted values: INFO, DEBUG, WARN, ERROR.
MaxMind Database File Path to Maxmind IP Enrichment Database File
Name Description
found Where to route flow files after successfully enriching attributes with data provided by database
not found Where to route flow files after unsuccessfully enriching attributes because no data was found
Name Description
X.isp.lookup.micros The number of microseconds that the geo lookup took
X.isp.asn The Autonomous System Number (ASN) identified for the IP address
X.isp.asn.organization The Organization Associated with the ASN identified
X.isp.name The name of the ISP associated with the IP address provided
X.isp.organization The Organization associated with the IP address provided
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Authentication Header Charset * | Authentication Header Charset | US-ASCII | | The charset for Basic Authentication header base64 string. |
| Connection Attempt Count * | Connection Attempt Count | 3 | | The number of times to try and establish a connection. |
| Connection Timeout * | Connection Timeout | 3 sec | | The timeout to connect the WebSocket URI. |
| Custom Authorization | Custom Authorization | | | Configures a custom HTTP Authorization Header as described in RFC 7235 Section 4.2. Setting a custom Authorization Header excludes configuring the User Name and User Password properties for Basic Authentication. |
| HTTP Proxy Host | HTTP Proxy Host | | | The host name of the HTTP Proxy. |
| HTTP Proxy Port | HTTP Proxy Port | | | The port number of the HTTP Proxy. |
| Idle Timeout * | Idle Timeout | 0 sec | | The maximum amount of time that a WebSocket connection may remain idle before it is closed. A value of 0 sec disables the timeout. |
| Input Buffer Size * | Input Buffer Size | 4 kb | | The size of the input (read from network layer) buffer. |
| Max Binary Message Size * | Max Binary Message Size | 64 kb | | The maximum size of a binary message during parsing/generating. |
| Max Text Message Size * | Max Text Message Size | 64 kb | | The maximum size of a text message during parsing/generating. |
| Password | Password | | | The user password for Basic Authentication. |
| SSL Context Service | SSL Context Service | | | The SSL Context Service to use in order to secure the server. If specified, the server will accept only WSS requests; otherwise, the server will accept only WS requests |
| Session Maintenance Interval * | Session Maintenance Interval | 10 sec | | The interval between session maintenance activities. A WebSocket session established with a WebSocket server can be terminated due to different reasons including restarting the WebSocket server or timing out inactive sessions. This session maintenance activity is periodically executed in order to reconnect those lost sessions, so that a WebSocket client can reuse the same session id transparently after it reconnects successfully. The maintenance activity is executed until corresponding processors or this controller service is stopped. |
| Username | Username | | | The user name for Basic Authentication. |
| WebSocket URI * | WebSocket URI | | | The WebSocket URI this client connects to. |
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Basic Authentication Enabled * | Basic Authentication Enabled | false | true, false | If enabled, client connection requests are authenticated with Basic authentication using the specified Login Provider. |
| Basic Authentication Path Spec | Basic Authentication Path Spec | `/*` | | Specify a Path Spec to apply Basic Authentication. |
| Basic Authentication Roles | Basic Authentication Roles | `**` | | The authenticated user must have one of the specified roles. Multiple roles can be set as a comma-separated string. `*` represents any role, and `**` represents any role, including no role. |
| Client Authentication * | Client Authentication | no | No Authentication, Want Authentication, Need Authentication | Specifies whether or not the Processor should authenticate client by its certificate. This value is ignored if the <SSL Context Service> Property is not specified or the SSL Context provided uses only a KeyStore and not a TrustStore. |
| Idle Timeout * | Idle Timeout | 0 sec | | The maximum amount of time that a WebSocket connection may remain idle before it is closed. A value of 0 sec disables the timeout. |
| Input Buffer Size * | Input Buffer Size | 4 kb | | The size of the input (read from network layer) buffer. |
| Login Service | Login Service | hash | HashLoginService | Specify which Login Service to use for Basic Authentication. |
| Max Binary Message Size * | Max Binary Message Size | 64 kb | | The maximum size of a binary message during parsing/generating. |
| Max Text Message Size * | Max Text Message Size | 64 kb | | The maximum size of a text message during parsing/generating. |
| Port * | Port | | | The port number on which this WebSocketServer listens. |
| SSL Context Service | SSL Context Service | | | The SSL Context Service to use in order to secure the server. If specified, the server will accept only WSS requests; otherwise, the server will accept only WS requests |
| Users Properties File | users-properties-file | | | Specify a property file containing users for Basic Authentication using HashLoginService. See [http://www.eclipse.org/jetty/documentation/current/configuring-security.html](http://www.eclipse.org/jetty/documentation/current/configuring-security.html) for detail. |
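The role wildcards described for Basic Authentication Roles can be sketched as a simple check. This is only an interpretation of the description above: `**` admits any authenticated user, even one with no roles, while `*` is assumed to require at least one role. It is not the underlying server's implementation, and the function name `is_authorized` is hypothetical.

```python
def is_authorized(user_roles, configured_roles="**"):
    """Check an authenticated user's roles against the configured role list."""
    # Multiple roles are configured as a comma-separated string.
    allowed = {role.strip() for role in configured_roles.split(",") if role.strip()}
    if "**" in allowed:
        return True  # any role, including no role
    if "*" in allowed:
        return len(user_roles) > 0  # assumption: at least one role is required
    return bool(allowed & set(user_roles))  # must hold one of the listed roles
```

Under this reading, a user with no roles passes the default `**` setting but fails `*`.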
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| JMS SSL Context Service | SSL Context Service | | | The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| JMS Broker URI | broker | | | URI pointing to the network location of the JMS Message broker. Example for ActiveMQ: '[tcp://myhost:61616](tcp://myhost:61616)'. Examples for IBM MQ: 'myhost(1414)' and 'myhost01(1414),myhost02(1414)'. |
| JMS Connection Factory Implementation Class * | cf | | | The fully qualified name of the JMS ConnectionFactory implementation class (e.g. org.apache.activemq.ActiveMQConnectionFactory). |
| JMS Client Libraries | cflib | | | Path to the directory with additional resources (e.g. JARs, configuration files, etc.) to be added to the classpath (defined as a comma-separated list of values). Such resources typically represent target JMS client libraries for the ConnectionFactory implementation. |
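The IBM MQ form of the JMS Broker URI shown above, `host(port)` entries separated by commas, is easy to get wrong by hand. The sketch below parses that form into host and port pairs so a value can be sanity-checked. It is an illustration only and not how the controller service itself interprets the property. The function name `parse_ibm_mq_broker` is hypothetical.

```python
import re

# One endpoint looks like "myhost(1414)": a host name followed by a port in parentheses.
_ENDPOINT = re.compile(r"^\s*([^()\s,]+)\((\d+)\)\s*$")


def parse_ibm_mq_broker(broker):
    """Split 'host(port),host(port)' into a list of (host, port) tuples."""
    endpoints = []
    for part in broker.split(","):
        match = _ENDPOINT.match(part)
        if match is None:
            raise ValueError(f"not in host(port) form: {part!r}")
        endpoints.append((match.group(1), int(match.group(2))))
    return endpoints
```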
Required Permission Explanation
reference remote resources Client Library Location can reference resources over HTTP
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| JNDI Name of the Connection Factory * | connection.factory.name | | | The name of the JNDI Object to lookup for the Connection Factory. |
| JNDI Initial Context Factory Class * | java.naming.factory.initial | | | The fully qualified class name of the JNDI Initial Context Factory Class (java.naming.factory.initial). |
| JNDI Provider URL * | java.naming.provider.url | | | The URL of the JNDI Provider to use as the value for java.naming.provider.url. See additional details documentation for allowed URL schemes. |
| JNDI Credentials | java.naming.security.credentials | | | The Credentials to use when authenticating with JNDI (java.naming.security.credentials). |
| JNDI Principal | java.naming.security.principal | | | The Principal to use when authenticating with JNDI (java.naming.security.principal). |
| JNDI / JMS Client Libraries | naming.factory.libraries | | | Specifies jar files and/or directories to add to the ClassPath in order to load the JNDI / JMS client libraries. This should be a comma-separated list of files, directories, and/or URLs. If a directory is given, any files in that directory will be included, but subdirectories will not be included (i.e., it is not recursive). |
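The non-recursive directory rule for JNDI / JMS Client Libraries can be sketched as follows. This is an illustration of the documented behavior, not the service's code. Treating any entry that contains `://` as a URL is an assumption made for the sketch, and the function name `expand_client_libraries` is hypothetical.

```python
from pathlib import Path


def expand_client_libraries(value):
    """Expand a comma-separated list of files, directories, and URLs."""
    resources = []
    for item in (part.strip() for part in value.split(",")):
        if not item:
            continue
        if "://" in item:  # assumption: entries containing '://' are URLs
            resources.append(item)
            continue
        path = Path(item)
        if path.is_dir():
            # Include the files directly inside the directory, not subdirectories.
            resources.extend(sorted(str(child) for child in path.iterdir() if child.is_file()))
        else:
            resources.append(item)
    return resources
```

A JAR placed in a subdirectory of a listed directory is therefore not picked up, and must be listed explicitly or moved up one level.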
Property Description
Default Decimal Precision When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting the number of available digits is required. Generally, precision is defined by the column data type definition or the database engine's default. However, undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers.
Default Decimal Scale When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting the number of available decimal digits is required. Generally, scale is defined by the column data type definition or the database engine's default. However, when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimals than the specified scale, the value will be rounded, e.g. 1.53 becomes 2 with scale 0, and 1.5 with scale 1.
Enrichment Record Reader The Record Reader for reading the 'enrichment' FlowFile
Insertion Record Path Specifies where in the 'original' Record the 'enrichment' Record's fields should be inserted. Note that if the RecordPath does not point to any existing field in the original Record, the enrichment will not be inserted.
Join Strategy Specifies how to join the two FlowFiles into a single FlowFile
Maximum number of Bins Specifies the maximum number of bins that can be held in memory at any one time
Original Record Reader The Record Reader for reading the 'original' FlowFile
Record Writer The Record Writer to use for writing the results. If the Record Writer is configured to inherit the schema from the Record, the schema that it will inherit will be the result of merging both the 'original' record schema and the 'enrichment' record schema.
SQL The SQL SELECT statement to evaluate. Expression Language may be provided, but doing so may result in poorer performance. Because this Processor is dealing with two FlowFiles at a time, it's also important to understand how attributes will be referenced. If both FlowFiles have an attribute with the same name but different values, the Expression Language will resolve to the value provided by the 'enrichment' FlowFile.
Timeout Specifies the maximum amount of time to wait for the second FlowFile once the first arrives at the processor, after which point the first FlowFile will be routed to the 'timeout' relationship.
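The rounding example given for Default Decimal Scale in the table above (1.53 becomes 2 with scale 0, and 1.5 with scale 1) can be reproduced with Python's `decimal` module. The sketch assumes half-up rounding, which is consistent with the documented example but is not confirmed as the processor's rounding mode. The function name `apply_scale` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_scale(value, scale):
    """Round a decimal string to the given number of decimal digits."""
    # scale 0 -> quantum 1, scale 1 -> quantum 0.1, and so on.
    quantum = Decimal(1).scaleb(-scale)
    # Assumption: half-up rounding, matching the documented example.
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
```

Calling `apply_scale("1.53", 0)` and `apply_scale("1.53", 1)` reproduces the two cases from the property description.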
Name Description
failure If both the 'original' and 'enrichment' FlowFiles arrive at the processor but there was a failure in joining the records, both of those FlowFiles will be routed to this relationship.
joined The resultant FlowFile with Records joined together from both the original and enrichment FlowFiles will be routed to this relationship
original Both of the incoming FlowFiles ('original' and 'enrichment') will be routed to this Relationship. I.e., this is the 'original' version of both of these FlowFiles.
timeout If one of the incoming FlowFiles (i.e., the 'original' FlowFile or the 'enrichment' FlowFile) arrives at this Processor but the other does not arrive within the configured Timeout period, the FlowFile that did arrive is routed to this relationship.
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records in the FlowFile
Property Description
Custom Module Directory Comma-separated list of paths to files and/or directories which contain modules containing custom transformations (that are not included on NiFi's classpath).
Custom Transformation Class Name Fully Qualified Class Name for Custom Transformation
JSON Source Specifies whether the Jolt transformation is applied to FlowFile JSON content or to specified FlowFile JSON attribute.
JSON Source Attribute The FlowFile attribute containing JSON to be transformed.
Jolt Specification Jolt Specification for transformation of JSON data. The value for this property may be the text of a Jolt specification or the path to a file containing a Jolt specification. 'Jolt Specification' must be set, or the value is ignored if the Jolt Sort Transformation is selected.
Jolt Transform Specifies the Jolt Transformation that should be used with the provided specification.
Max String Length The maximum allowed length of a string value when parsing the JSON document
Pretty Print Apply pretty print formatting to the output of the Jolt transform
Transform Cache Size Compiling a Jolt Transform can be fairly expensive. Ideally, this will be done only once. However, if the Expression Language is used in the transform, we may need a new Transform for each FlowFile. This value controls how many of those Transforms we cache in memory in order to avoid having to compile the Transform each time.
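The purpose of Transform Cache Size can be illustrated with a small bounded cache keyed by the evaluated specification text. The sketch below is not the processor's implementation. Least-recently-used eviction is an assumption, and `TransformCache` and `compile_fn` are hypothetical names, with `compile_fn` standing in for the expensive compilation step.

```python
from collections import OrderedDict


class TransformCache:
    """Bounded cache of compiled transforms, keyed by specification text."""

    def __init__(self, max_size, compile_fn):
        self._max_size = max_size
        self._compile = compile_fn
        self._entries = OrderedDict()
        self.compilations = 0  # counts how often the expensive step runs

    def get(self, spec):
        if spec in self._entries:
            self._entries.move_to_end(spec)  # mark as most recently used
            return self._entries[spec]
        transform = self._compile(spec)
        self.compilations += 1
        self._entries[spec] = transform
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return transform
```

When Expression Language produces many distinct specifications, a cache that is too small compiles on almost every FlowFile, which is the cost the property is meant to control.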
Name Description
failure If the JSON transformation fails (e.g., due to invalid JSON in the content or attribute), the original FlowFile is routed to this relationship.
success The FlowFile with successfully transformed content or updated attribute will be routed to this relationship
Name Description
mime.type Always set to application/json
Property Description
Custom Module Directory Comma-separated list of paths to files and/or directories which contain modules containing custom transformations (that are not included on NiFi's classpath).
Custom Transformation Class Name Fully Qualified Class Name for Custom Transformation
Jolt Specification Jolt Specification for transformation of JSON data. The value for this property may be the text of a Jolt specification or the path to a file containing a Jolt specification. 'Jolt Specification' must be set, or the value is ignored if the Jolt Sort Transformation is selected.
Jolt Transform Specifies the Jolt Transformation that should be used with the provided specification.
Transform Cache Size Compiling a Jolt Transform can be fairly expensive. Ideally, this will be done only once. However, if the Expression Language is used in the transform, we may need a new Transform for each FlowFile. This value controls how many of those Transforms we cache in memory in order to avoid having to compile the Transform each time.
jolt-record-record-reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema.
jolt-record-record-writer Specifies the Controller Service to use for writing out the records
Name Description
failure If a FlowFile fails processing for any reason (for example, the FlowFile records cannot be parsed), it will be routed to this relationship
original The original FlowFile that was transformed. If the FlowFile fails processing, nothing will be sent to this relationship
success The FlowFile with transformed content will be routed to this relationship
Name Description
record.count The number of records in an outgoing FlowFile
mime.type The MIME Type that the configured Record Writer indicates is appropriate
Property Description
jslt-transform-cache-size Compiling a JSLT Transform can be fairly expensive. Ideally, this will be done only once. However, if the Expression Language is used in the transform, we may need a new Transform for each FlowFile. This value controls how many of those Transforms we cache in memory in order to avoid having to compile the Transform each time.
jslt-transform-pretty_print Apply pretty-print formatting to the output of the JSLT transform
jslt-transform-result-filter A filter for output JSON results using a JSLT expression. This property supports changing the default filter, which removes JSON objects with null values, empty objects and empty arrays from the output JSON. This JSLT must return true for each JSON object to be included and false for each object to be removed. Using a filter value of "true" disables filtering.
jslt-transform-transformation JSLT Transformation for transform of JSON data. Any NiFi Expression Language present will be evaluated first to get the final transform to be applied. The JSLT Tutorial provides an overview of supported expressions: [https://github.com/schibsted/jslt/blob/master/tutorial.md](https://github.com/schibsted/jslt/blob/master/tutorial.md)
jslt-transform-transformation-strategy Whether to apply the JSLT transformation to the entire FlowFile contents or each JSON object in the root-level array
Name Description
failure If a FlowFile fails processing for any reason (for example, the FlowFile is not valid JSON), it will be routed to this relationship
success The FlowFile with transformed content will be routed to this relationship
Name Description
mime.type Always set to application/json
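The default result filter described above (remove null values, empty objects, and empty arrays) can be approximated in a few lines. This is only a sketch of the described behavior in Python, not JSLT itself: `keep` and `apply_result_filter` are hypothetical names, and applying the filter recursively to every object member is an assumption.

```python
def keep(value):
    """Approximation of the default filter: drop null, {} and []."""
    return value is not None and value != {} and value != []


def apply_result_filter(node, predicate=keep):
    """Remove object members whose (already filtered) value fails the predicate."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            filtered = apply_result_filter(value, predicate)
            if predicate(filtered):
                result[key] = filtered
        return result
    if isinstance(node, list):
        return [apply_result_filter(item, predicate) for item in node]
    return node


sample = {"a": 1, "b": None, "c": {}, "d": [], "e": {"f": None}}
filtered = apply_result_filter(sample)

# A predicate that always returns True plays the role of the "true" filter
# value: nothing is removed.
unfiltered = apply_result_filter(sample, lambda value: True)
```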
Display Name API Name Default Value Allowable Values Description
Account ID * Account ID The ID of the Box account which the app will act on behalf of.
App Actor * App Actor impersonated-user - Service Account - Impersonated User Specifies on behalf of whom Box API calls will be made.
App Config File App Config File Full path of an App config JSON file. See Additional Details for more information.
App Config JSON App Config JSON The raw JSON containing an App config. See Additional Details for more information.
Connect Timeout * Connect Timeout 10 secs Maximum amount of time to wait before failing during initial socket connection.
Read Timeout * Read Timeout 30 secs Maximum amount of time to wait before failing while reading socket responses.
Proxy Configuration Service proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Display Name API Name Default Value Allowable Values Description
Allow Comments * Allow Comments false - true - false Whether to allow comments when parsing the JSON document
Date Format Date Format Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).
Max String Length * Max String Length 20 MB The maximum allowed length of a string value when parsing the JSON document
Schema Access Strategy * Schema Access Strategy infer-schema - Use 'Schema Name' Property - Use 'Schema Text' Property - Schema Reference Reader - Infer Schema Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Branch Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Name Schema Name $\{schema.name\} Specifies the name of the schema to lookup in the Schema Registry property
Schema Reference Reader * Schema Reference Reader Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
Schema Registry Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text Schema Text $\{avro.schema\} The text of an Avro-formatted Schema
Schema Version Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
Time Format Time Format Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
Timestamp Format Timestamp Format Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).
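The Date, Time, and Timestamp Format properties above distinguish two cases: with no format, the field is read as milliseconds since the epoch; with a format, the text must match the pattern. The sketch below shows that distinction in Python. The function name `read_timestamp` is hypothetical, the `strptime` pattern is a hand-translated stand-in for the `java.time.format.DateTimeFormatter` pattern `MM/dd/yyyy HH:mm:ss`, and treating the parsed text as GMT is an assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_timestamp(value, python_format=None):
    """Read a timestamp field.

    Without a format, the value is taken as milliseconds since the epoch.
    With a format, the value must match the pattern. The pattern here uses
    Python strptime syntax, not DateTimeFormatter syntax.
    """
    if python_format is None:
        return EPOCH + timedelta(milliseconds=int(value))
    parsed = datetime.strptime(value, python_format)
    return parsed.replace(tzinfo=timezone.utc)  # assumption: text is in GMT


from_epoch = read_timestamp("1483293855000")
from_text = read_timestamp("01/01/2017 18:04:15", "%m/%d/%Y %H:%M:%S")
```

Both calls describe the same instant, which is why a reader configured without a format expects numeric input rather than formatted text.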
Property Description
Aggregation Results Format Format of Aggregation output.
Aggregation Results Split Output a flowfile containing all aggregations or one flowfile for each individual aggregation.
Aggregations One or more query aggregations (or "aggs"), in JSON syntax. Ex: \{"items": \{"terms": \{"field": "product", "size": 10\}\}\}
Client Service An Elasticsearch client service to use for running queries.
Fields Fields of indexed documents to be retrieved, in JSON syntax. Ex: ["user.id", "http.response.*", \{"field": "@timestamp", "format": "epoch_millis"\}]
Index The name of the index to use.
Max JSON Field String Length The maximum allowed length of a string value when parsing a JSON document or attribute.
Output No Hits Output a "hits" flowfile even if no hits found for query. If true, an empty "hits" flowfile will be output even if "aggregations" are output.
Query A query in JSON syntax, not Lucene syntax. Ex: \{"query":\{"match":\{"somefield":"somevalue"\}\}\}. If this parameter is not set, the query will be read from the flowfile content. If the query (property and flowfile content) is empty, a default empty JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Query Attribute If set, the executed query will be set on each result flowfile in the specified attribute.
Query Clause A "query" clause in JSON syntax, not Lucene syntax. Ex: \{"match":\{"somefield":"somevalue"\}\}. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Query Definition Style How the JSON Query will be defined for use by the processor.
Script Fields Fields to create using script evaluation at query runtime, in JSON syntax. Ex: \{"test1": \{"script": \{"lang": "painless", "source": "doc[ 'price'].value * 2"\}\}, "test2": \{"script": \{"lang": "painless", "source": "doc[ 'price'].value * params.factor", "params": \{"factor": 2.0\}\}\}\}
Search Results Format Format of Hits output.
Search Results Split Output a flowfile containing all hits or one flowfile for each individual hit.
Size The maximum number of documents to retrieve in the query. If the query is paginated, this "size" applies to each page of the query, not the "size" of the entire result set.
Sort Sort results by one or more fields, in JSON syntax. Ex: [\{"price" : \{"order" : "asc", "mode" : "avg"\}\}, \{"post_date" : \{"format": "strict_date_optional_time_nanos"\}\}]
Type The type of this document (used by Elasticsearch for indexing and searching).
Name Description
aggregations Aggregations are routed to this relationship.
failure All flowfiles that fail for reasons unrelated to server availability go to this relationship.
hits Search hits are routed to this relationship.
original All original flowfiles that don't cause an error to occur go to this relationship.
Name Description
mime.type application/json
aggregation.name The name of the aggregation whose results are in the output flowfile
aggregation.number The number of the aggregation whose results are in the output flowfile
hit.count The number of hits that are in the output flowfile
elasticsearch.query.error The error message provided by Elasticsearch if there is an error querying the index.
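The query-related properties above each hold a JSON fragment. One way to picture how they fit together is to assemble them into a single search body. The sketch below assumes that each property maps to the same-named top-level key of an Elasticsearch search request (`query`, `size`, `sort`, `aggs`, `fields`, `script_fields`); the source does not describe how the processor builds its request, and `build_search_body` is a hypothetical helper.

```python
import json


def build_search_body(query_clause=None, size=None, sort=None,
                      aggregations=None, fields=None, script_fields=None):
    """Combine JSON fragments into one search body (illustrative only).

    Empty or missing fragments are skipped. With no query clause the body has
    no "query" key, which Elasticsearch treats as a match_all query.
    """
    fragments = {
        "query": query_clause,
        "size": size,
        "sort": sort,
        "aggs": aggregations,
        "fields": fields,
        "script_fields": script_fields,
    }
    body = {}
    for key, raw in fragments.items():
        if raw is None or raw == "":
            continue
        # String values are JSON text; other values are used as they are.
        body[key] = json.loads(raw) if isinstance(raw, str) else raw
    return body


body = build_search_body(
    query_clause='{"match": {"somefield": "somevalue"}}',
    size=10,
    aggregations='{"items": {"terms": {"field": "product", "size": 10}}}',
)
print(json.dumps(body, indent=2))
```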
Display Name API Name Default Value Allowable Values Description
Allow Scientific Notation * Allow Scientific Notation false - true - false Specifies whether or not scientific notation should be used when writing numbers
Date Format Date Format Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).
Pretty Print JSON * Pretty Print JSON false - true - false Specifies whether or not the JSON should be pretty printed
Schema Access Strategy * Schema Access Strategy inherit-record-schema - Inherit Record Schema - Use 'Schema Name' Property - Use 'Schema Text' Property Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Branch Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Cache Schema Cache Specifies a Schema Cache to add the Record Schema to so that Record Readers can quickly lookup the schema.
Schema Name Schema Name $\{schema.name\} Specifies the name of the schema to lookup in the Schema Registry property
Schema Reference Reader * Schema Reference Reader Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
Schema Reference Writer * Schema Reference Writer Service implementation responsible for writing FlowFile attributes or content header with Schema reference information
Schema Registry Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text Schema Text $\{avro.schema\} The text of an Avro-formatted Schema
Schema Version Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
Schema Write Strategy * Schema Write Strategy no-schema - Do Not Write Schema - Set 'schema.name' Attribute - Set 'avro.schema' Attribute - Schema Reference Writer Specifies how the schema for a Record should be added to the data.
Time Format Time Format Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
Timestamp Format Timestamp Format Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).
Compression Format * compression-format none - none - gzip - bzip2 - xz-lzma2 - snappy - snappy framed - zstd The compression format to use. Valid values are: GZIP, BZIP2, ZSTD, XZ-LZMA2, LZMA, Snappy, and Snappy Framed
Compression Level * compression-level 1 - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 The compression level to use; this is valid only when using GZIP compression. A lower value results in faster processing but less compression; a value of 0 indicates no compression but simply archiving
Output Grouping * output-grouping output-array - Array - One Line Per Object Specifies how the writer should output the JSON records (for example, as an array or as one object per line). Note that if 'One Line Per Object' is selected, then Pretty Print JSON must be false.
Suppress Null Values * suppress-nulls never-suppress - Never Suppress - Always Suppress - Suppress Missing Values Specifies how the writer should handle a null field
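The difference between the two Output Grouping choices, and the effect of suppressing null values, is easiest to see on a small record set. This is a minimal sketch with a hypothetical `write_records` function; it models only "never suppress" and "always suppress" and does not cover the 'Suppress Missing Values' option.

```python
import json


def write_records(records, one_line_per_object=False, suppress_nulls=False):
    """Serialize records as one JSON array or as one JSON object per line."""
    if suppress_nulls:
        # Drop fields whose value is null before writing.
        records = [
            {key: value for key, value in record.items() if value is not None}
            for record in records
        ]
    if one_line_per_object:
        # Each object must stay on a single line, so no pretty printing here.
        return "\n".join(json.dumps(record) for record in records)
    return json.dumps(records)


records = [{"id": 1, "name": None}, {"id": 2, "name": "b"}]
as_array = write_records(records)
as_lines = write_records(records, one_line_per_object=True, suppress_nulls=True)
```

One object per line only works if every object occupies exactly one line, which is why that grouping cannot be combined with pretty printing.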
Display Name API Name Default Value Allowable Values Description
Filter JSON Filter JSON JSON representation of the column filter
Display Name API Name Default Value Allowable Values Description
Allow Comments * Allow Comments false - true - false Whether to allow comments when parsing the JSON document
Date Format Date Format Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).
Max String Length * Max String Length 20 MB The maximum allowed length of a string value when parsing the JSON document
Schema Access Strategy * Schema Access Strategy infer-schema - Infer Schema - Use 'Schema Name' Property - Use 'Schema Text' Property - Schema Reference Reader Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Branch Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Name Schema Name $\{schema.name\} Specifies the name of the schema to lookup in the Schema Registry property
Schema Reference Reader * Schema Reference Reader Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
Schema Registry Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text Schema Text $\{avro.schema\} The text of an Avro-formatted Schema
Schema Version Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
Time Format Time Format Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
Timestamp Format Timestamp Format Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).
Schema Application Strategy * schema-application-strategy SELECTED_PART - Whole JSON - Selected Part Specifies whether the schema is defined for the whole JSON or for the selected part starting from "Starting Field Name".
Schema Inference Cache schema-inference-cache Specifies a Schema Cache to use when inferring the schema. If not populated, the schema will be inferred each time. However, if a cache is specified, the cache will first be consulted and if the applicable schema can be found, it will be used instead of inferring the schema.
Starting Field Name starting-field-name Skips forward to the given nested JSON field (array or object) to begin processing.
Starting Field Strategy * starting-field-strategy ROOT_NODE - Root Node - Nested Field Start processing from the root node or from a specified nested node.
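The Starting Field properties above describe skipping forward to a nested array or object before reading records. A rough sketch of that idea follows. The helper names are hypothetical, and the depth-first search for the first matching field name is an assumption about how the field is located.

```python
import json


def find_field(node, name):
    """Depth-first search for the first field `name` holding an array or object."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name and isinstance(value, (dict, list)):
                return value
            found = find_field(value, name)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_field(item, name)
            if found is not None:
                return found
    return None


def read_records(document, starting_field=None):
    """Return records from the root node or from a nested starting field."""
    root = json.loads(document)
    node = root if starting_field is None else find_field(root, starting_field)
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


doc = '{"meta": {"count": 2}, "data": {"items": [{"id": 1}, {"id": 2}]}}'
from_root = read_records(doc)
from_nested = read_records(doc, starting_field="items")
```

Starting at the root yields one record (the whole document), while starting at `items` yields the two nested records.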
Display Name API Name Default Value Allowable Values Description
Assertion Parameter Name * Assertion Parameter Name assertion Name of the parameter to use for the JWT assertion in the request to the token endpoint.
Audience Audience The audience claim (aud) for the JWT. Space-separated list of audiences if multiple are expected.
Grant Type * Grant Type urn:ietf:params:oauth:grant-type:jwt-bearer Value to set for the grant_type parameter in the request to the token endpoint.
Issuer Issuer The issuer claim (iss) for the JWT.
JWT Expiration Time * JWT Expiration Time 1 hour Expiration time used to set the corresponding claim of the JWT. In case the returned access token does not include an expiration time, this will be used with the refresh window to re-acquire a new access token.
JWT ID JWT ID The "jti" (JWT ID) claim provides a unique identifier for the JWT. The identifier value must be assigned in a manner that ensures that there's a negligible probability that the same value will be accidentally assigned to a different data object; if the application uses multiple issuers, collisions MUST be prevented among values produced by different issuers as well. The "jti" value is a case-sensitive string. If set, it is recommended to set this value to $\{UUID()\}.
Key ID Key ID The ID of the public key used to sign the JWT. It'll be used as the kid header in the JWT.
Private Key Service * Private Key Service The private key service to use for signing JWTs.
Refresh Window * Refresh Window 5 minutes The service will attempt to refresh tokens expiring within the refresh window, subtracting the configured duration from the token expiration.
SSL Context Service * SSL Context Service An instance of SSLContextProvider configured with a certificate that will be used to set the x5t header. Must be using RSA algorithm.
Scope Scope The scope claim (scope) for the JWT.
Set JWT Header X.509 Cert Thumbprint * Set JWT Header X.509 Cert Thumbprint false - true - false If true, will set the JWT header x5t field with the base64url-encoded SHA-256 thumbprint of the X.509 certificate's DER encoding. If set to true, an instance of SSLContextProvider must be configured with a certificate using RSA algorithm.
Signing Algorithm * Signing Algorithm PS256 - RS256 - RS384 - RS512 - PS256 - PS384 - PS512 - ES256 - ES384 - ES512 - Ed25519 The algorithm to use for signing the JWT.
Subject Subject The subject claim (sub) for the JWT.
Token Endpoint URL * Token Endpoint URL The URL of the OAuth2 token endpoint.
Web Client Service * Web Client Service The Web Client Service to use for calling the token endpoint.
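The interaction between JWT Expiration Time, Refresh Window, and the token request parameters can be sketched as follows. This is an illustration under stated assumptions, not the service's code: the function names are hypothetical, JWT signing is omitted, and `expires_in_seconds` stands for the optional `expires_in` value of an OAuth2 token response.

```python
from datetime import datetime, timedelta, timezone


def token_request_params(signed_jwt, assertion_parameter_name="assertion",
                         grant_type="urn:ietf:params:oauth:grant-type:jwt-bearer"):
    """Form parameters for the token endpoint, using the defaults listed above."""
    return {"grant_type": grant_type, assertion_parameter_name: signed_jwt}


def token_expiry(issued_at, expires_in_seconds=None,
                 jwt_expiration=timedelta(hours=1)):
    """Use the token's own expiration if present, else the JWT Expiration Time."""
    if expires_in_seconds is None:
        return issued_at + jwt_expiration
    return issued_at + timedelta(seconds=expires_in_seconds)


def needs_refresh(expires_at, now, refresh_window=timedelta(minutes=5)):
    """True once the token expires within the refresh window."""
    return now >= expires_at - refresh_window


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = token_expiry(issued)  # no expires_in returned, so issued + 1 hour
params = token_request_params("header.payload.signature")  # placeholder JWT
```

With the defaults, a token issued at 12:00 is treated as expiring at 13:00 and becomes eligible for refresh from 12:55 onward.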
Display Name API Name Default Value Allowable Values Description
SSL Context Service SSL Context Service Service supporting SSL communication with Kafka brokers
Acknowledgment Wait Time * ack.wait.time 5 sec After sending a message to Kafka, this indicates the amount of time that the service will wait for a response from Kafka. If Kafka does not acknowledge the message within this time period, the service will throw an exception.
Bootstrap Servers * bootstrap.servers Comma-separated list of Kafka Bootstrap Servers in the format host:port. Corresponds to Kafka bootstrap.servers property
Client Timeout * default.api.timeout.ms 60 sec Default timeout for Kafka client operations. Mapped to Kafka default.api.timeout.ms. The Kafka request.timeout.ms property is derived from half of the configured timeout
Transaction Isolation Level * isolation.level read_committed - Read Committed - Read Uncommitted Specifies how the service should handle transaction isolation levels when communicating with Kafka. The uncommitted option means that messages will be received as soon as they are written to Kafka and will be pulled even if the producer cancels the transaction. The committed option configures the service to not receive any messages for which the producer's transaction was canceled, but this can result in some latency since the consumer must wait for the producer to finish its entire transaction instead of pulling as the messages become available. Corresponds to Kafka isolation.level property.
Max Metadata Wait Time * max.block.ms 5 sec The amount of time publisher will wait to obtain metadata or wait for the buffer to flush during the 'send' call before failing the entire 'send' call. Corresponds to Kafka max.block.ms property
Max Poll Records * max.poll.records 10000 Maximum number of records Kafka should return in a single poll.
SASL Mechanism * sasl.mechanism GSSAPI - GSSAPI - PLAIN - SCRAM-SHA-256 - SCRAM-SHA-512 SASL mechanism used for authentication. Corresponds to Kafka Client sasl.mechanism property
SASL Password * sasl.password Password provided with configured username when using PLAIN or SCRAM SASL Mechanisms
SASL Username * sasl.username Username provided with configured password when using PLAIN or SCRAM SASL Mechanisms
Security Protocol * security.protocol PLAINTEXT - PLAINTEXT - SSL - SASL_PLAINTEXT - SASL_SSL Security protocol used to communicate with brokers. Corresponds to Kafka Client security.protocol property
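Several of the properties above map directly to Kafka client settings, and the Client Timeout description states that `request.timeout.ms` is derived as half of the configured timeout. The sketch below collects those mappings into a plain dictionary. `kafka_client_config` is a hypothetical helper, not part of any Kafka client library, and the defaults are the ones listed in the table.

```python
def kafka_client_config(bootstrap_servers,
                        client_timeout_ms=60_000,
                        isolation_level="read_committed",
                        max_block_ms=5_000,
                        max_poll_records=10_000,
                        security_protocol="PLAINTEXT",
                        sasl_mechanism="GSSAPI"):
    """Translate the service properties into Kafka client property names."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "default.api.timeout.ms": client_timeout_ms,
        # Derived value: half of the configured client timeout.
        "request.timeout.ms": client_timeout_ms // 2,
        "isolation.level": isolation_level,
        "max.block.ms": max_block_ms,
        "max.poll.records": max_poll_records,
        "security.protocol": security_protocol,
        "sasl.mechanism": sasl_mechanism,
    }


config = kafka_client_config("broker1:9092,broker2:9092")
```

With the default Client Timeout of 60 seconds, the derived request timeout is 30 seconds.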
Property Description
HubSpot Service HubSpot Client Service.
Object Type HubSpot object type
Updated After Filter objects updated after specified date (format: yyyy-MM-dd)
Scopes Description
CLUSTER Maintains pagination state and last sync timestamp to continue data retrieval from the last known position after restarts and to fetch only changed data.
Name Description
failure HubSpot fail relationship
original The input Flow File is routed to the original relationship.
retry HubSpot retry relationship. FlowFiles that failed to process due to a server timeout or rate limit related error. FlowFiles routed here should be routed back into the processor.
success HubSpot success relationship
Name Description
mime.type application/json
statement.type DELETE
hubspot.object.type HubSpot Object Type for this fetch
hubspot.object.id HubSpot Object ID for this fetch
hubspot.run.id Timestamp of the start of this run. Obtained from the incoming FlowFile or current time if not available
hubspot.is_last Whether this is the last paged object of the ingestion
Property Description
Blob Name Prefix Search prefix for listing
Container Name Name of the Azure storage container. In case of PutAzureBlobStorage processor, container can be created if it does not exist.
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or in case of a primary node change. The 'Tracking Entities' strategy requires tracking information for all listed entities within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If it tracks listed entities per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration is changed. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated', and a FlowFile is emitted, if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Maximum File Age The maximum age that a file must be in order to be pulled; any file older than this amount of time (according to last modification date) will be ignored
Maximum File Size The maximum size that a file can be in order to be pulled
Minimum File Age The minimum age that a file must be in order to be pulled; any file younger than this amount of time (according to last modification date) will be ignored
Minimum File Size The minimum size that a file must be in order to be pulled
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Storage Credentials Controller Service used to obtain Azure Blob Storage Credentials.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor.
Scopes Description
CLUSTER After performing a listing of blobs, the timestamp of the newest blob is stored if 'Tracking Timestamps' Listing Strategy is in use (by default). This allows the Processor to list only blobs that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.
Name Description
success All FlowFiles that are received are routed to success
Name Description
azure.container The name of the Azure Blob Storage container
azure.blobname The name of the blob on Azure Blob Storage
azure.primaryUri Primary location of the blob
azure.etag ETag of the blob
azure.blobtype Type of the blob (either BlockBlob, PageBlob or AppendBlob)
mime.type MIME Type of the content
lang Language code for the content
azure.timestamp Timestamp of the blob
azure.length Length of the blob
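The 'Tracking Entities' rules described in the property table above (cache key format, the three new/updated conditions, and eviction of entries older than the time window) can be sketched as follows. All names are hypothetical, timestamps are epoch milliseconds, and a plain dictionary stands in for the distributed cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    timestamp: int  # epoch milliseconds
    size: int


def cache_key(processor_id, node_id=None):
    """Build 'ListedEntities::{processorId}' with an optional '::{nodeId}' part."""
    key = f"ListedEntities::{processor_id}"
    return key if node_id is None else f"{key}::{node_id}"


def list_new_or_updated(listing, cached, now, window_ms):
    """Return entities to emit and update `cached` (name -> Entity) in place."""
    cutoff = now - window_ms
    emitted = []
    for entity in listing:
        if entity.timestamp < cutoff:
            continue  # outside the tracking time window
        previous = cached.get(entity.name)
        if (previous is None                             # 1. not listed before
                or entity.timestamp > previous.timestamp  # 2. newer timestamp
                or entity.size != previous.size):         # 3. different size
            emitted.append(entity)
            cached[entity.name] = entity
    # Forget cached entities that have aged out of the window.
    for name in [n for n, e in cached.items() if e.timestamp < cutoff]:
        del cached[name]
    return emitted


window = 30 * 60 * 1000  # '30 minutes'
cached = {
    "a.csv": Entity("a.csv", 9_000_000, 10),
    "old.csv": Entity("old.csv", 1_000_000, 1),
}
listing = [
    Entity("a.csv", 9_000_000, 10),  # unchanged: not emitted
    Entity("b.csv", 9_500_000, 5),   # new: emitted
    Entity("c.csv", 8_000_000, 7),   # older than the window: skipped
]
emitted = list_new_or_updated(listing, cached, now=10_000_000, window_ms=window)
```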
Property Description
ADLS Credentials Controller Service used to obtain Azure Credentials.
Directory Name Name of the Azure Storage Directory. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. In case of the PutAzureDataLakeStorage processor, the directory will be created if not already existing.
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or in case of a primary node change. The 'Tracking Entities' strategy requires tracking information for all listed entities within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If it tracks listed entities per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration is changed. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated', and a FlowFile is emitted, if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
File Filter Only files whose names match the given regular expression will be listed
Filesystem Name Name of the Azure Storage File System (also called Container). It is assumed to be already existing.
Include Temporary Files Whether to include temporary files when listing the contents of configured directory paths.
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Maximum File Age The maximum age that a file must be in order to be pulled; any file older than this amount of time (according to last modification date) will be ignored
Maximum File Size The maximum size that a file can be in order to be pulled
Minimum File Age The minimum age that a file must be in order to be pulled; any file younger than this amount of time (according to last modification date) will be ignored
Minimum File Size The minimum size that a file must be in order to be pulled
Path Filter When 'Recurse Subdirectories' is true, then only subdirectories whose paths match the given regular expression will be scanned
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Recurse Subdirectories Indicates whether to list files from subdirectories of the directory
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor.
Scopes Description
CLUSTER After performing a listing of files, the timestamp of the newest file is stored. This allows the Processor to list only files that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.
Name Description
success All FlowFiles that are received are routed to success
Name Description
azure.filesystem The name of the Azure File System
azure.filePath The full path of the Azure File
azure.directory The name of the Azure Directory
azure.filename The name of the Azure File
azure.length The length of the Azure File
azure.lastModified The last modification time of the Azure File
azure.etag The ETag of the Azure File
Property Description
Box Client Service Controller Service used to obtain a Box API connection.
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restart or in case of primary node change. The 'Tracking Entities' strategy requires tracking information of all listed entities within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. Cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If it tracks per-node listed entities, then the optional '::\{nodeId\}' part is added to manage state separately. E.g. cluster-wide cache key = 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b', per-node cache key = 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key will be deleted when the target listing configuration is changed. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated' and a FlowFile is emitted if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity will be removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
Folder ID The ID of the folder from which to pull list of files.
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Minimum File Age The minimum age a file must be in order to be considered; any files younger than this will be ignored.
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Search Recursively When 'true', will include list of files from sub-folders. Otherwise, will return only files that are within the folder defined by the 'Folder ID' property.
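The cache key format and the three "new/updated" conditions described for the entity tracking properties can be sketched as below. The `Entity` type and the function names are hypothetical, and the dictionary stands in for the cache service; the real cache content is a Gzipped JSON string, which this sketch does not reproduce.

```python
from typing import Dict, NamedTuple, Optional


class Entity(NamedTuple):
    """Hypothetical record of one listed entity."""
    identifier: str
    timestamp: int  # e.g. milliseconds since epoch
    size: int


def cache_key(processor_id: str, node_id: Optional[str] = None) -> str:
    """Build 'ListedEntities::{processorId}(::{nodeId})'."""
    key = f"ListedEntities::{processor_id}"
    return f"{key}::{node_id}" if node_id else key


def is_new_or_updated(entity: Entity, already_listed: Dict[str, Entity]) -> bool:
    """Apply the three conditions: unknown, newer timestamp, or different size."""
    cached = already_listed.get(entity.identifier)
    if cached is None:
        return True
    return entity.timestamp > cached.timestamp or entity.size != cached.size


def prune(already_listed: Dict[str, Entity], now: int, window: int) -> Dict[str, Entity]:
    """Drop cached entities whose timestamp fell outside the tracking time window."""
    return {k: v for k, v in already_listed.items() if now - v.timestamp <= window}
```

Pruning keeps the cached set bounded by the time window, which is why an entity older than the window is no longer tracked.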
Scopes Description
CLUSTER The processor stores the data it needs to keep track of which files have already been listed. What exactly needs to be stored depends on the 'Listing Strategy'.
Name Description
success All FlowFiles that are received are routed to success
Name Description
box.id The id of the file
filename The name of the file
path The folder path where the file is located
box.size The size of the file
box.timestamp The last modified time of the file
Property Description
Box Client Service Controller Service used to obtain a Box API connection.
Folder ID The ID of the folder from which to fetch files.
Minimum File Age The minimum age a file must be in order to be considered; any files younger than this will be ignored.
Record Writer Specifies the Controller Service to use for writing the metadata records. Must be set.
Search Recursively When 'true', will include files from sub-folders. Otherwise, will return only files that are within the folder defined by the 'Folder ID' property.
Name Description
failure A FlowFile will be routed here if there is an error fetching file metadata from the folder.
not.found FlowFiles for which the specified Box folder was not found will be routed to this relationship.
success A FlowFile containing the file metadata records will be routed to this relationship upon successful processing.
Name Description
box.folder.id The ID of the folder from which files were fetched
record.count The number of records in the FlowFile
mime.type The MIME Type specified by the Record Writer
error.code The error code returned by Box
error.message The error message returned by Box
Property Description
Box Client Service Controller Service used to obtain a Box API connection.
File ID The ID of the file for which to fetch metadata.
Name Description
failure A FlowFile will be routed here if there is an error fetching metadata instances from the file.
not found FlowFiles for which the specified Box file was not found will be routed to this relationship.
success A FlowFile containing the metadata instances records will be routed to this relationship upon successful processing.
Name Description
box.id The ID of the file from which metadata was fetched
record.count The number of records in the FlowFile
mime.type The MIME Type specified by the Record Writer
box.metadata.instances.names Comma-separated list of instances names
box.metadata.instances.count Number of metadata instances found
error.code The error code returned by Box
error.message The error message returned by Box
Property Description
Box Client Service Controller Service used to obtain a Box API connection.
File ID The ID of the file for which to fetch metadata.
Name Description
failure A FlowFile will be routed here if there is an error fetching metadata templates from the file.
not found FlowFiles for which the specified Box file was not found will be routed to this relationship.
success A FlowFile containing the metadata template records will be routed to this relationship upon successful processing.
Name Description
box.file.id The ID of the file from which metadata was fetched
record.count The number of records in the FlowFile
mime.type The MIME Type specified by the Record Writer
box.metadata.templates.names Comma-separated list of template names
box.metadata.templates.count Number of metadata templates found
error.code The error code returned by Box
error.message The error message returned by Box
Property Description
Confluence Client Service Controller service for managing connections to Confluence
Name Description
retry Retryable failure occurred, e.g. rate limiting
success Successfully fetched Confluence group page
Name Description
confluence.group.ids List of identifiers of the Confluence groups.
Property Description
list-db-include-count Whether to include the table's row count as a flow file attribute. This affects performance as a database query will be generated for each table in the retrieved list.
list-db-refresh-interval The amount of time to elapse before resetting the processor state, thereby causing all current tables to be listed. During this interval, the processor may continue to run, but tables that have already been listed will not be re-listed. However, new/added tables will be listed as the processor runs. A value of zero means the state will never be automatically reset; the user must Clear State manually.
list-db-tables-catalog The name of a catalog from which to list database tables. The name must match the catalog name as it is stored in the database. If the property is not set, the catalog name will not be used to narrow the search for tables. If the property is set to an empty string, tables without a catalog will be listed.
list-db-tables-db-connection The Controller Service that is used to obtain connection to database
list-db-tables-name-pattern A pattern for matching tables in the database. Within a pattern, "%" means match any substring of 0 or more characters, and "_" means match any one character. The pattern must match the table name as it is stored in the database. If the property is not set, all tables will be retrieved.
list-db-tables-schema-pattern A pattern for matching schemas in the database. Within a pattern, "%" means match any substring of 0 or more characters, and "_" means match any one character. The pattern must match the schema name as it is stored in the database. If the property is not set, the schema name will not be used to narrow the search for tables. If the property is set to an empty string, tables without a schema will be listed.
list-db-tables-types A comma-separated list of table types to include. For example, some databases support TABLE and VIEW types. If the property is not set, tables of all types will be returned.
record-writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
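The wildcard semantics of the table and schema name patterns ("%" for any substring, "_" for exactly one character) can be illustrated by translating a pattern into a regular expression. This is only a sketch of the matching rule: in practice the database evaluates the pattern, and escape characters and case sensitivity depend on the database. `like_to_regex` and `matches` are hypothetical helper names.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a '%'/'_' pattern into an equivalent compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any substring of 0 or more characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, name: str) -> bool:
    """Return True if the whole name matches the pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None
```

For example, `matches("ORDER%", "ORDERS_2024")` is true, while `matches("T_1", "TAB1")` is false because "_" consumes exactly one character.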
Scopes Description
CLUSTER After performing a listing of tables, the timestamp of the query is stored. This allows the Processor to not re-list tables the next time that the Processor is run. Specifying the refresh interval in the processor properties will indicate that when the processor detects the interval has elapsed, the state will be reset and tables will be re-listed as a result. This processor is meant to be run on the primary node only.
Name Description
success All FlowFiles that are received are routed to success
Name Description
db.table.name Contains the name of a database table from the connection
db.table.catalog Contains the name of the catalog to which the table belongs (may be null)
db.table.schema Contains the name of the schema to which the table belongs (may be null)
db.table.fullname Contains the fully-qualified table name (possibly including catalog, schema, etc.)
db.table.type Contains the type of the database table from the connection. Typical types are "TABLE", "VIEW", "SYSTEM TABLE", "GLOBAL TEMPORARY", "LOCAL TEMPORARY", "ALIAS", "SYNONYM"
db.table.remarks Contains the remarks (explanatory comment) of the database table from the connection
db.table.count Contains the number of rows in the table
Property Description
DBFS File Path DBFS file path e.g. /directory/file.txt
Databricks Client Databricks Client Service.
Include Directories Include directories in FlowFiles produced.
Recursive Directory Listing Recursively list files in sub directories.
Name Description
failure Databricks failure relationship
original The original FlowFile is routed to this relationship when processing is successful.
success Databricks success relationship
Name Description
filename Base filename of the DBFS file or directory.
path Path to parent directory containing the DBFS file or directory.
absolute.path Full path to the DBFS file or directory.
dbfs.resourceType The type of resource, 'file' or 'directory' of the DBFS resource.
dbfs.size The size of the DBFS file.
dbfs.lastModifiedTime The last modified time of the DBFS file, in milliseconds since epoch in UTC time.
error.code The error code for the SQL statement if an error occurred.
error.message The error message for the SQL statement if an error occurred.
Property Description
Dropbox Credential Service Controller Service used to obtain Dropbox credentials (App Key, App Secret, Access Token, Refresh Token). See controller service's Additional Details for more information.
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restart or in case of primary node change. The 'Tracking Entities' strategy requires tracking information of all listed entities within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. Cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If it tracks per-node listed entities, then the optional '::\{nodeId\}' part is added to manage state separately. E.g. cluster-wide cache key = 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b', per-node cache key = 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key will be deleted when the target listing configuration is changed. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated' and a FlowFile is emitted if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity will be removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
Folder The Dropbox identifier or path of the folder from which to pull the list of files. 'Folder' should match the following regular expression pattern: /.*|id:.* . Example for folder identifier: id:odTlUvbpIEAAAAAAAAAGGQ. Example for folder path: /Team1/Task1.
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Minimum File Age The minimum age a file must be in order to be considered; any files newer than this will be ignored.
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Search Recursively Indicates whether to list files from subfolders of the Dropbox folder.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
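The 'Folder' pattern above accepts either an absolute path or an `id:` identifier. A quick way to check a value before configuring the processor is sketched below, assuming the pattern is applied to the whole value; `is_valid_folder` is a hypothetical helper, not part of the processor.

```python
import re

# Pattern quoted from the 'Folder' property description.
FOLDER_PATTERN = re.compile(r"/.*|id:.*")


def is_valid_folder(value: str) -> bool:
    """Return True if the whole value is a '/...' path or an 'id:...' identifier."""
    return FOLDER_PATTERN.fullmatch(value) is not None
```

A relative path such as `Team1/Task1` fails this check because it starts with neither `/` nor `id:`.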
Scopes Description
CLUSTER The processor stores the data it needs to keep track of which files have already been listed. What exactly needs to be stored depends on the 'Listing Strategy'.
Name Description
success All FlowFiles that are received are routed to success
Name Description
dropbox.id The Dropbox identifier of the file
path The folder path where the file is located
filename The name of the file
dropbox.size The size of the file
dropbox.timestamp The server modified time of the file
dropbox.revision Revision of the file
Property Description
Address The address the FTP server should be bound to. If not set (or set to 0.0.0.0), the server binds to all available addresses (i.e. all network interfaces of the host machine).
Password If the Username is set, then a password must also be specified. The password provided by the client trying to log in to the FTP server will be checked against this password.
Port The Port to listen on for incoming connections. On Linux, root privileges are required to use port numbers below 1024.
SSL Context Service Specifies the SSL Context Service that can be used to create secure connections. If an SSL Context Service is selected, then a keystore file must also be specified in the SSL Context Service. Without a keystore file, the processor cannot be started successfully. Specifying a truststore file is optional. If a truststore file is specified, client authentication is required (the client needs to send a certificate to the server). Regardless of the selected TLS protocol, the highest available protocol is used for the connection. For example, if NiFi is running on Java 11 and TLSv1.2 is selected in the controller service as the preferred TLS Protocol, TLSv1.3 will be used (regardless of TLSv1.2 being selected) because Java 11 supports TLSv1.3.
Username The name of the user that is allowed to log in to the FTP server. If a username is provided, a password must also be provided. If no username is specified, anonymous connections will be permitted.
Name Description
success Relationship for successfully received files.
Name Description
filename The name of the file received via the FTP/FTPS connection.
path The path pointing to the file's target directory. E.g.: file.txt is uploaded to /Folder1/SubFolder, then the value of the path attribute will be "/Folder1/SubFolder/" (note that it ends with a separator character).
Property Description
Authorized DN Pattern A Regular Expression to apply against the Subject's Distinguished Name of incoming connections. If the Pattern does not match the Subject DN, the processor will respond with a status of HTTP 403 Forbidden.
Base Path Base path for incoming connections
HTTP Headers to receive as Attributes (Regex) Specifies the Regular Expression that determines the names of HTTP Headers that should be passed along as FlowFile attributes
HTTP Protocols HTTP Protocols supported for Application Layer Protocol Negotiation with TLS
Listening Port The Port to listen on for incoming connections
Max Unconfirmed Flowfile Time The maximum amount of time to wait for a FlowFile to be confirmed before it is removed from the cache
Request Header Maximum Size The maximum supported size of HTTP headers in requests sent to this processor
Return Code The HTTP return code returned after every HTTP call
SSL Context Service SSL Context Service enables support for HTTPS
authorized-issuer-dn-pattern A Regular Expression to apply against the Issuer's Distinguished Name of incoming connections. If the Pattern does not match the Issuer DN, the processor will respond with a status of HTTP 403 Forbidden.
client-authentication Client Authentication policy for TLS connections. Required when SSL Context Service configured.
health-check-port The port to listen on for incoming health check requests. If set, it must be different from the Listening Port. Configure this port if the processor is set to use two-way SSL and a load balancer that does not support client authentication for health check requests is used. Only /<base_path>/healthcheck service is available via this port and only GET and HEAD requests are supported. If the processor is set not to use SSL, SSL will not be used on this port, either. If the processor is set to use one-way SSL, one-way SSL will be used on this port. If the processor is set to use two-way SSL, one-way SSL will be used on this port (client authentication not required).
max-thread-pool-size The maximum number of threads to be used by the embedded Jetty server. The value can be set between 8 and 1000. The value of this property affects the performance of the flows and the operating system, therefore the default value should only be changed in justified cases. A value that is less than the default value may be suitable if only a small number of HTTP clients connect to the server. A greater value may be suitable if a large number of HTTP clients are expected to make requests to the server simultaneously.
multipart-read-buffer-size The threshold size, at which the contents of an incoming file would be written to disk. Only applies for requests with Content-Type: multipart/form-data. It is used to prevent denial of service type of attacks, to prevent filling up the heap or disk space.
multipart-request-max-size The max size of the request. Only applies for requests with Content-Type: multipart/form-data, and is used to prevent denial of service type of attacks, to prevent filling up the heap or disk space
record-reader The Record Reader to use for parsing the incoming FlowFile into Records
record-writer The Record Writer to use for serializing Records after they have been transformed
Name Description
success Relationship for successfully received FlowFiles
Property Description
Address Internet Protocol Address on which to listen for OTLP Export Service Requests. The default value enables listening on all addresses.
Batch Size Maximum number of OTLP request resource elements included in each FlowFile produced
Client Authentication Client authentication policy for TLS communication with HTTPS
Port TCP port number on which to listen for OTLP Export Service Requests over HTTP and gRPC
Queue Capacity Maximum number of OTLP request resource elements that can be received and queued
SSL Context Service SSL Context Service enables TLS communication for HTTPS
Worker Threads Number of threads responsible for decoding and queuing incoming OTLP Export Service Requests
Name Description
success Export Service Requests containing OTLP Telemetry
Name Description
mime.type Content-Type set to application/json
resource.type OpenTelemetry Resource Type: LOGS, METRICS, or TRACES
resource.count Count of resource elements included in messages
Property Description
App Token The Application Token that is registered to your Slack application
Bot Token The Bot Token that is registered to your Slack application
Event Type to Receive Specifies the type of Event that the Processor should respond to
Resolve User Details Specifies whether the Processor should lookup details about the Slack User who sent the received message. If true, the output JSON will contain an additional field named 'userDetails'. The 'user' field will still contain the ID of the user. In order to enable this capability, the Bot Token must be granted the 'users:read' and optionally the 'users.profile:read' Bot Token Scope. If the rate limit is exceeded when retrieving this information, the received message will be rejected and must be re-delivered.
Name Description
success All FlowFiles that are created will be sent to this Relationship.
Name Description
mime.type Set to application/json, as the output will always be in JSON format
slack.event.type Set to the type of Slack event that occurred
Property Description
Character Set Specifies the character set of the Syslog messages. Note that Expression language is not evaluated per FlowFile.
Client Auth The client authentication policy to use for the SSL Context. Only used if an SSL Context Service is provided.
Local Network Interface The name of a local network interface to be used to restrict listening to a specific LAN.
Max Batch Size The maximum number of Syslog events to add to a single FlowFile. If multiple events are available, they will be concatenated along with the <Message Delimiter> up to this configured maximum number of messages
Max Size of Message Queue The maximum size of the internal queue used to buffer messages being transferred from the underlying channel to the processor. Setting this value higher allows more messages to be buffered in memory during surges of incoming messages, but increases the total memory used by the processor.
Max Size of Socket Buffer The maximum size of the socket buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
Message Delimiter Specifies the delimiter to place between Syslog messages when multiple messages are bundled together (see <Max Batch Size> property).
Parse Messages Indicates if the processor should parse the Syslog messages. If set to false, each outgoing FlowFile will only contain the sender, protocol, and port, and no additional attributes.
Port The port for Syslog communication. Note that Expression language is not evaluated per FlowFile.
Protocol The protocol for Syslog communication.
Receive Buffer Size The size of each buffer used to receive Syslog messages. Adjust this value appropriately based on the expected size of the incoming Syslog messages. When UDP is selected each buffer will hold one Syslog message. When TCP is selected messages are read from an incoming connection until the buffer is full, or the connection is closed.
SSL Context Service The Controller Service to use in order to obtain an SSL Context. If this property is set, syslog messages will be received over a secure connection.
Socket Keep Alive Whether or not to have TCP socket keep alive turned on. Timing details depend on operating system properties.
Worker Threads Number of threads responsible for decoding and queuing incoming syslog messages
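The interaction between Max Batch Size and Message Delimiter can be sketched as follows: available messages are grouped into chunks of at most the batch size, and the messages in each chunk are joined with the delimiter. This is an illustration of the described behavior only; `batch_messages` is a hypothetical function and the newline default is an assumption.

```python
from typing import List


def batch_messages(messages: List[str], max_batch_size: int,
                   delimiter: str = "\n") -> List[str]:
    """Join messages into FlowFile-like payloads of at most max_batch_size each."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        delimiter.join(messages[i:i + max_batch_size])
        for i in range(0, len(messages), max_batch_size)
    ]
```

With a batch size of 1, each payload holds a single message and the delimiter never appears.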
Name Description
invalid Syslog messages that do not match one of the expected formats will be sent out this relationship as a FlowFile per message.
success Syslog messages that match one of the expected formats will be sent out this relationship as a FlowFile per message.
Name Description
syslog.priority The priority of the Syslog message.
syslog.severity The severity of the Syslog message derived from the priority.
syslog.facility The facility of the Syslog message derived from the priority.
syslog.version The optional version from the Syslog message.
syslog.timestamp The timestamp of the Syslog message.
syslog.hostname The hostname or IP address of the Syslog message.
syslog.sender The hostname of the Syslog server that sent the message.
syslog.body The body of the Syslog message, everything after the hostname.
syslog.valid An indicator of whether this message matched the expected formats. If this value is false, the other attributes will be empty and only the original message will be available in the content.
syslog.protocol The protocol over which the Syslog message was received.
syslog.port The port over which the Syslog message was received.
mime.type The mime.type of the FlowFile which will be text/plain for Syslog messages.
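The severity and facility attributes are described above as derived from the priority. In the Syslog protocol (RFC 3164 and RFC 5424), the priority value equals the facility multiplied by 8 plus the severity, so both can be recovered with one integer division. The sketch below shows that arithmetic; `decode_priority` is a hypothetical helper that returns integers, whereas FlowFile attributes are strings.

```python
import re
from typing import Dict, Optional

# A Syslog message starts with '<PRI>', where PRI is 1 to 3 digits.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_priority(message: str) -> Optional[Dict[str, int]]:
    """Extract priority, facility, and severity from the leading <PRI> field."""
    match = PRI_RE.match(message)
    if match is None:
        return None  # no priority field at the start of the message
    priority = int(match.group(1))
    facility, severity = divmod(priority, 8)  # priority = facility * 8 + severity
    return {
        "syslog.priority": priority,
        "syslog.facility": facility,
        "syslog.severity": severity,
    }
```

For a message beginning with `<34>`, this yields facility 4 and severity 2.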
Property Description
Batching Message Delimiter Specifies the delimiter to place between messages when multiple messages are bundled together (see <Max Batch Size> property).
Character Set Specifies the character set of the received data.
Client Auth The client authentication policy to use for the SSL Context. Only used if an SSL Context Service is provided.
Local Network Interface The name of a local network interface to be used to restrict listening to a specific LAN.
Max Batch Size The maximum number of messages to add to a single FlowFile. If multiple messages are available, they will be concatenated along with the <Message Delimiter> up to this configured maximum number of messages
Max Size of Message Queue The maximum size of the internal queue used to buffer messages being transferred from the underlying channel to the processor. Setting this value higher allows more messages to be buffered in memory during surges of incoming messages, but increases the total memory used by the processor during these surges.
Max Size of Socket Buffer The maximum size of the socket buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
Port The port to listen on for communication.
Receive Buffer Size The size of each buffer used to receive messages. Adjust this value appropriately based on the expected size of the incoming messages.
SSL Context Service The Controller Service to use in order to obtain an SSL Context. If this property is set, messages will be received over a secure connection.
Worker Threads The maximum number of worker threads available for servicing TCP connections.
idle-timeout The amount of time a client's connection will remain open if no data is received. The default of 0 seconds will leave connections open until they are closed by the client.
pool-receive-buffers Enable or disable pooling of buffers that the processor uses for handling bytes received on socket connections. The framework allocates buffers as needed during processing.
Name Description
success Messages received successfully will be sent out this relationship.
Name Description
tcp.sender The sending host of the messages.
tcp.port The sending port from which the messages were received.
client.certificate.issuer.dn For connections using mutual TLS, the Distinguished Name of the Certificate Authority that issued the client's certificate is attached to the FlowFile.
client.certificate.subject.dn For connections using mutual TLS, the Distinguished Name of the client certificate's owner (subject) is attached to the FlowFile.
Property Description
Batching Message Delimiter Specifies the delimiter to place between messages when multiple messages are bundled together (see <Max Batch Size> property).
Character Set Specifies the character set of the received data.
Local Network Interface The name of a local network interface to be used to restrict listening to a specific LAN.
Max Batch Size The maximum number of messages to add to a single FlowFile. If multiple messages are available, they will be concatenated along with the <Message Delimiter> up to this configured maximum number of messages
Max Size of Message Queue The maximum size of the internal queue used to buffer messages being transferred from the underlying channel to the processor. Setting this value higher allows more messages to be buffered in memory during surges of incoming messages, but increases the total memory used by the processor.
Max Size of Socket Buffer The maximum size of the socket buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
Port The port to listen on for communication.
Receive Buffer Size The size of each buffer used to receive messages. Adjust this value appropriately based on the expected size of the incoming messages.
Sending Host IP, or name, of a remote host. Only Datagrams from the specified Sending Host Port and this host will be accepted. Improves Performance. May be a system property or an environment variable.
Sending Host Port Port being used by remote host to send Datagrams. Only Datagrams from the specified Sending Host and this port will be accepted. Improves Performance. May be a system property or an environment variable.
Name Description
success Messages received successfully will be sent out this relationship.
Name Description
udp.sender The sending host of the messages.
udp.port The sending port from which the messages were received.
Property Description
Character Set Specifies the character set of the received data.
Local Network Interface The name of a local network interface to be used to restrict listening to a specific LAN.
Max Size of Message Queue The maximum size of the internal queue used to buffer messages being transferred from the underlying channel to the processor. Setting this value higher allows more messages to be buffered in memory during surges of incoming messages, but increases the total memory used by the processor.
Max Size of Socket Buffer The maximum size of the socket buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
Port The port to listen on for communication.
Receive Buffer Size The size of each buffer used to receive messages. Adjust this value appropriately based on the expected size of the incoming messages.
batch-size The maximum number of datagrams to write as records to a single FlowFile. The Batch Size will only be reached when data is coming in more frequently than the Poll Timeout.
poll-timeout The amount of time to wait when polling the internal queue for more datagrams. If no datagrams are found after waiting for the configured timeout, then the processor will emit whatever records have been obtained up to that point.
record-reader The Record Reader to use for reading the content of incoming datagrams.
record-writer The Record Writer to use in order to serialize the data before writing to a flow file.
sending-host IP, or name, of a remote host. Only Datagrams from the specified Sending Host Port and this host will be accepted. Improves Performance. May be a system property or an environment variable.
sending-host-port Port being used by remote host to send Datagrams. Only Datagrams from the specified Sending Host and this port will be accepted. Improves Performance. May be a system property or an environment variable.
Name Description
parse.failure If a datagram cannot be parsed using the configured Record Reader, the contents of the message will be routed to this Relationship as its own individual FlowFile.
success Messages received successfully will be sent out this relationship.
Name Description
udp.sender The sending host of the messages.
udp.port The sending port from which the messages were received.
record.count The number of records written to the flow file.
mime.type The mime-type of the writer used to write the records to the flow file.
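The interaction between batch-size and poll-timeout described above can be sketched in Python. This is an illustrative model only, not the processor's implementation: the `drain_batch` helper and the use of `queue.Queue` to stand in for the internal message queue are assumptions made for the example.

```python
import queue


def drain_batch(messages: queue.Queue, batch_size: int, poll_timeout_s: float) -> list:
    """Collect up to batch_size datagrams, stopping early when a poll times out.

    Hypothetical helper: models how a batch is only filled completely when
    data arrives more frequently than the poll timeout.
    """
    batch = []
    while len(batch) < batch_size:
        try:
            # Wait at most poll_timeout_s for the next datagram.
            batch.append(messages.get(timeout=poll_timeout_s))
        except queue.Empty:
            # Nothing arrived in time: emit whatever has been gathered so far.
            break
    return batch


pending = queue.Queue()
for payload in (b"a", b"b", b"c"):
    pending.put(payload)

full = drain_batch(pending, batch_size=2, poll_timeout_s=0.05)     # batch size reached
partial = drain_batch(pending, batch_size=2, poll_timeout_s=0.05)  # poll timed out first
```

In this sketch the first call returns as soon as two datagrams are available, while the second returns a single datagram after the timeout elapses.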
Property Description
server-url-path The WebSocket URL path on which this processor listens. Must start with '/', e.g. '/example'.
websocket-server-controller-service A WebSocket SERVER Controller Service which can accept WebSocket requests.
Name Description
binary message The WebSocket binary message output
connected The WebSocket session is established
disconnected The WebSocket session is disconnected
text message The WebSocket text message output
Name Description
websocket.controller.service.id WebSocket Controller Service id.
websocket.session.id Established WebSocket session id.
websocket.endpoint.id WebSocket endpoint id.
websocket.local.address WebSocket server address.
websocket.remote.address WebSocket client address.
websocket.message.type TEXT or BINARY.
Property Description
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking Node Identifier The configured value will be appended to the cache key so that listing state can be tracked per NiFi node rather than cluster wide when tracking state is scoped to LOCAL. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or after a primary node change. The 'Tracking Entities' strategy requires tracking information for all entities listed within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If listed entities are tracked per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration changes. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated', and a FlowFile is emitted, if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
File Filter Only files whose names match the given regular expression will be picked up
Ignore Hidden Files Indicates whether or not hidden files should be ignored
Include File Attributes Whether or not to include information such as the file's Last Modified Time and Owner as FlowFile Attributes. Depending on the File System being used, gathering this information can be expensive and as a result should be disabled. This is especially true of remote file shares.
Input Directory The input directory from which to pull files
Input Directory Location Specifies where the Input Directory is located. This is used to determine whether state should be stored locally or across the cluster.
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Maximum File Age The maximum age that a file can be in order to be pulled; any file older than this amount of time (according to last modification date) will be ignored
Maximum File Size The maximum size that a file can be in order to be pulled
Minimum File Age The minimum age that a file must be in order to be pulled; any file younger than this amount of time (according to last modification date) will be ignored
Minimum File Size The minimum size that a file must be in order to be pulled
Path Filter When Recurse Subdirectories is true, then only subdirectories whose path matches the given regular expression will be scanned
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Recurse Subdirectories Indicates whether to list files from subdirectories of the directory
Target System Timestamp Precision Specify timestamp precision at the target system. Since this processor uses timestamp of entities to decide which should be listed, it is crucial to use the right timestamp precision.
max-listing-time The maximum amount of time that listing any single directory is expected to take. If the listing for the directory specified by the 'Input Directory' property, or the listing of any subdirectory (if 'Recurse' is set to true) takes longer than this amount of time, a warning bulletin will be generated for each directory listing that exceeds this amount of time.
max-operation-time The maximum amount of time that any single disk operation is expected to take. If any disk operation takes longer than this amount of time, a warning bulletin will be generated for each operation that exceeds this amount of time.
max-performance-metrics If the 'Track Performance' property is set to 'true', this property indicates the maximum number of files whose performance metrics should be held onto. A smaller value for this property will result in less heap utilization, while a larger value may provide more accurate insights into how the disk access operations are performing
track-performance Whether or not the Processor should track the performance of disk access operations. If true, all accesses to disk will be recorded, including the file being accessed, the information being obtained, and how long it takes. This is then logged periodically at a DEBUG level. While the amount of data will be capped, this option may still consume a significant amount of heap (controlled by the 'Maximum Number of Files to Track' property), but it can be very useful for troubleshooting purposes if performance is poor or degraded.
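The File Filter, file age, and file size properties above combine into a single accept/reject decision per file. The following Python sketch models that decision under stated assumptions: the `ListingFilter` class, its field names, and its defaults are hypothetical, and the regular expression is assumed to be matched against the whole file name.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListingFilter:
    """Hypothetical model of the File Filter, age, and size properties."""
    file_filter: str = r".*"           # assumed default: accept every name
    min_age_s: float = 0.0             # Minimum File Age, in seconds
    max_age_s: Optional[float] = None  # Maximum File Age; None means no limit
    min_size: int = 0                  # Minimum File Size, in bytes
    max_size: Optional[int] = None     # Maximum File Size; None means no limit

    def accepts(self, name: str, size: int, last_modified: float, now: float) -> bool:
        # Age is measured from the last modification time.
        age = now - last_modified
        if re.fullmatch(self.file_filter, name) is None:
            return False
        if age < self.min_age_s:
            return False  # too young: the file may still be being written
        if self.max_age_s is not None and age > self.max_age_s:
            return False  # too old
        if size < self.min_size:
            return False
        if self.max_size is not None and size > self.max_size:
            return False
        return True


csv_only = ListingFilter(file_filter=r".*\.csv", min_age_s=10,
                         max_age_s=3600, min_size=1, max_size=1000)
```

For example, `csv_only.accepts("a.csv", 100, last_modified=1000, now=1100)` accepts a 100-byte file that is 100 seconds old, while the same file modified 5 seconds ago is skipped until a later run.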
Scopes Description
LOCAL After performing a listing of files, the timestamp of the newest file is stored. This allows the Processor to list only files that have been added or modified after this date the next time that the Processor is run. Whether the state is stored with a Local or Cluster scope depends on the value of the <Input Directory Location> property.
CLUSTER After performing a listing of files, the timestamp of the newest file is stored. This allows the Processor to list only files that have been added or modified after this date the next time that the Processor is run. Whether the state is stored with a Local or Cluster scope depends on the value of the <Input Directory Location> property.
Name Description
success All FlowFiles that are received are routed to success
Name Description
filename The name of the file that was read from filesystem.
path The path is set to the relative path of the file's directory on filesystem compared to the Input Directory property. For example, if Input Directory is set to /tmp, then files picked up from /tmp will have the path attribute set to "/". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3/".
absolute.path The absolute.path is set to the absolute path of the file's directory on filesystem. For example, if the Input Directory property is set to /tmp, then files picked up from /tmp will have the absolute.path attribute set to "/tmp/". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the absolute.path attribute will be set to "/tmp/abc/1/2/3/".
file.owner The user that owns the file in filesystem
file.group The group that owns the file in filesystem
file.size The number of bytes in the file in filesystem
file.permissions The permissions for the file in filesystem. This is formatted as 3 characters for the owner, 3 for the group, and 3 for other users. For example rw-rw-r--
file.lastModifiedTime The timestamp of when the file in filesystem was last modified as 'yyyy-MM-dd'T'HH:mm:ssZ'
file.lastAccessTime The timestamp of when the file in filesystem was last accessed as 'yyyy-MM-dd'T'HH:mm:ssZ'
file.creationTime The timestamp of when the file in filesystem was created as 'yyyy-MM-dd'T'HH:mm:ssZ'
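The `path` and `absolute.path` examples above can be reproduced with a short Python sketch. The `path_attributes` function is a hypothetical helper written to match the documented examples for POSIX-style paths; it is not the processor's code.

```python
from pathlib import PurePosixPath


def path_attributes(input_directory: str, file_path: str) -> dict:
    """Derive filename, path, and absolute.path as in the documented examples."""
    parent = PurePosixPath(file_path).parent
    relative = parent.relative_to(PurePosixPath(input_directory))
    # A file directly inside the Input Directory gets "/" as its path;
    # otherwise the relative directory is used with a trailing slash.
    rel = "/" if str(relative) == "." else f"{relative}/"
    return {
        "filename": PurePosixPath(file_path).name,
        "path": rel,
        "absolute.path": f"{parent}/",
    }


top_level = path_attributes("/tmp", "/tmp/data.txt")
nested = path_attributes("/tmp", "/tmp/abc/1/2/3/data.txt")
```

Here `top_level` carries `path` "/" and `absolute.path` "/tmp/", and `nested` carries `path` "abc/1/2/3/" and `absolute.path` "/tmp/abc/1/2/3/".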
Property Description
Connection Mode The FTP Connection Mode
Connection Timeout Amount of time to wait before timing out while creating a connection
Data Timeout When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or after a primary node change. The 'Tracking Entities' strategy requires tracking information for all entities listed within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If listed entities are tracked per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration changes. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated', and a FlowFile is emitted, if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
File Filter Regex Provides a Java Regular Expression for filtering Filenames; if a filter is supplied, only files whose names match that Regular Expression will be fetched
Follow Symbolic Links If true, will pull even symbolic files and also nested symbolic subdirectories; otherwise, will not read symbolic files and will not traverse symbolic link subdirectories
Hostname The fully qualified hostname or IP address of the remote system
Ignore Dotted Files If true, files whose names begin with a dot (".") will be ignored
Internal Buffer Size Set the internal buffer size for buffered data streams
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Password Password for the user account
Path Filter Regex When Search Recursively is true, then only subdirectories whose path matches the given Regular Expression will be scanned
Port The port to connect to on the remote host to fetch the data from
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Remote Path The path on the remote system from which to pull or push files
Remote Poll Batch Size The value specifies how many file paths to find in a given directory on the remote system when doing a file listing. This value in general should not need to be modified, but when polling against a remote system with a tremendous number of files this value can be critical. Setting this value too high can result in very poor performance, and setting it too low can cause the flow to be slower than normal.
Search Recursively If true, will pull files from arbitrarily nested subdirectories; otherwise, will not traverse subdirectories
Target System Timestamp Precision Specify timestamp precision at the target system. Since this processor uses timestamp of entities to decide which should be listed, it is crucial to use the right timestamp precision.
Transfer Mode The FTP Transfer Mode
Username Username
ftp-use-utf8 Tells the client to use UTF-8 encoding when processing files and filenames. If set to true, the server must also support UTF-8 encoding.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
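The cache key format and the Gzipped JSON cache content described for Entity Tracking State Cache can be illustrated as follows. This is a sketch: the function names are hypothetical, and the JSON layout of the tracked entities is an assumption, since only the key format and the compression are described above.

```python
import gzip
import json
from typing import Optional


def cache_key(processor_id: str, node_id: Optional[str] = None) -> str:
    """Build 'ListedEntities::{processorId}' with an optional '::{nodeId}' suffix."""
    key = f"ListedEntities::{processor_id}"
    if node_id is not None:
        # Per-node tracking adds the node identifier to keep state separate.
        key += f"::{node_id}"
    return key


def encode_entities(entities: dict) -> bytes:
    """Serialize tracked entities as a Gzipped JSON string (layout assumed)."""
    return gzip.compress(json.dumps(entities).encode("utf-8"))


def decode_entities(blob: bytes) -> dict:
    """Reverse of encode_entities."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


processor_id = "8dda2321-0164-1000-50fa-3042fe7d6a7b"
cluster_key = cache_key(processor_id)
node_key = cache_key(processor_id, "nifi-node3")
```

With the identifiers from the description, `cluster_key` and `node_key` match the two example keys given for the cluster-wide and per-node cases.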
Scopes Description
CLUSTER After performing a listing of files, the timestamp of the newest file is stored. This allows the Processor to list only files that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node will not duplicate the data that was listed by the previous Primary Node.
Name Description
success All FlowFiles that are received are routed to success
Name Description
ftp.remote.host The hostname of the FTP Server
ftp.remote.port The port that was connected to on the FTP Server
ftp.listing.user The username of the user that performed the FTP Listing
file.owner The numeric owner id of the source file
file.group The numeric group id of the source file
file.permissions The read/write/execute permissions of the source file
file.size The number of bytes in the source file
file.lastModifiedTime The timestamp of when the file in the filesystem was last modified as 'yyyy-MM-dd'T'HH:mm:ssZ'
filename The name of the file on the FTP Server
path The fully qualified name of the directory on the FTP Server from which the file was pulled
Property Description
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or after a primary node change. The 'Tracking Entities' strategy requires tracking information for all entities listed within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If listed entities are tracked per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration changes. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated', and a FlowFile is emitted, if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
GCP Credentials Provider Service The Controller Service used to obtain Google Cloud Platform credentials.
gcp-project-id Google Cloud Project ID
gcp-retry-count How many retry attempts should be made before routing to the failure relationship.
gcs-bucket Bucket of the object.
gcs-prefix The prefix used to filter the object list. In most cases, it should end with a forward slash ('/').
gcs-use-generations Specifies whether to use GCS Generations, if applicable. If false, only the latest version of each object will be returned.
listing-strategy Specify how to determine new/updated entities. See each strategy's description for details.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
record-writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
storage-api-url Overrides the default storage URL. Configuring an alternative Storage API URL also overrides the HTTP Host header on requests as described in the Google documentation for Private Service Connections.
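The 'new/updated' rules given for Entity Tracking Time Window can be expressed as a small decision function. The sketch below is a model of those three rules plus the eviction rule, not the processor's implementation; the `Entity` class, the `select_new_or_updated` function, and the use of an in-memory dictionary in place of the cache service are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Entity:
    identifier: str
    timestamp: float  # seconds since epoch
    size: int         # bytes


def select_new_or_updated(listed: Iterable[Entity], cache: Dict[str, Entity],
                          now: float, window_s: float) -> List[Entity]:
    """Return the entities for which a FlowFile would be emitted; update the cache."""
    oldest = now - window_s
    # Evict cached entities whose timestamp has fallen out of the time window.
    for key in [k for k, e in cache.items() if e.timestamp < oldest]:
        del cache[key]

    emitted = []
    for entity in listed:
        if entity.timestamp < oldest:
            continue  # outside the tracking window: not a listing target
        cached = cache.get(entity.identifier)
        if (cached is None                              # 1. not already listed
                or entity.timestamp > cached.timestamp  # 2. newer timestamp
                or entity.size != cached.size):         # 3. different size
            emitted.append(entity)
            cache[entity.identifier] = entity
    return emitted


cache: Dict[str, Entity] = {}
first = select_new_or_updated([Entity("a", 9900, 10), Entity("old", 1000, 5)],
                              cache, now=10000, window_s=1800)
second = select_new_or_updated([Entity("a", 9900, 10)], cache, now=10000, window_s=1800)
```

In this run, `first` contains only entity "a" because "old" is outside the 30-minute window, and `second` is empty because "a" is unchanged since it was cached.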
Scopes Description
CLUSTER After performing a listing of keys, the timestamp of the newest key is stored, along with the keys that share that same timestamp. This allows the Processor to list only keys that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.
Name Description
success FlowFiles are routed to this relationship after a successful Google Cloud Storage operation.
Name Description
filename The name of the file
gcs.bucket Bucket of the object.
gcs.key Name of the object.
gcs.size Size of the object.
gcs.cache.control Data cache control of the object.
gcs.component.count The number of components which make up the object.
gcs.content.disposition The data content disposition of the object.
gcs.content.encoding The content encoding of the object.
gcs.content.language The content language of the object.
mime.type The MIME/Content-Type of the object
gcs.crc32c The CRC32C checksum of object's data, encoded in base64 in big-endian order.
gcs.create.time The creation time of the object (milliseconds)
gcs.update.time The last modification time of the object (milliseconds)
gcs.encryption.algorithm The algorithm used to encrypt the object.
gcs.encryption.sha256 The SHA256 hash of the key used to encrypt the object
gcs.etag The HTTP 1.1 Entity tag for the object.
gcs.generated.id The service-generated ID for the object
gcs.generation The data generation of the object.
gcs.md5 The MD5 hash of the object's data encoded in base64.
gcs.media.link The media download link to the object.
gcs.metageneration The metageneration of the object.
gcs.owner The owner (uploader) of the object.
gcs.owner.type The ACL entity type of the uploader of the object.
gcs.acl.owner A comma-delimited list of ACL entities that have owner access to the object. Entities will be either email addresses, domains, or project IDs.
gcs.acl.writer A comma-delimited list of ACL entities that have write access to the object. Entities will be either email addresses, domains, or project IDs.
gcs.acl.reader A comma-delimited list of ACL entities that have read access to the object. Entities will be either email addresses, domains, or project IDs.
gcs.uri The URI of the object as a string.
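To compare a local copy of an object against the `gcs.md5` attribute, the digest needs the same representation: the raw MD5 bytes encoded in base64 rather than the more familiar hex string. A minimal Python sketch, where `md5_base64` is a hypothetical helper name:

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the MD5 digest of data, base64-encoded (the form used by gcs.md5)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


local_digest = md5_base64(b"hello")
```

The result is always 24 characters long, because a 16-byte digest encodes to 24 base64 characters including padding.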
Property Description
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or after a primary node change. The 'Tracking Entities' strategy requires tracking information for all entities listed within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If listed entities are tracked per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration changes. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated', and a FlowFile is emitted, if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
connect-timeout Maximum wait time for connection to Google Drive service.
folder-id The ID of the folder from which to pull list of files. Please see Additional Details to set up access to Google Drive and obtain Folder ID. WARNING: Unauthorized access to the folder is treated as if the folder was empty. This results in the processor not creating outgoing FlowFiles. No additional error message is provided.
gcp-credentials-provider-service The Controller Service used to obtain Google Cloud Platform credentials.
min-age The minimum age a file must be in order to be considered; any files younger than this will be ignored.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
read-timeout Maximum wait time for response from Google Drive service.
recursive-search When 'true', will include list of files from concrete sub-folders (ignores shortcuts). Otherwise, will return only files that have the defined 'Folder ID' as their parent directly. WARNING: The listing may fail if there are too many sub-folders (500+).
Scopes Description
CLUSTER The processor stores the data necessary to keep track of which files have already been listed. What exactly needs to be stored depends on the 'Listing Strategy'. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.
Name Description
success All FlowFiles that are received are routed to success
Name Description
drive.id The id of the file
filename The name of the file
mime.type The MIME type of the file
drive.size The size of the file. Set to 0 when the file size is not available (e.g. externally stored files).
drive.size.available Indicates if the file size is known / available
drive.timestamp The last modified time or created time (whichever is greater) of the file. The reason for this is that the original modified date of a file is preserved when uploaded to Google Drive. 'Created time' takes the time when the upload occurs. However uploaded files can still be modified later.
drive.created.time The file's creation time
drive.modified.time The file's last modification time
drive.path The path of the file's directory from the base directory. The path contains the folder names in URL encoded form because Google Drive allows special characters in file names, including '/' (slash) and '\' (backslash). The URL encoded folder names are separated by '/' in the path.
drive.owner The owner of the file
drive.last.modifying.user The last modifying user of the file
drive.web.view.link Web view link to the file
drive.web.content.link Web content link to the file
drive.parent.folder.id The id of the file's parent folder
drive.parent.folder.name The name of the file's parent folder
drive.listed.folder.id The id of the base folder that was listed
drive.listed.folder.name The name of the base folder that was listed
drive.shared.drive.id The id of the shared drive (if the file is located on a shared drive)
drive.shared.drive.name The name of the shared drive (if the file is located on a shared drive)
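Two of the attributes above involve a small amount of logic: `drive.path` joins URL-encoded folder names with '/', and `drive.timestamp` takes the later of the created and modified times. The Python sketch below illustrates both. The helper names are hypothetical, and the use of percent-encoding via `urllib.parse.quote` is an assumption; the exact encoding the processor applies (for example, how spaces are encoded or whether the path has a leading slash) may differ.

```python
from typing import List
from urllib.parse import quote, unquote


def drive_path(folder_names: List[str]) -> str:
    """Join folder names with '/', percent-encoding each name first (assumed encoding)."""
    # safe="" ensures a literal '/' inside a folder name is encoded too.
    return "/".join(quote(name, safe="") for name in folder_names)


def folder_names_from_path(path: str) -> List[str]:
    """Recover the original folder names from an encoded path."""
    return [unquote(part) for part in path.split("/")]


def drive_timestamp(created_ms: int, modified_ms: int) -> int:
    """Return the later of the created and modified times."""
    return max(created_ms, modified_ms)


encoded = drive_path(["reports/2024", "a\\b"])
decoded = folder_names_from_path(encoded)
```

Because each folder name is encoded before joining, splitting the path on '/' recovers the folder boundaries even when a folder name itself contains a slash.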
Property Description
Drive ID The ID of the drive to list files from. This can be a shared drive ID.
GCP Credentials Service The Controller Service used to obtain Google Cloud Platform credentials.
Include Folders When 'true', both files and folders will be included in the results. When 'false', only files (not folders) will be included.
Minimum File Age The minimum age a file must be in order to be considered; any files younger than this will be ignored.
Record Writer Specifies the Controller Service to use for writing the metadata records. Must be set.
Search Recursively When 'true', will recursively list files in all folders within the drive. When 'false', will only list files at the root level of the drive.
Name Description
failure A FlowFile will be routed here if there is an error fetching file metadata.
retry A FlowFile is routed here if the processor should retry the request (e.g., after rate limiting).
success A FlowFile containing the file metadata records will be routed to this relationship upon successful processing.
Name Description
google.drive.drive.id The ID of the drive from which files were listed
record.count The number of records in the FlowFile
mime.type The MIME Type specified by the Record Writer
google.drive.error.code The error code if the request to Google Drive API fails
google.drive.error.message The error message if the request to Google Drive API fails
Property Description
Custom Query Custom query to filter the returned groups. For example, 'email=test-*'. See Google's Admin SDK Directory API documentation for supported syntax.
GCP Credentials Service Controller Service used to obtain Google Cloud Platform credentials.
Google Domain Domain name to list Google Groups (e.g., 'example.com').
Record Writer Record writer used for writing out the records of retrieved Google Groups.
Name Description
failure FlowFiles are routed here if the processor fails to retrieve Google Groups.
retry FlowFiles are routed here if a transient failure occurs (e.g. rate-limited, socket timeouts) and should be retried.
success A FlowFile containing a record set of the groups is routed here upon success.
Name Description
record.count The number of records (groups) returned.
mime.type The MIME type for the resulting FlowFile.
Property Description
HubSpot Service HubSpot Client Service.
Object Type HubSpot object type
Updated After Filter objects updated after specified date (format: yyyy-MM-dd)
Scopes Description
CLUSTER Maintains pagination state and last sync timestamp to continue data retrieval from the last known position after restarts and to fetch only changed data.
Name Description
failure HubSpot fail relationship
original The input Flow File is routed to the original relationship.
retry HubSpot retry relationship. FlowFiles that failed to process due to a server timeout or rate limit related error. FlowFiles routed here should be routed back into the processor.
success HubSpot success relationship
Name Description
mime.type application/json
statement.type Always 'UPSERT' for this processor
hubspot.object.type HubSpot Object Type for this fetch
hubspot.object.id HubSpot Object ID for this fetch
hubspot.run.id Timestamp of the start of this run. Obtained from the incoming FlowFile or current time if not available
hubspot.is_last Whether this is the last paged object of the ingestion
Property Description
Environment URL URL to Microsoft Dataverse Environment
OAuth2 Access Token Provider Enables managed retrieval of OAuth2 Bearer Token.
Tables Filter Strategy List of table names. Output will be limited to those names if defined.
Tables Filter Value Value of Table Names filter. It is regexp or separated list, depending on selected filtering strategy.
Web Client Service Provider Creates instance of web client.
Name Description
failure A FlowFile is routed here if an error occurred while fetching from Dataverse.
success FlowFile with listed tables from Dataverse.
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Bucket The S3 Bucket to interact with
Communications Timeout The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out.
Custom Signer Class Name Fully qualified class name of the custom signer class. The signer must implement the com.amazonaws.auth.Signer interface.
Custom Signer Module Location Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any).
Delimiter The string used to delimit directories within the bucket. Please consult the AWS documentation for the correct use of this field.
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or after a primary node change. The 'Tracking Entities' strategy requires tracking information for all entities listed within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If listed entities are tracked per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration changes. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated', and a FlowFile is emitted, if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
List Type Specifies whether to use the original List Objects or the newer List Objects Version 2 endpoint.
Listing Batch Size If not using a Record Writer, this property dictates how many S3 objects should be listed in a single batch. Once this number is reached, the FlowFiles that have been created will be transferred out of the Processor. Setting this value lower may result in lower latency by sending out the FlowFiles before the complete listing has finished. However, it can significantly reduce performance. Larger values may take more memory to store all of the information before sending the FlowFiles out. This property is ignored if using a Record Writer, as one of the main benefits of the Record Writer is being able to emit the entire listing as a single FlowFile.
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Maximum Object Age The maximum age that an S3 object can be in order to be considered; any object older than this amount of time (according to last modification date) will be ignored
Minimum Object Age The minimum age that an S3 object must be in order to be considered; any object younger than this amount of time (according to last modification date) will be ignored
Prefix The prefix used to filter the object list. Do not begin with a forward slash '/'. In most cases, it should end with a forward slash '/'.
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Region The AWS Region to connect to.
Requester Pays If true, indicates that the requester consents to pay any charges associated with listing the S3 bucket. This sets the 'x-amz-request-payer' header to 'requester'. Note that this setting is not applicable when 'Use Versions' is 'true'.
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Signer Override The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation.
Use Versions Specifies whether to use S3 versions, if applicable. If false, only the latest version of each object will be returned.
Write Object Tags If set to 'True', the tags associated with the S3 object will be written as FlowFile attributes
Write User Metadata If set to 'True', the user defined metadata associated with the S3 object will be added to FlowFile attributes/records
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Scopes Description
CLUSTER After performing a listing of keys, the timestamp of the newest key is stored, along with the keys that share that same timestamp. This allows the Processor to list only keys that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.
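The timestamp-based state described above can be sketched as follows. This is a minimal illustration, not the processor's implementation: the function name `list_since` and the layout of the `state` dictionary are assumptions made for the example. It stores the newest timestamp together with the keys that share it, so that a later run emits only keys that were added or modified afterwards.

```python
def list_since(objects, state):
    """Return keys that are new since the last run and update `state` in place.

    `objects` is an iterable of (key, last_modified_ms) pairs.
    `state` is a hypothetical dict holding 'timestamp' (newest timestamp seen)
    and 'keys' (the keys that share that newest timestamp).
    """
    last_ts = state.get("timestamp", -1)
    seen_keys = state.get("keys", set())

    # Keep objects newer than the stored timestamp, plus objects that share
    # the stored timestamp but were not listed on a previous run.
    fresh = [
        (key, ts)
        for key, ts in objects
        if ts > last_ts or (ts == last_ts and key not in seen_keys)
    ]

    if fresh:
        newest = max(ts for _, ts in fresh)
        keys_at_newest = {key for key, ts in fresh if ts == newest}
        if newest == last_ts:
            # Same newest timestamp as before: remember the earlier keys too.
            keys_at_newest |= seen_keys
        state["timestamp"] = newest
        state["keys"] = keys_at_newest

    return [key for key, _ in fresh]
```

Storing the keys alongside the timestamp is what prevents duplicates when several objects share the same last-modified time and one of them appears only on a later listing.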
Name Description
success FlowFiles are routed to this Relationship after they have been successfully processed.
Name Description
s3.bucket The name of the S3 bucket
s3.region The region of the S3 bucket
filename The name of the file
s3.etag The ETag that can be used to see if the file has changed
s3.isLatest A boolean indicating if this is the latest version of the object
s3.lastModified The last modified time in milliseconds since epoch in UTC time
s3.length The size of the object in bytes
s3.storeClass The storage class of the object
s3.version The version of the object, if applicable
s3.tag.___ If 'Write Object Tags' is set to 'True', the tags associated to the S3 object that is being listed will be written as part of the flowfile attributes
s3.user.metadata.___ If 'Write User Metadata' is set to 'True', the user defined metadata associated to the S3 object that is being listed will be written as part of the flowfile attributes
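The 'new/updated' rule given under Entity Tracking Time Window can be sketched as follows. This is a simplified illustration under stated assumptions: the `Entity` class, the `list_new_or_updated` function, and the plain dictionary used as the cache are hypothetical stand-ins, not NiFi classes, and the real strategy stores its state in a DistributedMapCache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for one listed object."""
    identifier: str
    timestamp_ms: int  # last-modified time, milliseconds since epoch
    size: int          # size in bytes


def list_new_or_updated(listed, cache, now_ms, window_ms):
    """Return the entities to emit and update `cache` (identifier -> Entity)."""
    window_start = now_ms - window_ms
    emitted = []
    for entity in listed:
        if entity.timestamp_ms < window_start:
            continue  # outside the tracking time window, not a listing target
        cached = cache.get(entity.identifier)
        is_new = cached is None
        is_newer = cached is not None and entity.timestamp_ms > cached.timestamp_ms
        size_changed = cached is not None and entity.size != cached.size
        if is_new or is_newer or size_changed:
            emitted.append(entity)
            cache[entity.identifier] = entity

    # Remove cached entities whose timestamp is now older than the window.
    expired = [key for key, value in cache.items() if value.timestamp_ms < window_start]
    for key in expired:
        del cache[key]
    return emitted
```

Unlike timestamp tracking, this approach can pick up an entity that arrives late with an older timestamp, as long as that timestamp still falls inside the window.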
Property Description
Salesforce Data Cloud Client Salesforce Data Cloud Client to interact with the APIs
Name Description
success FlowFile containing the list of available objects will be routed to this relationship
Name Description
nbObjects The number of data shares listed in the organization that are available to the identified user.
Property Description
Salesforce Client Salesforce Client to interact with the APIs
Name Description
success FlowFile containing the list of available objects will be routed to this relationship
Name Description
nbObjects The number of objects listed in the organization that are available to the identified user.
Property Description
Algorithm Negotiation Configuration strategy for SSH algorithm negotiation
Ciphers Allowed A comma-separated list of Ciphers allowed for SFTP connections. Leave unset to allow all. Available options are: 3des-cbc, aes128-cbc, aes128-ctr, aes128-gcm@openssh.com, aes192-cbc, aes192-ctr, aes256-cbc, aes256-ctr, aes256-gcm@openssh.com, arcfour128, arcfour256, blowfish-cbc, chacha20-poly1305@openssh.com, none
Connection Timeout Amount of time to wait before timing out while creating a connection
Data Timeout When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or after a primary node change. The 'Tracking Entities' strategy requires tracking information for all entities listed within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If listed entities are tracked per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration is changed. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated' and a FlowFile is emitted if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
File Filter Regex Provides a Java Regular Expression for filtering Filenames; if a filter is supplied, only files whose names match that Regular Expression will be fetched
Follow Symbolic Links If true, symbolic links to files are pulled and symbolically linked subdirectories are traversed; otherwise, symbolic links to files are not read and symbolically linked subdirectories are not traversed
Host Key File If supplied, the given file will be used as the Host Key; otherwise, if the 'Strict Host Key Checking' property is set to true, the 'known_hosts' and 'known_hosts2' files from the ~/.ssh directory are used; if neither applies, no host key file will be used
Hostname The fully qualified hostname or IP address of the remote system
Ignore Dotted Files If true, files whose names begin with a dot (".") will be ignored
Key Algorithms Allowed A comma-separated list of Key Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: ecdsa-sha2-nistp256, ecdsa-sha2-nistp256-cert-v01@openssh.com, ecdsa-sha2-nistp384, ecdsa-sha2-nistp384-cert-v01@openssh.com, ecdsa-sha2-nistp521, ecdsa-sha2-nistp521-cert-v01@openssh.com, rsa-sha2-256, rsa-sha2-256-cert-v01@openssh.com, rsa-sha2-512, rsa-sha2-512-cert-v01@openssh.com, sk-ecdsa-sha2-nistp256@openssh.com, sk-ssh-ed25519@openssh.com, ssh-dss, ssh-dss-cert-v01@openssh.com, ssh-ed25519, ssh-ed25519-cert-v01@openssh.com, ssh-rsa, ssh-rsa-cert-v01@openssh.com
Key Exchange Algorithms Allowed A comma-separated list of Key Exchange Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: curve25519-sha256, curve25519-sha256@libssh.org, curve448-sha512, diffie-hellman-group-exchange-sha1, diffie-hellman-group-exchange-sha256, diffie-hellman-group1-sha1, diffie-hellman-group14-sha1, diffie-hellman-group14-sha256, diffie-hellman-group15-sha512, diffie-hellman-group16-sha512, diffie-hellman-group17-sha512, diffie-hellman-group18-sha512, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521, mlkem1024nistp384-sha384, mlkem768nistp256-sha256, mlkem768x25519-sha256, sntrup761x25519-sha512, sntrup761x25519-sha512@openssh.com
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Maximum File Age The maximum age that a file can be in order to be pulled; any file older than this amount of time (according to last modification date) will be ignored
Maximum File Size The maximum size that a file can be in order to be pulled
Message Authentication Codes Allowed A comma-separated list of Message Authentication Codes allowed for SFTP connections. Leave unset to allow all. Available options are: hmac-md5, hmac-md5-96, hmac-sha1, hmac-sha1-96, hmac-sha1-etm@openssh.com, hmac-sha2-256, hmac-sha2-256-etm@openssh.com, hmac-sha2-512, hmac-sha2-512-etm@openssh.com
Minimum File Age The minimum age that a file must be in order to be pulled; any file younger than this amount of time (according to last modification date) will be ignored
Minimum File Size The minimum size that a file must be in order to be pulled
Password Password for the user account
Path Filter Regex When Search Recursively is true, then only subdirectories whose path matches the given Regular Expression will be scanned
Port The port that the remote system is listening on for file transfers
Private Key Passphrase Password for the private key
Private Key Path The fully qualified path to the Private Key file
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Remote Path The path on the remote system from which to pull or push files
Search Recursively If true, will pull files from arbitrarily nested subdirectories; otherwise, will not traverse subdirectories
Send Keep Alive On Timeout Send a Keep Alive message every 5 seconds up to 5 times for an overall timeout of 25 seconds.
Strict Host Key Checking Indicates whether or not strict enforcement of hosts keys should be applied
Target System Timestamp Precision Specify timestamp precision at the target system. Since this processor uses timestamp of entities to decide which should be listed, it is crucial to use the right timestamp precision.
Use Compression Indicates whether or not ZLIB compression should be used when transferring files
Username Username
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Scopes Description
CLUSTER After performing a listing of files, the timestamp of the newest file is stored. This allows the Processor to list only files that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node will not duplicate the data that was listed by the previous Primary Node.
Name Description
success All FlowFiles that are received are routed to success
Name Description
sftp.remote.host The hostname of the SFTP Server
sftp.remote.port The port that was connected to on the SFTP Server
sftp.listing.user The username of the user that performed the SFTP Listing
file.owner The numeric owner id of the source file
file.group The numeric group id of the source file
file.permissions The read/write/execute permissions of the source file
file.size The number of bytes in the source file
file.lastModifiedTime The timestamp of when the file in the filesystem was last modified as 'yyyy-MM-dd'T'HH:mm:ssZ'
filename The name of the file on the SFTP Server
path The fully qualified name of the directory on the SFTP Server from which the file was pulled
mime.type The MIME Type that is provided by the configured Record Writer
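The file-level filter properties of the SFTP listing above (File Filter Regex, Ignore Dotted Files, Minimum/Maximum File Age, Minimum/Maximum File Size) can be sketched as one predicate. This is an illustration only: the function `should_list` and its parameter names are hypothetical, it uses Python's `re` module rather than Java regular expressions, and it assumes the regular expression must match the whole file name.

```python
import re


def should_list(name, age_seconds, size_bytes, *, file_filter_regex=None,
                ignore_dotted_files=True, min_age=0.0, max_age=None,
                min_size=0, max_size=None):
    """Return True if a remote file passes all configured filters."""
    if ignore_dotted_files and name.startswith("."):
        return False  # names beginning with a dot are ignored
    if file_filter_regex is not None and re.fullmatch(file_filter_regex, name) is None:
        return False  # assumption: the pattern must match the entire name
    if age_seconds < min_age:
        return False  # younger than the minimum file age
    if max_age is not None and age_seconds > max_age:
        return False  # older than the maximum file age
    if size_bytes < min_size:
        return False  # smaller than the minimum file size
    if max_size is not None and size_bytes > max_size:
        return False  # larger than the maximum file size
    return True
```

A minimum age greater than zero is a common way to avoid listing files that are still being written, since a file must remain unmodified for that long before it qualifies.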
Property Description
Authentication Service The service that provides authentication for the SharePoint API.
Site URL The URL of the SharePoint Site.
Name Description
success FlowFiles for each Drive are routed to this relationship
Name Description
sharepoint.site.url The URL of the SharePoint Site.
sharepoint.site.id The ID of the SharePoint Site.
sharepoint.drive.name The name of the SharePoint Drive.
sharepoint.drive.id The ID of the SharePoint Drive.
Property Description
OAuth2 Access Token Provider Enables managed retrieval of OAuth2 Bearer Token.
Record Writer Record writer used for writing out the records of retrieved SharePoint Site Groups.
Site URL The URL of the SharePoint site.
Web Client Service The Web Client Service to use for communicating with SharePoint.
Name Description
success Successfully listed all SharePoint site groups. Each group will be represented as a separate FlowFile.
Name Description
record.count The number of records (groups) returned.
mime.type The MIME type for the resulting FlowFile.
Property Description
Entity Tracking Initial Listing Target Specify how initial listing should be handled. Used by the 'Tracking Entities' strategy.
Entity Tracking State Cache Listed entities are stored in the specified cache storage so that this processor can resume listing across NiFi restarts or after a primary node change. The 'Tracking Entities' strategy requires tracking information for all entities listed within the last 'Tracking Time Window'. To support a large number of entities, the strategy uses DistributedMapCache instead of managed state. The cache key format is 'ListedEntities::\{processorId\}(::\{nodeId\})'. If listed entities are tracked per node, the optional '::\{nodeId\}' part is added to manage state separately. For example, a cluster-wide cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b' and a per-node cache key is 'ListedEntities::8dda2321-0164-1000-50fa-3042fe7d6a7b::nifi-node3'. The stored cache content is a Gzipped JSON string. The cache key is deleted when the target listing configuration is changed. Used by the 'Tracking Entities' strategy.
Entity Tracking Time Window Specify how long this processor should track already-listed entities. The 'Tracking Entities' strategy can pick any entity whose timestamp is inside the specified time window. For example, if set to '30 minutes', any entity with a timestamp in the most recent 30 minutes will be a listing target when this processor runs. A listed entity is considered 'new/updated' and a FlowFile is emitted if one of the following conditions is met: 1. it does not exist in the already-listed entities, 2. it has a newer timestamp than the cached entity, 3. it has a different size than the cached entity. If a cached entity's timestamp becomes older than the specified time window, that entity is removed from the cached already-listed entities. Used by the 'Tracking Entities' strategy.
Listing Strategy Specify how to determine new/updated entities. See each strategy's description for details.
Record Writer Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile instead of adding attributes to individual FlowFiles.
Target System Timestamp Precision Specify timestamp precision at the target system. Since this processor uses timestamp of entities to decide which should be listed, it is crucial to use the right timestamp precision.
directory The network folder from which to list files. This is the remaining relative path after the share: smb://HOSTNAME:PORT/SHARE/[DIRECTORY]/sub/directories. It is also possible to add subdirectories. The given path on the remote file share must exist. This can be checked using verification. You may mix Windows and Linux-style directory separators.
file-filter Only files whose names match the given regular expression will be listed.
file-name-suffix-filter Files ending with the given suffix will be omitted. Can be used to make sure that files that are still uploading are not listed multiple times, by giving those files a suffix during the upload and removing the suffix once the upload finishes. This is highly recommended when using the 'Tracking Entities' or 'Tracking Timestamps' listing strategies.
initial-listing-strategy Specifies how to handle existing files on the SMB share when the processor is started for the first time (or its state has been cleared).
initial-listing-timestamp The timestamp from which the files will be listed when the processor is started for the first time (or its state has been cleared). The value can be specified as an epoch timestamp in milliseconds or as a UTC datetime in a format such as 2025-02-01T00:00:00Z
max-file-age Any file older than the given value will be omitted.
max-file-size Any file larger than the given value will be omitted.
min-file-age The minimum age that a file must be in order to be listed; any file younger than this amount of time will be ignored.
min-file-size Any file smaller than the given value will be omitted.
path-filter Only files whose paths (up to the file's parent directory) match the given regular expression will be listed.
smb-client-provider-service Specifies the SMB client provider to use for creating SMB connections.
Scopes Description
CLUSTER After performing a listing of files, the state of the previous listing can be stored in order to list files continuously without duplication.
Name Description
success All FlowFiles that are received are routed to success
Name Description
filename The name of the file that was read from filesystem.
shortName The short name of the file that was read from filesystem.
path The path is set to the relative path of the file's directory on the remote filesystem compared to the Share root directory. For example, for a given remote location smb://HOSTNAME:PORT/SHARE/DIRECTORY, if a file is listed from smb://HOSTNAME:PORT/SHARE/DIRECTORY/sub/folder/file, then the path attribute will be set to "DIRECTORY/sub/folder".
serviceLocation The SMB URL of the share.
lastModifiedTime The timestamp of when the file's content changed in the filesystem as 'yyyy-MM-dd'T'HH:mm:ss'.
creationTime The timestamp of when the file was created in the filesystem as 'yyyy-MM-dd'T'HH:mm:ss'.
lastAccessTime The timestamp of when the file was accessed in the filesystem as 'yyyy-MM-dd'T'HH:mm:ss'.
changeTime The timestamp of when the file's attributes were changed in the filesystem as 'yyyy-MM-dd'T'HH:mm:ss'.
size The size of the file in bytes.
allocationSize The number of bytes allocated for the file on the server.
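The `path` attribute example above can be reproduced with a small helper. This is a sketch under stated assumptions: the function `smb_path_attribute` is hypothetical, and it works on plain URL strings with forward slashes rather than on an actual SMB connection.

```python
import posixpath


def smb_path_attribute(share_url, file_url):
    """Return the file's directory relative to the share root."""
    share_root = share_url.rstrip("/") + "/"
    if not file_url.startswith(share_root):
        raise ValueError("file is not located under the share")
    # Strip the share root, then drop the file name to keep only the directory.
    relative_file = file_url[len(share_root):]
    return posixpath.dirname(relative_file)


print(smb_path_attribute(
    "smb://HOSTNAME:PORT/SHARE",
    "smb://HOSTNAME:PORT/SHARE/DIRECTORY/sub/folder/file",
))  # prints DIRECTORY/sub/folder
```

Note that the result is relative to the share, not to the configured directory, which is why "DIRECTORY" itself appears in the value.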
Property Description
Connection Pool The Controller Service that is used to obtain a connection to the database.
Included Comma Separated Source Table Names A comma-separated list of tables to replicate. Each table should be formatted as <schema_name>.<table_name>, e.g. customer.orders, customer.payments. This is combined with the regular expression to include any matching table.
Included Source Table Pattern Regular Expression for specifying table names to replicate e.g. customer.(orders|payments). This is combined with the comma-separated list to include any matching table.
Name Description
failure If a FlowFile attribute cannot be read or is incorrect, it will be routed to this Relationship.
matched Successfully created FlowFile, with a list of matching tables found in the source database.
Name Description
source.schema.name Name of the schema of the table from which an event originated
source.table.name Name of the table from which an event originated
source.entry The original entry that was attempted to parse when processing table names
reason Reason why table cannot be replicated
source.database.version.major The major version of the source database.
mime.type The MIME type of the FlowFile content.
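The way the comma-separated table list and the table pattern combine can be sketched as a union of the two filters. This is an illustration only: the function `select_tables` is hypothetical, it uses Python's `re` module, and it assumes the pattern must match the full `<schema_name>.<table_name>` string.

```python
import re


def select_tables(available, included_names="", included_pattern=None):
    """Return tables named in the list or matching the pattern."""
    names = {name.strip() for name in included_names.split(",") if name.strip()}
    pattern = re.compile(included_pattern) if included_pattern else None
    return [
        table
        for table in available
        if table in names or (pattern is not None and pattern.fullmatch(table))
    ]


tables = ["customer.orders", "customer.payments", "customer.refunds", "sales.orders"]
print(select_tables(tables, "sales.orders", r"customer\.(orders|payments)"))
# prints ['customer.orders', 'customer.payments', 'sales.orders']
```

In the pattern above the dot is escaped so that it matches only a literal period between the schema and table names.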
Property Description
Databricks Client Databricks Client Service.
Include Directories Include directories in FlowFiles produced.
Recursive Directory Listing Recursively list files in sub directories.
Unity Catalog Directory Path Unity Catalog directory path e.g. /Volumes/catalog/schema/volume_name/directory
Name Description
failure Databricks failure relationship
original The original FlowFile is routed to this relationship when processing is successful.
success Databricks success relationship
Name Description
filename Base filename of the Unity Catalog file or directory.
path Path to parent directory containing the Unity Catalog file or directory.
absolute.path Full path to the Unity Catalog file or directory.
uc.resourceType The type of resource, 'file' or 'directory' of the Unity Catalog resource.
uc.size The size of the Unity Catalog file.
uc.lastModifiedTime The last modified time of the Unity Catalog file in milliseconds since epoch in UTC time.
error.code The error code for the SQL statement if an error occurred.
error.message The error message for the SQL statement if an error occurred.
Property Description
Attributes to Ignore A comma-separated list of Attributes to ignore. If not specified, no attributes will be ignored unless _Attributes to Ignore by Regular Expression_ is modified. There's an OR relationship between the two properties.
Attributes to Log A comma-separated list of Attributes to Log. If not specified, all attributes will be logged unless _Attributes to Log by Regular Expression_ is modified. There's an AND relationship between the two properties.
Log FlowFile Properties Specifies whether or not to log FlowFile "properties", such as Entry Date, Lineage Start Date, and content size
Log Level The Log Level to use when logging the Attributes
Log Payload If true, the FlowFile's payload will be logged, in addition to its attributes; otherwise, just the Attributes will be logged.
Log prefix Log prefix appended to the log lines. It helps to distinguish the output of multiple LogAttribute processors.
Output Format Specifies the format to use for logging FlowFile attributes
attributes-to-ignore-regex A regular expression indicating the Attributes to Ignore. If not specified, no attributes will be ignored unless _Attributes to Ignore_ is modified. There's an OR relationship between the two properties.
attributes-to-log-regex A regular expression indicating the Attributes to Log. If not specified, all attributes will be logged unless _Attributes to Log_ is modified. There's an AND relationship between the two properties.
character-set The name of the CharacterSet to use
Name Description
success All FlowFiles are routed to this relationship
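One possible reading of the AND/OR relationships among the four attribute-selection properties above is sketched below. This is an interpretation, not the processor's code: the function `attributes_to_log` and its parameter names are hypothetical, and full-match semantics for the regular expressions are an assumption.

```python
import re


def attributes_to_log(attributes, log_names=None, log_regex=None,
                      ignore_names=None, ignore_regex=None):
    """Return the subset of `attributes` that would be logged."""
    selected = {}
    for name, value in attributes.items():
        # AND: an unset "log" property places no restriction on the attribute.
        in_log_list = log_names is None or name in log_names
        matches_log_regex = log_regex is None or re.fullmatch(log_regex, name) is not None
        # OR: either "ignore" property is enough to exclude the attribute.
        in_ignore_list = ignore_names is not None and name in ignore_names
        matches_ignore_regex = (
            ignore_regex is not None and re.fullmatch(ignore_regex, name) is not None
        )
        if in_log_list and matches_log_regex and not (in_ignore_list or matches_ignore_regex):
            selected[name] = value
    return selected
```

Under this reading, setting both "log" properties narrows the output to attributes that satisfy both, while setting both "ignore" properties widens the set of excluded attributes.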
Display Name API Name Default Value Allowable Values Description
Log Level * logsink-log-level INFO - TRACE - DEBUG - INFO - WARN - ERROR - FATAL - NONE The Log Level at which to log records (e.g., INFO, DEBUG)
Record Writer * record-sink-record-writer Specifies the Controller Service to use for writing out the records.
Property Description
log-level The Log Level to use when logging the message: [trace, debug, info, warn, error]
log-message The log message to emit
log-prefix Log prefix appended to the log lines. It helps to distinguish the output of multiple LogMessage processors.
Name Description
success All FlowFiles are routed to this relationship
Property Description
include-empty-values Include null or blank values for keys that are null or blank
lookup-service The lookup service to use for attribute lookups
Name Description
failure FlowFiles with failing lookups are routed to this relationship
matched FlowFiles with matching lookups are routed to this relationship
unmatched FlowFiles with missing lookups are routed to this relationship
Property Description
Root Record Path A RecordPath that points to a child Record within each of the top-level Records in the FlowFile. If specified, the additional RecordPath properties will be evaluated against this child Record instead of the top-level Record. This allows for performing enrichment against multiple child Records within a single top-level Record.
lookup-service The Lookup Service to use in order to lookup a value in each Record
record-path-lookup-miss-result-cache-size Specifies how many lookup values/records should be cached. Setting this property to zero means no caching will be done and the table will be queried for each lookup value in each record. If the lookup table changes often or the most recent data must be retrieved, do not use the cache.
record-reader Specifies the Controller Service to use for reading incoming data
record-update-strategy This property defines the strategy to use when updating the record with the value returned by the Lookup Service.
record-writer Specifies the Controller Service to use for writing out the records
result-contents When a result is obtained that contains a Record, this property determines whether the Record itself is inserted at the configured path or if the contents of the Record (i.e., the sub-fields) will be inserted at the configured path.
result-record-path A RecordPath that points to the field whose value should be updated with whatever value is returned from the Lookup Service. If not specified, the value that is returned from the Lookup Service will be ignored, except for determining whether the FlowFile should be routed to the 'matched' or 'unmatched' Relationship.
routing-strategy Specifies how to route records after a Lookup has completed
Name Description
failure If a FlowFile cannot be enriched, the unchanged FlowFile will be routed to this relationship
success All records will be sent to this Relationship if configured to do so, unless a failure occurs
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records in the FlowFile
Display Name API Name Default Value Allowable Values Description
Communications Timeout * Communications Timeout 30 secs Specifies how long to wait when communicating with the remote server before determining that there is a communications failure if data cannot be sent or received
SSL Context Service SSL Context Service If specified, indicates the SSL Context Service that is used to communicate with the remote server. If not specified, communications will not be encrypted
Server Hostname * Server Hostname The name of the server that is running the DistributedMapCacheServer service
Server Port * Server Port 4557 The port on the remote server that is to be used when communicating with the DistributedMapCacheServer service
Display Name API Name Default Value Allowable Values Description
Eviction Strategy * Eviction Strategy Least Frequently Used - Least Frequently Used - Least Recently Used - First In, First Out Determines which strategy should be used to evict values from the cache to make room for new entries
Maximum Cache Entries * Maximum Cache Entries 10000 The maximum number of cache entries that the cache can hold
Persistence Directory Persistence Directory If specified, the cache will be persisted in the given directory; if not specified, the cache will be in-memory only
Port * Port 4557 The port to listen on for incoming connections
SSL Context Service SSL Context Service If specified, this service will be used to create an SSL Context that will be used to secure communications; if not specified, communications will not be secure
Maximum Read Size maximum-read-size 1 MB The maximum number of network bytes to read for a single cache item
| Property | Description |
| --- | --- |
| Attribute Strategy | Determines which FlowFile attributes should be added to the bundle. If 'Keep All Unique Attributes' is selected, any attribute on any FlowFile that gets bundled will be kept unless its value conflicts with the value from another FlowFile. If 'Keep Only Common Attributes' is selected, only the attributes that exist on all FlowFiles in the bundle, with the same value, will be preserved. |
| Bin Termination Check | Specifies an Expression Language Expression that is to be evaluated against each FlowFile. If the result of the expression is 'true', the bin that the FlowFile corresponds to will be terminated, even if the bin has not met the minimum number of entries or minimum size. Note that if the FlowFile that triggers the termination of the bin is itself larger than the Maximum Bin Size, it will be placed into its own bin without triggering the termination of any other bin. When using this property, it is recommended to use Prioritizers in the flow's connections to ensure that the ordering is as desired. |
| Compression Level | Specifies the compression level to use when using the Zip Merge Format; if not using the Zip Merge Format, this value is ignored. |
| Correlation Attribute Name | If specified, like FlowFiles will be binned together, where 'like FlowFiles' means FlowFiles that have the same value for this Attribute. If not specified, FlowFiles are bundled by the order in which they are pulled from the queue. |
| Delimiter Strategy | Determines if Header, Footer, and Demarcator should point to files containing the respective content, or if the values of the properties should be used as the content. |
| Demarcator File | Filename or text specifying the demarcator to use. If not specified, no demarcator is supplied. |
| FlowFile Insertion Strategy | If a given FlowFile terminates the bin based on the <Bin Termination Check> property, specifies where the FlowFile should be included in the bin. |
| Footer File | Filename or text specifying the footer to use. If not specified, no footer is supplied. |
| Header File | Filename or text specifying the header to use. If not specified, no header is supplied. |
| Keep Path | If using the Zip or Tar Merge Format, specifies whether or not the FlowFiles' paths should be included in their entry names. |
| Max Bin Age | The maximum age of a Bin that will trigger a Bin to be complete. Expected format is <duration> <time unit> where <duration> is a positive integer and time unit is one of seconds, minutes, hours. |
| Maximum Group Size | The maximum size for the bundle. If not specified, there is no maximum. |
| Maximum Number of Entries | The maximum number of files to include in a bundle. |
| Maximum number of Bins | Specifies the maximum number of bins that can be held in memory at any one time. |
| Merge Format | Determines the format that will be used to merge the content. |
| Merge Strategy | Specifies the algorithm used to merge content. The 'Defragment' algorithm combines fragments that are associated by attributes back into a single cohesive FlowFile. The 'Bin-Packing Algorithm' generates a FlowFile populated by arbitrarily chosen FlowFiles. |
| Minimum Group Size | The minimum size for the bundle. |
| Minimum Number of Entries | The minimum number of files to include in a bundle. |
| Tar Modified Time | If using the Tar Merge Format, specifies whether the Tar entry should store the modified timestamp, either by expression (for example, `${file.lastModifiedTime}`) or as a static value. Both must match the ISO 8601 format `yyyy-MM-dd'T'HH:mm:ssZ`. |
| mergecontent-metadata-strategy | For FlowFiles whose input format supports metadata (Avro, for example), this property determines which metadata should be added to the bundle. If 'Use First Metadata' is selected, the metadata keys/values from the first FlowFile to be bundled will be used. If 'Keep Only Common Metadata' is selected, only the metadata that exists on all FlowFiles in the bundle, with the same value, will be preserved. If 'Ignore Metadata' is selected, no metadata is transferred to the outgoing bundled FlowFile. If 'Do Not Merge Uncommon Metadata' is selected, any FlowFile whose metadata values do not match those of the first bundled FlowFile will not be merged. |

| Name | Description |
| --- | --- |
| failure | If the bundle cannot be created, all FlowFiles that would have been used to create the bundle will be transferred to failure. |
| merged | The FlowFile containing the merged content. |
| original | The FlowFiles that were used to create the bundle. |

| Name | Description |
| --- | --- |
| filename | When more than 1 file is merged, the filename comes from the segment.original.filename attribute. If that attribute does not exist in the source FlowFiles, then the filename is set to the number of nanoseconds matching system time. Then a filename extension may be applied: if Merge Format is TAR, the filename is appended with .tar; if Merge Format is ZIP, the filename is appended with .zip; if Merge Format is FlowFileStream, the filename is appended with .pkg. |
| merge.count | The number of FlowFiles that were merged into this bundle. |
| merge.bin.age | The age of the bin, in milliseconds, when it was merged and output. Effectively this is the greatest amount of time that any FlowFile in this bundle remained waiting in this processor before it was output. |
| merge.uuid | UUID of the merged FlowFile that will be added to the original FlowFiles' attributes. |
| merge.reason | This processor allows for several thresholds to be configured for merging FlowFiles. This attribute indicates which of the thresholds resulted in the FlowFiles being merged. For an explanation of each of the possible values and their meanings, see the Processor's Usage / documentation and the 'Additional Details' page. |
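The two **Attribute Strategy** options described in the property table can be illustrated with a small sketch. The Python below is not NiFi code: it models each FlowFile's attributes as a plain dictionary, and the function names `keep_all_unique` and `keep_only_common` are hypothetical names chosen for this illustration.

```python
from typing import Dict, List

# Assumption for illustration: a FlowFile's attributes are a str -> str mapping.
Attributes = Dict[str, str]


def keep_all_unique(flowfiles: List[Attributes]) -> Attributes:
    """Keep every attribute unless two FlowFiles disagree on its value."""
    merged: Attributes = {}
    conflicting = set()
    for attrs in flowfiles:
        for key, value in attrs.items():
            if key in conflicting:
                continue  # already dropped because of an earlier conflict
            if key in merged and merged[key] != value:
                del merged[key]  # values conflict, so drop the attribute
                conflicting.add(key)
            else:
                merged[key] = value
    return merged


def keep_only_common(flowfiles: List[Attributes]) -> Attributes:
    """Keep only attributes present on every FlowFile with the same value."""
    if not flowfiles:
        return {}
    first, *rest = flowfiles
    return {
        key: value
        for key, value in first.items()
        if all(other.get(key) == value for other in rest)
    }
```

For two FlowFiles with attributes `{"filename": "a.txt", "source": "s3", "batch": "7"}` and `{"filename": "b.txt", "source": "s3"}`, the first function keeps `source` and `batch` (only `filename` conflicts), while the second keeps only `source`.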

| Property | Description |
| --- | --- |
| Attribute Strategy | Determines which FlowFile attributes should be added to the bundle. If 'Keep All Unique Attributes' is selected, any attribute on any FlowFile that gets bundled will be kept unless its value conflicts with the value from another FlowFile. If 'Keep Only Common Attributes' is selected, only the attributes that exist on all FlowFiles in the bundle, with the same value, will be preserved. |
| correlation-attribute-name | If specified, two FlowFiles will be binned together only if they have the same value for this Attribute. If not specified, FlowFiles are bundled by the order in which they are pulled from the queue. |
| max-bin-age | The maximum age of a Bin that will trigger a Bin to be complete. Expected format is <duration> <time unit> where <duration> is a positive integer and time unit is one of seconds, minutes, hours. |
| max-bin-size | The maximum size for the bundle. If not specified, there is no maximum. This is a 'soft limit' in that if a FlowFile is added to a bin, all records in that FlowFile will be added, so this limit may be exceeded by up to the number of bytes in the last input FlowFile. |
| max-records | The maximum number of Records to include in a bin. This is a 'soft limit' in that if a FlowFile is added to a bin, all records in that FlowFile will be added, so this limit may be exceeded by up to the number of records in the last input FlowFile. |
| max.bin.count | Specifies the maximum number of bins that can be held in memory at any one time. This number should not be smaller than the maximum number of concurrent threads for this Processor, or the bins that are created will often consist only of a single incoming FlowFile. |
| merge-strategy | Specifies the algorithm used to merge records. The 'Defragment' algorithm combines fragments that are associated by attributes back into a single cohesive FlowFile. The 'Bin-Packing Algorithm' generates a FlowFile populated by arbitrarily chosen FlowFiles. |
| min-bin-size | The minimum size for the bin. |
| min-records | The minimum number of records to include in a bin. |
| record-reader | Specifies the Controller Service to use for reading incoming data. |
| record-writer | Specifies the Controller Service to use for writing out the records. |

| Name | Description |
| --- | --- |
| failure | If the bundle cannot be created, all FlowFiles that would have been used to create the bundle will be transferred to failure. |
| merged | The FlowFile containing the merged records. |
| original | The FlowFiles that were used to create the bundle. |

| Name | Description |
| --- | --- |
| record.count | The merged FlowFile will have a 'record.count' attribute indicating the number of records that were written to the FlowFile. |
| mime.type | The MIME Type indicated by the Record Writer. |
| merge.count | The number of FlowFiles that were merged into this bundle. |
| merge.bin.age | The age of the bin, in milliseconds, when it was merged and output. Effectively this is the greatest amount of time that any FlowFile in this bundle remained waiting in this processor before it was output. |
| merge.uuid | UUID of the merged FlowFile that will be added to the original FlowFiles' attributes. |
| merge.completion.reason | This processor allows for several thresholds to be configured for merging FlowFiles. This attribute indicates which of the thresholds resulted in the FlowFiles being merged. For an explanation of each of the possible values and their meanings, see the Processor's Usage / documentation and the 'Additional Details' page. |
| <Attributes from Record Writer> | Any Attribute that the configured Record Writer returns will be added to the FlowFile. |
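The 'soft limit' wording for `max-records` is easier to see with numbers. The following Python sketch is a simplified model, not the processor's implementation: it assumes a bin accepts whole FlowFiles and is closed as soon as its record total reaches the limit. The function name `pack_bins` and the input shape (a list of per-FlowFile record counts) are hypothetical.

```python
from typing import List


def pack_bins(record_counts: List[int], max_records: int) -> List[List[int]]:
    """Group FlowFiles (given as record counts) into bins under a soft limit.

    Assumption: a FlowFile is always added whole, and the bin is closed
    once its running total reaches or exceeds max_records.
    """
    bins: List[List[int]] = []
    current: List[int] = []
    total = 0
    for count in record_counts:
        current.append(count)  # all records of the FlowFile go in together
        total += count
        if total >= max_records:
            bins.append(current)  # limit reached (and possibly exceeded)
            current, total = [], 0
    if current:
        bins.append(current)  # leftover bin that never reached the limit
    return bins
```

With `max_records=1000` and FlowFiles of 400, 400, 400, and 100 records, the first bin closes at 1,200 records. It exceeds the limit by 200, which is less than the 400 records in the last FlowFile that was added.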

| Property | Description |
| --- | --- |
| Destination Database Name | The name of the Snowflake database where the data is being ingested to. |
| Merge Query Retry Count | Indicates how many times the merge query should be retried if it fails. |
| Object Identifier Resolution | Controls how source object identifiers (schemas, tables, columns) are stored and queried in Snowflake. This setting determines whether you will need to use double quotes in your SQL queries. The 'Case-Sensitive' option is the default production behavior; 'Case-Insensitive' is currently considered preview. |
| Placeholder Value | The value of the payload placeholder to look for in a MERGE. This will be converted to the destination column's data type. |
| Snowflake Connection Pool | The Controller Service that is used to obtain a connection to the Snowflake database to perform the merge operation. |
| Unchanged Value Strategy | Determines how the MERGE query should handle unchanged values in journal columns. By default it expects full values. |

| Name | Description |
| --- | --- |
| ddl | DDL to execute. |
| deleted during compaction | FlowFile deleted during compaction based on table name and generation. |
| failure | Failure query execution. |
| failure retry | Retry failure query execution. |
| poll query result | Scheduled async query execution. |
| success | Success query execution. |
| unknown file type | Unknown file type. |

| Name | Description |
| --- | --- |
| merge.query.id | The ID of the query that is used to merge the journal table into the target table. |

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Client ID * | Client ID | | | The Client ID for the Microsoft Graph API. |
| Refresh Window * | Refresh Window | 5 s | | The service will attempt to refresh tokens expiring within the refresh window, subtracting the configured duration from the token expiration. |
| SSL Context Service * | SSL Context Service | | | An instance of SSLContextProvider configured with a certificate and a private key which will be used to sign the JWT assertion. The keys must use the RSA algorithm. |
| Tenant ID * | Tenant ID | | | The Tenant ID for the Microsoft Graph API. |
| Token Scope * | Token Scope | | | The scope of the requested token. For the Graph API, it should be `https://graph.microsoft.com/.default`. For SharePoint, it should be in the following format: `https://organization.sharepoint.com/.default`. |
| Web Client Service * | Web Client Service | | | The Web Client Service to retrieve access tokens. |
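The **Refresh Window** behavior can be sketched as a single comparison: the configured duration is subtracted from the token's expiration, and a refresh is attempted once the current time reaches that point. The Python below is an illustration only. The function name `needs_refresh` is hypothetical, and treating the exact boundary as "refresh" is an assumption.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(
    expires_at: datetime,
    now: datetime,
    refresh_window: timedelta = timedelta(seconds=5),  # documented default: 5 s
) -> bool:
    """Return True if the token expires within the refresh window."""
    # Subtract the window from the expiration; at or past that instant,
    # the token counts as "expiring within the refresh window".
    return now >= expires_at - refresh_window


expiry = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
early = expiry - timedelta(seconds=6)  # outside the 5 s window
late = expiry - timedelta(seconds=4)   # inside the 5 s window
```

Here `needs_refresh(expiry, early)` is `False` and `needs_refresh(expiry, late)` is `True`. A larger window trades more frequent refreshes for a lower chance of using a token that expires mid-request.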

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Authentication Mechanism * | Authentication Mechanism | Client Secret | Client Secret, Username / Password | The mechanism to use for authenticating with the Microsoft Graph API. |
| Client ID * | Client ID | | | The Client ID for the Microsoft Graph API. |
| Client Secret * | Client Secret | | | The Client Secret for the Microsoft Graph API. |
| Password * | Password | | | The password to use for authentication. |
| Tenant ID * | Tenant ID | | | The Tenant ID for the Microsoft Graph API. |
| Username * | Username | | | The username to use for authentication. |

| Aspect | Legacy connector | New connector |
| --- | --- | --- |
| Entities | Issues only (with optional worklog enrichment). | Core flow: issues, projects, users, comments, changelogs, worklogs, votes, watchers, remote links, security schemes, permissions, project components, project versions, user groups, deleted issues. Agile flow: boards, sprints, board-sprint, board-project, board-issue mappings. |
| Concurrency | Single-threaded. | Parallel per-project issue fetching, with optional multi-node distribution. |
| Schema strategy | Raw JSON in an `OBJECT` column with a dynamically generated flattened view. | Explicit column schemas per entity, evolved additively from the API responses. |
| Deletion tracking | Not supported. | Tracks deleted issues via Jira audit log polling (optional). |
| Agile data | Not supported. | Available through a separate agile flow. |

| Aspect | Legacy connector | New connector |
| --- | --- | --- |
| Issues table | Single table with an `ISSUE` column containing the full raw JSON as an `OBJECT` type. A flattened `_VIEW` is auto-generated. | Explicit columns per field. Column names are derived from Jira field display names. No raw JSON fallback. |
| Other entities | Not available. Comments and worklogs are embedded in the issue JSON. | Separate tables: `PROJECT`, `USER`, `FIELD`, `COMMENT`, `CHANGELOG`, `WORKLOG`, `ISSUE_VOTE`, `ISSUE_WATCHER`, `ISSUE_REMOTE_LINK`, `ISSUE_SECURITY_SCHEME`, `PERMISSION`, `PROJECT_COMPONENT`, `PROJECT_VERSION`, `USER_GROUP`, `DELETED_ISSUE`, `BOARD`, `SPRINT`, `BOARD_SPRINT`, `BOARD_PROJECT`, `BOARD_ISSUE`. See [](#label-jira-entities) for the full inventory. |
| Views | Auto-generated `_VIEW` with all issue fields flattened. | |

Any queries that reference the legacy `ISSUE` column (for example, `SELECT issue:fields:summary`) or the auto-generated `_VIEW` must be rewritten to use the new column names directly (for example, `SELECT SUMMARY`).

### Parameter changes

The following parameters from the legacy connector are not available in the new connector:
The following parameters are introduced in the new connector:
Additionally, agile data (boards, sprints, and board mappings) is now available through a separate agile flow rather than a parameter toggle. See [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile) for details on installing and configuring the agile flow.

### API token scopes

If you're using API tokens with scopes, the new connector may require additional scopes depending on the features you enable. See [](#label-jira-core-api-scopes) for the core flow scopes and [](#label-jira-agile-api-scopes) for the agile flow scopes.

### Snowflake privileges

The new connector requires only `CREATE TABLE` on the destination schema. The legacy connector additionally required `CREATE VIEW` to create flattened issue views. The new connector doesn't create views, so the `CREATE VIEW` privilege is no longer needed. If you're reusing an existing role, you can revoke `CREATE VIEW` after the legacy connector is decommissioned.

## Migration steps

1. **Set up the new connector.** Install the core flow on the same or a different Openflow runtime. If you need agile data, also install the agile flow. Configure both flows to write to a **different destination schema** than the one used by the legacy connector. This allows the legacy and new connectors to run simultaneously.

2. **Map your legacy configuration to the new parameters.**

   - Copy the `Jira Email`, `Jira API Token`, and `Environment URL` values from the legacy connector to the new core flow. If using the agile flow, configure these values separately for that flow as well.
   - If the legacy connector uses `Project Names`, convert them to project keys for the `Project Keys Filter` parameter.
   - If the legacy connector uses a `JQL Query`, evaluate whether `Project Keys Filter` covers your use case. If your JQL filters by criteria other than project (for example, status or custom fields), those filters aren't available in the new connector. All matching issues from the configured projects are ingested.
   - Set `Issue Fields` to match your previous configuration. The default changed from `*all` (legacy) to `*standard`.
   - Configure the Snowflake destination parameters (database, schema, warehouse, credentials) for each flow.

3. **Start the new connector.** Run the core flow and allow the initial load to complete. If using the agile flow, start it as well.

4. **Validate the data.** Compare the data in the new destination tables against the legacy destination table to check for completeness. Expect some differences: the legacy connector didn't track deletes, so issues that were deleted in Jira still appear in the legacy table but not in the new `ISSUE` table (or they appear with `_SNOWFLAKE_DELETED = TRUE` if delete tracking is enabled). Row counts will not match exactly when any issues have been deleted.

   ```sql
   -- Compare issue counts (expect differences if issues were deleted in Jira)
   SELECT COUNT(*) AS legacy_count FROM legacy_schema.JIRA_ISSUES;
   SELECT COUNT(*) AS new_count FROM new_schema.ISSUE;

   -- Spot-check specific issues
   SELECT KEY, SUMMARY, STATUS FROM new_schema.ISSUE WHERE KEY = 'PROJ-123';
   ```

5. **Update downstream queries.** Rewrite any queries, views, dashboards, or pipelines that reference the legacy table structure. Key changes:

   - Replace references to the legacy `ISSUE` `OBJECT` column or `_VIEW` with direct column references.
   - Replace `FLATTEN`-based queries with standard `SELECT` statements.
   - Add `JOIN` statements to combine data across the new entity tables (for example, join `ISSUE` with `COMMENT` on `ISSUE_ID`).
   - If you want queries to ignore deleted issues, filter on the new `_SNOWFLAKE_DELETED` column (`WHERE _SNOWFLAKE_DELETED = FALSE`). The legacy connector didn't track deletes at all, so legacy queries against `JIRA_ISSUES` returned issues that had since been removed in Jira.

6. **Stop the legacy connector.** Once you've confirmed that the new data is complete and downstream consumers have been updated, stop the legacy connector process group. Both new flows (core and agile) can continue running independently.

7. **Clean up.** Optionally, drop the legacy destination table and view after confirming they're no longer needed.

When the legacy connector and the new connector use the same Jira API token, they share the same Jira API rate limits. Running both simultaneously roughly doubles the API call volume, which may cause rate limiting on Jira instances with heavy API usage. Consider reducing the legacy ingestion frequency during the migration period, or run the new connector with a separate API token whose rate budget you can manage independently.

---
title: ModifyBytes 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/modifybytes.md
section: Loading & Unloading Data
---

# ModifyBytes 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Discards a byte range at the start and end of a binary file, or all of its content.

## Tags

binary, discard, keep

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
---
title: ModifyCompression 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/modifycompression.md
section: Loading & Unloading Data
---

# ModifyCompression 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-compress-nar

## Description

Changes the compression algorithm used to compress the contents of a FlowFile by decompressing the contents of FlowFiles using a user-specified compression algorithm and recompressing the contents using the specified compression format properties. This processor operates in a very memory-efficient way, so very large objects well beyond the heap size are generally fine to process.

## Tags

brotli, bzip2, compress, content, deflate, gzip, lz4-framed, lzma, recompress, snappy, snappy framed, snappy-hadoop, xz-lzma2, zstd

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
---
title: MongoDBControllerService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/mongodbcontrollerservice.md
section: Loading & Unloading Data
---

# MongoDBControllerService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides a controller service that configures a connection to MongoDB and provides access to that connection to other Mongo-related components.

## Tags

mongo, mongodb, service

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: MongoDBLookupService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/mongodblookupservice.md
section: Loading & Unloading Data
---

# MongoDBLookupService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides a lookup service based around MongoDB. Each key that is specified will be added to a query as-is. For example, if you specify the two keys, user and email, the resulting query will be `{ "user": "tester", "email": "tester@test.com" }`. The query is limited to the first result (findOne in the Mongo documentation). If no "Lookup Value Field" is specified then the entire MongoDB result document minus the _id field will be returned as a record.

## Tags

lookup, mongo, mongodb, record

## Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: Monitor connectors using the Openflow Connectors Dashboard
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors-dashboard.md
section: Loading & Unloading Data
---

# Monitor connectors using the Openflow Connectors Dashboard

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)

The Openflow Connectors Dashboard provides a high-level view of all installed connectors, health snapshots, and key performance indicators, such as the aggregated average throughput and total data ingested by all connectors matching the filter criteria.

## Prerequisites

To use the Openflow Connectors Dashboard, the following prerequisites must be met:

- You need at least read-only permissions on the event table.
- You must have the following minimum Openflow versions:
  - BYOC deployment: 1.36.0
  - Snowflake deployment: 1.26.0
  - Runtime: 2026.3.17.13
- You must have the following minimum connector versions. These versions apply to change data capture (CDC) connectors only. Other connector types don't have a minimum version requirement for dashboard support.

See [](/user-guide/data-integration/openflow/version-history) for more information.

## Access the Openflow Connectors Dashboard

1. Sign in to Snowsight.
2. In the navigation menu, select **Ingestion** » **Openflow** and navigate to the **Connector Observability** tab.

The Openflow Connectors Dashboard appears.

## The Openflow Connectors Dashboard overview

The Openflow Connectors Dashboard displays the following information:
**Status**
Shows the number of connectors with the following statuses: - **Healthy**: Didn't encounter any errors during the selected time period. - **Unhealthy**: Logged errors in the event table during the selected time period or has one or more tables in **Failed** state (change data capture (CDC) connectors only). - **Upgrade required**: Openflow deployment, runtime, or connector aren't running the minimum required versions to display health and performance metrics. Review the version prerequisites and upgrade as needed.
**Average throughput**
Measures the rate at which data is read from source systems and sent to Snowflake across all connectors.

- The **Average throughput** » **Ingested** metric measures how fast data is sent to Snowflake across all connectors that match the primary filter criteria (time frame and event table).
- The **Average throughput** » **Read** metric measures how fast Openflow reads data from source systems across all connectors that match the primary filter criteria (time frame and event table).
**Total data ingested**
Shows how much data all connectors that match the primary filter criteria for time frame and event table have sent to Snowflake during the selected time period. Use this metric to quickly identify ingestion anomalies over a specific time period.
For custom telemetry queries beyond the dashboard, see [](/user-guide/data-integration/openflow/monitor).

- **Total data ingested** and **Average throughput** metrics include both raw payload and structural overhead such as JSON keys, braces, and delimiters. Because these metrics track the total transmitted volume, these figures might be higher than the uncompressed data reported by Snowpipe Streaming or the final storage volume in your destination table.
- The connectors appear in the list if they match the selected filter criteria and have recorded telemetry events during the selected time frame.
- If you examine longer time frames, the list might show connectors that were previously deleted. For example, you deployed a connector six days ago, and then deleted that connector two days ago. If you set the time frame to **Last 7 days**, the connector appears in the list because it recorded telemetry events in the last 7 days.

### Filtering connectors

The Openflow Connectors Dashboard supports the following filters:
**Event table**
The Openflow connectors event table you want to monitor. This filter displays event tables that are associated with at least one Openflow deployment, as well as the default event table and the account event table. You can select only one event table at a time. Event table views are also supported. The event table is set when you set up Openflow. To view the event table associated with an Openflow deployment, use the [](/sql-reference/sql/desc-oflow-data-plane-integration) command. See [Set up Openflow - Snowflake Deployment](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) or [](/user-guide/data-integration/openflow/setup-openflow-byoc) for more information on configuring event tables.
**Time frame**
Use this filter to identify relevant connectors in a specific time frame. To get the most up-to-date results about the connector health, select the **Last Hour** time period.
**Status**
Enables filtering for **Healthy**, **Unhealthy**, or **All** connectors.
**Source**
Enables filtering by the source system based on known deployed connectors. The filter only shows sources that are used by your connectors.
**Deployment**
Enables filtering by Snowflake Openflow deployments. This filter displays data plane integration names, which are composed of the prefix `OPENFLOW_DATAPLANE_` followed by the deployment ID. To find the deployment ID, navigate to Openflow, select the **Deployments** tab, then select **View Details**.
**Runtime**
Enables filtering by Snowflake Openflow runtimes. This filter displays the runtime keys. To match runtime keys with Openflow runtime names in the UI, navigate to Openflow, select the **Runtimes** tab, then select **View Details**, and find the corresponding key.
**Type**
Enables filtering by connector type: Databases, SaaS, Streaming, Unstructured, Other.
- Primary filters (event table and time frame) are applied before secondary filters (status, source, deployment, runtime, or type).
- The secondary filters (status, source, deployment, runtime, type) don't apply to the throughput and data ingested visuals.

## Monitoring Openflow connectors

To monitor the connector details, select the more-options icon (⋮) » **View Details**.

### Change data capture connectors

The details page shows the following information for each table that is part of the change data capture configuration:
**Table replication status**
Tables can either be in **Active** or **Failed** replication status. The replication status is based on the most recent telemetry event that is available for the table. Events that cause replication to fail for a table immediately result in a **Failed** replication status in the dashboard. Use the **Failure Reason** message to identify the issue.
**Error distribution**
Helps you understand when the connector experienced issues, so that you can identify any potential problems with source systems, connector configuration, or the Snowflake destination.
**Table name**
Shows the schema and table names for all tables that are configured to be replicated by the connector. The list matches the **Included Table Names** or **Included Table Regex** configuration parameters of the connector.
**Replication status**
Shows whether each table is in **Active** or **Failed** replication status.
**Replication phase**
Shows the current table replication phase. After configuration in the connector, tables enter the **New** replication phase, progress to the **Snapshot Load** phase, perform the initial load, and ultimately enter the **Incremental Replication** phase when individual change data capture events are processed.
**Last Ingested**
Shows the timestamp of the last record inserted into the destination table during the selected time frame. When reading this metric, allow for a short delay between when records are ingested and when the corresponding events are logged and available to query.

You can use the **Replication status**, **Replication phase**, and time frame filters to narrow down the table list.

### All connectors
**Connector status**
Shows the connector health status: **Healthy** if no error messages were encountered during the selected time frame, or **Unhealthy** if any error messages were encountered.
**Error distribution**
Shows a count of how many errors this connector encountered during the selected time period.
**Average throughput**
Measures the rate at which data is read from source systems and ingested into Snowflake for the selected connector.

- The **Average throughput** » **Ingested** metric measures how fast the selected connector ingests data into Snowflake.
- The **Average throughput** » **Read** metric measures how fast the selected connector reads data from source systems.
**Total data ingested**
Shows how much data the selected connector has ingested into Snowflake during the selected time period. Use this metric to quickly identify ingestion anomalies over a specific time period.
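The health rule described on this page (a connector is **Unhealthy** if it logged errors in the selected time frame or, for CDC connectors, has a table in **Failed** state) can be sketched as follows. This Python is illustrative only: the `ErrorEvent` record, the `connector_status` function, and the `failed_tables` parameter are hypothetical and don't reflect the event table schema or the dashboard's actual queries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class ErrorEvent:
    """Hypothetical stand-in for an error logged to the event table."""
    connector: str
    timestamp: datetime


def connector_status(
    connector: str,
    errors: Iterable[ErrorEvent],
    window_start: datetime,
    window_end: datetime,
    failed_tables: int = 0,  # CDC connectors only; 0 for other types
) -> str:
    """Classify a connector for the selected time frame."""
    has_errors = any(
        e.connector == connector and window_start <= e.timestamp <= window_end
        for e in errors
    )
    return "Unhealthy" if has_errors or failed_tables > 0 else "Healthy"
```

Because the classification depends on the selected window, a connector whose only error occurred three days ago would be **Unhealthy** for a seven-day time frame but **Healthy** for the last hour, which is why shorter time frames give the most current view.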
### Custom flows

Custom flows built on the Openflow canvas can also be monitored on the dashboard, but only if they are actively version-controlled in a customer Git repository using the Openflow Git integration. Flows that aren't version-controlled don't appear in the dashboard. For more information, see [](/user-guide/data-integration/openflow/version-control-custom-flows).

## Debugging Openflow connectors

The Openflow Connectors Dashboard serves as an entry point for debugging connector-specific issues and makes all connector logs easily accessible to users.

### Viewing the connector errors

To view all errors that a connector encountered in the selected time frame, first navigate to the connector details page by selecting the more-options icon (⋮) » **View Details**, and then select the **Issues** tab. The error headline tells you what type of error the connector encountered, and the content provides the entire stack trace of the error.

### Viewing the connector logs

You might also want to look at additional connector logs to understand the context around an error message. To view all logs for the selected connector, select the more-options icon (⋮) » **View logs**. After you open the log explorer, you can also change the filters to view logs for different connectors or for entire runtimes or deployments. The log explorer supports Openflow-specific filters like the dataplane ID, the runtime key, and the process group ID.

### Accessing the Openflow canvas

When you identify a connector issue, you probably need to navigate to the Openflow canvas to fix it; for example, to adjust some configuration parameters or upgrade to a newer connector version. To navigate to the selected connector in the Openflow canvas, select the more-options icon (⋮) » **Go to canvas**.

## Optimizing performance

### Select a larger warehouse

Use the warehouse selector in the top right section of the screen to choose a different warehouse to run the queries. While larger warehouses run queries faster, they take longer to resume, which might increase the initial page load time.

### Set up clustering on the Openflow event table

By using clustering keys, you can avoid unnecessary scanning of micro-partitions during querying, significantly accelerating the performance of queries that reference these columns. For more information, see [](#label-data-clustering).

Run the following query, replacing the placeholders with your Openflow event table:

```sqlsyntax
ALTER TABLE <database_name>.<schema_name>.<event_table_name> CLUSTER BY (
  DATE_TRUNC('HOUR', timestamp),
  RECORD_TYPE,
  CAST(record_attributes:"metricNameHash" AS STRING)
);
```

- Automatic clustering consumes Snowflake credits using serverless compute resources. To learn how many credits per compute-hour are consumed, refer to the "Serverless Feature Credit Table" in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).
- After you enable clustering on your event table, a background process starts that takes some time to complete. After the process is complete, you should see improved performance when using the dashboard.

### Reduce the queried time frame

Selecting a smaller time frame in the filter scans less data and leads to faster query performance. Use the **Last Hour** filter for the best performance and the most up-to-date view of your connector health and performance.

## Limitations

- The Openflow Connectors Dashboard uses data stored in event tables to provide insight into Openflow connectors. Depending on the selected time period and event table, information provided on the dashboard might not reflect the current status of a connector.
- Detailed health monitoring is currently only available for Database CDC connectors.
- The **Deployment** and **Runtime** filters use internal names that differ from the display names in the Openflow UI. For details on matching these names, see [Filtering connectors](#label-openflow-dashboard-filtering).
## Known issues - After upgrading the deployment, runtime, and connector to the versions mentioned in the prerequisites, the error count metric is only accurate for errors encountered after the upgrade. --- title: Monitor Openflow source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/monitor-overview.md section: Loading & Unloading Data --- # Monitor Openflow This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [](/user-guide/data-integration/openflow/setup-openflow-spcs) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) Openflow provides two approaches for monitoring your data integration pipelines:
[](/user-guide/data-integration/openflow/connectors-dashboard)
Use the Openflow Connectors Dashboard in Snowsight to get a high-level view of connector health, throughput, and data ingestion. The dashboard provides filtering, error distribution, and per-connector detail pages.
[](/user-guide/data-integration/openflow/monitor)
Query the Openflow telemetry data stored in your event table to monitor logs, application metrics, JVM and system metrics, and build custom queries tailored to your environment.
--- title: Monitor Openflow using telemetry data source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/monitor.md section: Loading & Unloading Data --- # Monitor Openflow using telemetry data This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [](/user-guide/data-integration/openflow/setup-openflow-spcs) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/processors/index) - [](/user-guide/data-integration/openflow/controllers/index) This topic describes how to monitor the state of Openflow and troubleshoot problems. ## Accessing Openflow logs Snowflake sends Openflow logs to the event table you configured when you set up Openflow ([BYOC](#label-openflow-event-table) | [Snowflake deployment](#label-openflow-spcs-event-table)). Snowflake recommends that you include a timestamp in the WHERE clause of event table queries. This is particularly important because of the potential volume of data generated by various Snowflake components. By applying filters, you can retrieve a smaller subset of data, which improves query performance. To get started quickly with Openflow's telemetry, see [Example Queries](#label-openflow-example-queries) below. ## Openflow Telemetry Schema For information about the event table columns, see [](/developer-guide/logging-tracing/event-table-columns). The following sections describe how Openflow structures telemetry in an Event Table. ### Resource Attributes Describes the event metadata set by Openflow. 
For general information on other types of resource attributes see [](#label-event-table-resource-attributes-column) in the Event Table columns documentation.
Resource Attributes Example:
```json { "application": "openflow", "cloud.service.provider": "aws", "container.id": "a1b2c3d4e5f6", "container.image.name": "example-openflow-prod.registry-internal.snowflakecomputing.com/openflow/openflow/openflow_repo/runtime-server", "container.image.tag": "2026.3.17.13", "deployment.version": "1.35.0", "k8s.container.name": "pg-dev-server", "k8s.container.restart_count": "0", "k8s.namespace.name": "runtime-pg-dev", "k8s.node.name": "ip-10-10-62-36.us-east-2.compute.internal", "k8s.pod.name": "pg-dev-0", "k8s.pod.start_time": "2025-04-25T22:14:29Z", "k8s.pod.uid": "94610175-1685-4c8f-b0a1-42898d1058e6", "openflow.dataplane.id": "abeddb4f-95ae-45aa-95b1-b4752f30c64a" } ```
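As a minimal sketch of how these attributes can be used in a query, the following statement lists the deployments, runtimes, and pods that reported telemetry in the last 30 minutes. It assumes the default event table *SNOWFLAKE.TELEMETRY.EVENTS* and that your rows carry the same attribute keys as the example above. The column aliases are illustrative only.

```sql
-- Sketch: list the Openflow deployments, runtimes, and pods seen recently.
-- Assumes the default event table; substitute your own table if different.
SELECT DISTINCT
    resource_attributes:"openflow.dataplane.id"::STRING AS deployment_id,
    resource_attributes:"k8s.namespace.name"::STRING    AS runtime_key,
    resource_attributes:"k8s.pod.name"::STRING          AS runtime_pod,
    resource_attributes:"deployment.version"::STRING    AS deployment_version
FROM snowflake.telemetry.events
WHERE timestamp > DATEADD(minute, -30, SYSDATE())
  -- Assumption: Openflow rows set "application" to 'openflow', as in the example above.
  AND resource_attributes:"application"::STRING = 'openflow'
ORDER BY deployment_id, runtime_key, runtime_pod;
```

Attribute keys that contain dots must be enclosed in double quotes, as shown. Casting to STRING removes the surrounding quotes that VARIANT values otherwise display.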
### Scope
Scope Example:
```json { "name": "runtime" } ```
### Record Type Depending on the type of Openflow telemetry represented by this row, this will be one of: - LOG - METRIC Openflow does not collect TRACE records, but that is also a valid type for this column in Snowflake Event Tables. ### Record Optional. This JSON object describes the type of metric represented by this row.
Record Example:
```json { "metric": { "name": "connection.queued.duration.max", "unit": "millisecond" }, "metric_type": "gauge", "value_type": "INT" } ```
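To see which metrics your runtimes currently report, you can group on the fields of the RECORD column. The following is a sketch under the same assumptions as the other examples in this topic: the default event table *SNOWFLAKE.TELEMETRY.EVENTS* and a 30-minute window. The aliases are illustrative.

```sql
-- Sketch: summarize the metrics reported in the last 30 minutes.
-- The RECORD paths follow the structure of the Record Example above.
SELECT
    record:metric:name::STRING AS metric_name,
    record:metric:unit::STRING AS metric_unit,
    record:metric_type::STRING AS metric_type,
    COUNT(*)                   AS data_points
FROM snowflake.telemetry.events
WHERE timestamp > DATEADD(minute, -30, SYSDATE())
  AND record_type = 'METRIC'
  -- Assumption: Openflow rows set "application" to 'openflow' in RESOURCE_ATTRIBUTES.
  AND resource_attributes:"application"::STRING = 'openflow'
GROUP BY 1, 2, 3
ORDER BY metric_name;
```

Without the filter on *application*, the query also returns metrics written to the same event table by other Snowflake features.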
### Record Attributes #### Logs Record attributes for Logs will typically indicate where this log was sourced. For example, logs from an Openflow Runtime named *testruntime* could have Record Attributes of:
```json { "log.file.path": "/var/log/pods/runtime-testruntime_testruntime-0_66d80cdb-9484-40a4-bdba-f92eb0af14c7/testruntime-server/0.log", "log.iostream": "stdout", "logtag": "F" } ```
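The following sketch shows one way to use these attributes to narrow log rows to a single output stream and to identify the container that produced each line. It assumes the default event table *SNOWFLAKE.TELEMETRY.EVENTS* and that `log.file.path` follows the layout shown in the example above, where the second-to-last path segment is the container name. Treat that layout as an assumption and verify it against your own data.

```sql
-- Sketch: recent Openflow log lines written to stdout, with the container name
-- derived from the log file path.
SELECT
    timestamp,
    resource_attributes:"k8s.namespace.name"::STRING AS runtime_key,
    -- A negative part number counts from the end of the path:
    -- .../<container>/<n>.log  ->  <container>
    SPLIT_PART(record_attributes:"log.file.path"::STRING, '/', -2) AS container_name,
    value AS log_line
FROM snowflake.telemetry.events
WHERE timestamp > DATEADD(minute, -30, SYSDATE())
  AND record_type = 'LOG'
  AND record_attributes:"log.iostream"::STRING = 'stdout'
  AND resource_attributes:"k8s.namespace.name"::STRING LIKE 'runtime-%'
ORDER BY timestamp DESC
LIMIT 100;
```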
#### System Metrics System metrics like CPU usage will typically not set Record Attributes, so this will be *null*. #### Openflow Application Metrics Record Attributes for Application or "Flow" metrics provide details about the component in the data pipeline that produced the metric. These attributes vary based on the type of component; see [Application Metrics](#label-openflow-application-metrics). For example, a metric from a *PutSnowpipeStreaming* processor could have Record Attributes of:
```json { "component": "PutSnowpipeStreaming", "execution.node": "ALL", "group.id": "c052f9d7-7f76-3013-a2c5-d3b064fa7326", "id": "c69e2913-22a9-36bb-a159-6a5ed1fb9d63", "name": "PutSnowpipeStreaming", "type": "processor" } ```
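As a sketch of how these attributes can be used, the following query lists the processors that reported metrics in the last 30 minutes, together with the time each one last reported. It assumes the default event table *SNOWFLAKE.TELEMETRY.EVENTS* and the attribute keys shown in the example above. The aliases are illustrative.

```sql
-- Sketch: processors that emitted application metrics recently.
SELECT
    resource_attributes:"k8s.namespace.name"::STRING AS runtime_key,
    record_attributes:"component"::STRING            AS component,
    record_attributes:"name"::STRING                 AS component_name,
    record_attributes:"id"::STRING                   AS component_id,
    record_attributes:"group.id"::STRING             AS process_group_id,
    MAX(timestamp)                                   AS last_reported
FROM snowflake.telemetry.events
WHERE timestamp > DATEADD(minute, -30, SYSDATE())
  AND record_type = 'METRIC'
  AND record_attributes:"type"::STRING = 'processor'
GROUP BY 1, 2, 3, 4, 5
ORDER BY last_reported DESC;
```

Changing the value compared against `"type"` would target other component types, but confirm the exact values in your own telemetry, because only `processor` appears in the example above.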
### Value This column contains the raw value of the telemetry. For metrics, this will be a numeric value (integer or double). For logs, this will either be a semi-structured string value or a well-formatted JSON string. #### Openflow Runtime Logs Openflow Runtimes emit most logs as JSON, so applying Snowflake's [](/sql-reference/functions/try_parse_json) to the *VALUE* column allows you to further break this value into the following structured fields:
## Application Metrics The following list covers all application metrics available for Openflow Runtimes. Runtimes only emit a subset of metrics relevant to Openflow Connectors to persist in a Snowflake Event Table. Snowflake's OpenTelemetry Reporting Task can send some or all metrics to any OTLP destination. ### Connection Metrics
### Connection Record Attributes Each Connection metric includes the following Record Attributes:
### Input and Output Port Metrics Input Port and Output Ports are technically two separate types of components. For consistency, metrics and attributes for Input and Output Ports are the same, with the exception of the *type* attribute that indicates whether it is an input port or an output port.
### Input and Output Port Record Attributes Each Port metric includes the following Record Attributes:
### Process Group Metrics
### Process Group Record Attributes Each Process Group metric includes the following Record Attributes:
### Processor Metrics
### Processor Record Attributes Each Processor metric includes the following Record Attributes:
### Additional Attributes for Counters In addition to the standard Processor attributes above, *processor.counter* metrics include the following:
### Remote Process Group Metrics
### Remote Process Group Record Attributes Each Remote Process Group metric includes the following Record Attributes:
### JVM Metrics
### JVM Record Attributes JVM metrics do not provide Record Attributes. ### CPU Metrics
### CPU Record Attributes
### Storage Metrics
### Storage Record Attributes
## Example Queries The following queries are examples to get you started with Openflow Telemetry. All queries assume that Openflow is configured to send telemetry to the default Event Table of *SNOWFLAKE.TELEMETRY.EVENTS*. If your Snowflake Account or Openflow Deployment is configured with a different Event Table, substitute that table name where you see *SNOWFLAKE.TELEMETRY.EVENTS*. ### Find Stuck FlowFiles This query returns connections with FlowFiles that have been queued for more than some threshold, indicating that they may be stuck and require intervention. Adjust the 30 minute threshold as needed for your use case. ```sql SELECT * FROM ( SELECT resource_attributes:"openflow.dataplane.id" as Deployment_ID, resource_attributes:"k8s.namespace.name" as Runtime_Key, record_attributes:name as Connection_Name, record_attributes:id as Connection_ID, MAX(TO_NUMBER(value / 60 / 1000)) as Max_Queued_File_Minutes FROM snowflake.telemetry.events WHERE true AND record_type = 'METRIC' AND record:metric:name = 'connection.queued.duration.max' AND timestamp > dateadd(minutes, -30, sysdate()) GROUP BY 1, 2, 3, 4 ORDER BY Max_Queued_File_Minutes DESC ) WHERE Max_Queued_File_Minutes > 30; ``` ### Find Error Logs for Openflow Runtimes ```sql SELECT timestamp, Deployment_ID, Runtime_Key, parsed_log:level as log_level, parsed_log:loggerName as logger, parsed_log:formattedMessage as message, parsed_log FROM ( SELECT timestamp, resource_attributes:"openflow.dataplane.id" as Deployment_ID, resource_attributes:"k8s.namespace.name" as Runtime_Key, TRY_PARSE_JSON(value) as parsed_log FROM snowflake.telemetry.events WHERE true AND timestamp > dateadd('minutes', -30, sysdate()) AND record_type = 'LOG' AND resource_attributes:"k8s.namespace.name" like 'runtime-%' ORDER BY timestamp DESC ) WHERE log_level = 'ERROR'; ``` ### Find Running and Non-Running Processors Some flows expect that all processors are in a "running" state, even if they are not actively processing data. 
This query helps you find any processors that are running or in another state, such as: - stopped - invalid - disabled ```sql SELECT timestamp, resource_attributes:"openflow.dataplane.id" as Deployment_ID, resource_attributes:"k8s.namespace.name" as Runtime_Key, record_attributes:component as Processor, record_attributes:id as Processor_ID, TO_NUMBER(value) as Running FROM snowflake.telemetry.events WHERE true AND record:metric:name = 'processor.run.status.running' AND record_type = 'METRIC' AND timestamp > dateadd(minutes, -30, sysdate()); ``` ### Find High CPU Usage for Openflow Runtimes Slow data flows or reduced throughput may be the result of a bottleneck on the CPU. Openflow Runtimes scale up automatically, based on the minimum and maximum number of nodes you have configured. If an Openflow Runtime is using its maximum number of nodes and CPU usage still remains high, consider: 1. Increasing the maximum number of nodes allocated to the Runtime 2. Troubleshooting the Connector or flow to identify the bottleneck Snowsight Charts provide an easy way to visualize query results for CPU usage over time. ```sql SELECT timestamp, resource_attributes:"openflow.dataplane.id" as Deployment_ID, resource_attributes:"k8s.namespace.name" as Runtime_Key, resource_attributes:"k8s.pod.name" as Runtime_Pod, TO_NUMBER(value, 10, 3) * 100 as CPU_Usage_Percentage FROM snowflake.telemetry.events WHERE true AND timestamp > dateadd(minute, -30, sysdate()) AND record_type = 'METRIC' AND record:metric:name ilike 'container.cpu.usage' AND resource_attributes:"k8s.namespace.name" ilike 'runtime-%' AND resource_attributes:"k8s.container.name" ilike '%-server' ORDER BY timestamp desc, CPU_Usage_Percentage desc; ``` --- title: MonitorActivity 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/monitoractivity.md section: Loading & Unloading Data --- # MonitorActivity 2025.10.9.21 This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Monitors the flow for activity and sends out an indicator when the flow has not had any data for a specified amount of time, and again when the flow's activity is restored. ## Tags active, activity, detection, flow, inactive, monitor ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
--- title: MoveAzureDataLakeStorage 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/moveazuredatalakestorage.md section: Loading & Unloading Data --- # MoveAzureDataLakeStorage 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Moves content within Azure Data Lake Storage Gen 2. After the move, the files are no longer available at the source location. ## Tags adlsgen2, azure, cloud, datalake, microsoft, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.azure.storage.DeleteAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/deleteazuredatalakestorage) - [org.apache.nifi.processors.azure.storage.FetchAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/fetchazuredatalakestorage) - [org.apache.nifi.processors.azure.storage.ListAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/listazuredatalakestorage) --- title: Notify 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/notify.md section: Loading & Unloading Data --- # Notify 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Caches a release signal identifier in the distributed cache, optionally along with the FlowFile's attributes. Any flow files held at a corresponding Wait processor will be released once this signal in the cache is discovered. ## Tags cache, distributed, map, notify, release, signal ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.Wait](/user-guide/data-integration/openflow/processors/wait) --- title: Object definition overrides for the Openflow Connector for Shopify source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/shopify/object-definitions.md section: Loading & Unloading Data --- # Object definition overrides for the %shopifyof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/shopify/setup) This topic describes the **Object Definitions Override** parameter in detail, including the full schema, promoted column and child field definitions, and a complete example. The **Object Definitions Override** parameter accepts a JSON array of object definitions. Each definition can add a new object type or fully replace an existing catalog entry. ## Object definition schema The **Object Definitions Override** value must be valid JSON. If the JSON is malformed, the connector fails to start. Validate your JSON before applying the override. The following fields are supported in each object definition:
## Promoted columns Promoted columns extract specific values from the raw JSON payload into dedicated typed columns in the destination table. This makes frequently queried fields available as first-class Snowflake columns for efficient filtering and aggregation. Each promoted column has the following fields:
## Child fields Child field definitions extract nested connections (such as order line items) into separate Snowflake tables. Each child table includes a `__PARENT_ID` column linking records back to the parent. Each child field has the following fields:
## Example: Register a custom object type with promoted columns The following override customizes an existing catalog entry to add scalar fields, nested object selections, a metafield alias, and promoted columns. One promoted column extracts a value directly from the aliased metafield. ```json [ { "apiType": "draftOrders", "tableName": "DRAFT_ORDERS", "gidTypeName": "DraftOrder", "supportsBulk": true, "supportsIncremental": true, "incrementalField": "updatedAt", "ignoredFields": [], "sortKeys": ["UPDATED_AT", "ID"], "supportsDeletes": false, "graphqlFields": [ "id", "createdAt", "updatedAt", "name", "status", "email", "currencyCode", "totalQuantityOfLineItems", "customer { id }", "totalPriceSet { shopMoney { amount currencyCode } }", "billingAddress { address1 city countryCode zip }", "draft_po_number: metafield(key: \"custom.draft_po_number\") { key namespace compareDigest createdAt id jsonValue legacyResourceId updatedAt value definition { id description key pinnedPosition } }" ], "promotedColumns": [ { "name": "STATUS", "path": "$.status", "type": "string" }, { "name": "NAME", "path": "$.name", "type": "string" }, { "name": "CUSTOMER_ID", "path": "$.customer.id", "type": "gid" }, { "name": "TOTAL_PRICE_AMOUNT", "path": "$.totalPriceSet.shopMoney.amount", "type": "money" }, { "name": "DRAFT_PO_NUMBER", "path": "$.draft_po_number.value", "type": "string" } ], "childFields": [] } ] ``` ## Example: Override an object with child fields The following override customizes the `orders` object to extract line items and fulfillments into separate tables. Line items use a paginated connection (`edges`); fulfillments are an inline array in the parent response (`array`). 
```json [ { "apiType": "orders", "tableName": "ORDERS", "gidTypeName": "Order", "graphqlFields": [ "id", "createdAt", "updatedAt", "name", "email", "lineItems(first: 250) { edges { cursor node { id title quantity originalUnitPriceSet { shopMoney { amount currencyCode } } } } }", "fulfillments { id status createdAt }" ], "childFields": [ { "fieldName": "lineItems", "tableName": "ORDER_LINE_ITEMS", "gidTypeName": "LineItem", "connectionType": "edges" }, { "fieldName": "fulfillments", "tableName": "ORDER_FULFILLMENTS", "gidTypeName": "Fulfillment", "connectionType": "array" } ] } ] ``` --- title: OpenAiTranscribeAudio 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/openaitranscribeaudio.md section: Loading & Unloading Data --- # OpenAiTranscribeAudio 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-openai-nar ## Description Transcribes audio into English text. The audio data must be in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm ## Tags audio, flac, m4a, mp3, mp4, mpeg, mpga, ogg, openai, openflow, speech-to-text, text, transcribe, translate, wav, webm ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Use Cases Involving Other Components | Create embeddings for audio data and insert them into Pinecone so that the audio can be made available to a large language model (LLM) such as OpenAI's GPT models. | | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --- title: Openflow BYOC - Set up custom ingress source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress.md section: Loading & Unloading Data --- # Openflow BYOC - Set up custom ingress This feature is not available in the People's Republic of China. Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/about-byoc) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/troubleshoot) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the considerations for and steps required to set up an Openflow BYOC deployment with a custom ingress solution managed within your own AWS account. ## Benefits Custom ingress for Openflow BYOC deployments provides your organization with: - Stronger security with network-level restrictions that can limit access to only your VPN or private network. - Full control over the URL and TLS certificate used to access Openflow to meet your security and compliance requirements. ## Considerations With Snowflake managed ingress, Openflow creates the necessary DNS records, public load balancer, and manages the TLS certificate for the Openflow runtimes in your BYOC deployment. When you enable custom ingress, Openflow will no longer automatically manage external DNS records, will not create a public load balancer automatically, and will no longer manage certificates for the Openflow runtimes. 
You must manage these resources within your own AWS account. ![Openflow Managed Ingress compared with Custom Ingress, highlighting the additional requirements for DNS, load balancers, and certificates.](/static/images/connectivity/openflow-byoc-custom-ingress.svg) ## Configure custom ingress in Snowflake Openflow 1. Enable custom ingress during deployment creation. - During deployment creation, enable **Custom ingress** and specify your preferred fully qualified domain name (FQDN) in the **Hostname** field. - You must be able to manage this DNS record and create a TLS certificate for this FQDN. Do not use a subdomain of `snowflakecomputing.com`. - You must not include the protocol **https://** or a trailing slash **/** in the FQDN. - For example, if you specify `openflow01.your-domain.org`, you will access a runtime named "My Runtime" at `https://openflow01.your-domain.org/my-runtime/nifi/`. 2. Download the CloudFormation template. This file has all of the settings required for Openflow to run as your custom ingress domain. ## Configure custom ingress in AWS `{deployment-key}` represents the Openflow unique identifier applied to cloud resources created and managed by Openflow for a particular deployment. This is in the `DataPlaneKey` parameter of the CloudFormation template, also available in Openflow through the **View Details** menu option for the deployment. 1. Add the following tag to the private subnets for your Openflow deployment: - Key: **kubernetes.io/role/internal-elb** - Value: `1` 2. If your private subnets are used by other EKS clusters, you must also tag them with the name of the Openflow cluster. This allows Openflow to create a load balancer alongside other load balancers. - Key: **kubernetes.io/cluster/\{deployment-key\}** - Value: `1` 3. Upload the CloudFormation template. Wait approximately 30 minutes for Openflow to create the internal network load balancer. 
- You can find the internal network load balancer in the AWS Console under **EC2** %ra% **Load Balancers**. - The load balancer will be named `runtime-ingress-{deployment-key}`. 4. Obtain the internal IP address of the Openflow-managed AWS internal network load balancer. - Under **EC2** %ra% **Load Balancers**, navigate to the details page and copy the **DNS name** of the Load Balancer. - Log into your agent EC2 instance (identified as **openflow-agent-\{deployment-key\}**) and run the command `nslookup {openflow-load-balancer-dns-name}`. - Copy the IP addresses of the Openflow-managed AWS internal network load balancer. These are destinations for the target group of the load balancer you will create in a following step. 5. Provision a TLS certificate. - Obtain a TLS certificate for the load balancer that will handle traffic to the Openflow runtime UIs. You can generate a certificate using AWS Certificate Manager (ACM) or import an existing certificate. 6. Create a network load balancer that will route traffic to the Openflow-managed AWS internal network load balancer. 1. In your AWS account, create a Network Load Balancer with the following configuration: - Name: We recommend the naming convention `custom-ingress-external-{deployment-key}`, where `{deployment-key}` is the key of your Openflow deployment. - Type: **Network Load Balancer** - Scheme: **Internal** or **Internet-facing**, depending on your requirements. - VPC: Select the VPC of your deployment - Availability Zones: Select both Availability Zones where your Openflow deployment is running. - Subnets: Select the private subnets of your VPC for an **Internal** Load Balancer, or the public subnets of your VPC for an **Internet-facing** Load Balancer. 
- Security groups: Select or create a security group that allows traffic on port `443` - Default SSL/TLS server certificate: Import your SSL/TLS certificate - Target group: Create a new target group with the following settings: - Target type: **IP addresses** - Protocol: **TLS** - Port: **443** - VPC: Verify the VPC matches your deployment - Type the IP address of the internal network load balancer created by Openflow (obtained in the previous step) as the target and select **Include as pending below**. 2. Once the load balancer is created, copy the DNS name for the load balancer to use in the next step. 3. For more information on how to create a network load balancer, see [Create a Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/create-network-load-balancer.html). 7. Create a DNS CNAME record that maps your custom ingress FQDN to the AWS load balancer's DNS name. - For detailed DNS configuration instructions in Route 53, see [Create records in Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resource-record-sets-creating.html). ## Verification 1. The Openflow deployment shows a status of **Active** in the **Deployments** page. 2. Create a runtime in the Openflow deployment. 3. Once the runtime is **Active**, click on the runtime name or use the **View canvas** menu option to access the runtime's UI. 4. Openflow directs you to the runtime with the hostname specified during deployment creation. For example, `https://openflow01.your-domain.org/my-runtime/nifi/`. ## Troubleshooting The following sections provide troubleshooting steps for common issues with custom ingress. If you are still experiencing issues after performing these checks, file a [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support) case. ### Load balancer target health check The target group for your network load balancer should list the IP addresses of the Openflow-managed internal network load balancer as targets. 
All of these targets should show as **Healthy**. If targets are **Unhealthy**, use the following checks to narrow down where traffic is failing. 1. In the AWS console, open **EC2** %ra% **Load Balancers**. 2. Locate the Openflow-managed load balancer that manages ingress to the Kubernetes cluster. This load balancer is named `runtime-ingress-{deployment-key}`. 3. Review the target health for that load balancer under the **Resource map** tab. 4. If the Openflow-managed load balancer is not active or has **Unhealthy** targets: - Traffic may be blocked between the Openflow-managed load balancer and the BYOC cluster, or a service inside the cluster may not be ready. - Generate a diagnostic bundle by running `./diagnostics.sh` from the **openflow-agent-\{deployment-key\}** EC2 instance and attach it to a [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support) case. 5. If the Openflow-managed load balancer is active and has healthy targets, check the target health for your load balancer. 6. If your load balancer's targets are **Unhealthy**, the path from your load balancer to the Openflow-managed load balancer is the most likely problem: - **Incorrect or stale IP addresses in your target group.** The Openflow-managed load balancer exposes multiple IP addresses that can change over time. To get the latest values, run `nslookup` with the **DNS name** of the Openflow-managed load balancer. Update your load balancer's targets as necessary. - **Security group rules.** Confirm that inbound rules on the Openflow-managed load balancer's security groups allow TCP `443` from your load balancer. Traffic can fail if your load balancer can't reach the Openflow load balancer on port `443`. ### Browser security blocking Some problems with custom ingress are caused by corporate browser security, firewalls, or web proxies that block or inspect traffic to your custom hostname. Those policies are separate from AWS load balancer configuration. 
You may find that users can't open the Openflow UI even when AWS load balancers report healthy targets. To verify connectivity through the load balancers to the Openflow services: 1. In the AWS console, open **EC2** %ra% **Load Balancers** to get the DNS name of the load balancer that is serving traffic and the TLS certificate for your custom ingress domain name. - This is **not** the **runtime-ingress-\{deployment-key\}** load balancer. 2. From the **openflow-agent-\{deployment-key\}** EC2 instance, verify connectivity through the load balancers to the Openflow deployment. Run the command: ```bash curl -kv https://{your-load-balancer-dns-name} ``` - If the command outputs the expected certificate information and a successful 404 status code response, you have successfully verified connectivity to your Openflow deployment. - If the command times out or returns an error, create a [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support) case and attach a diagnostic bundle generated by running `./diagnostics.sh` from the Openflow Agent instance. 3. From the Openflow Agent instance, you can also verify the DNS CNAME record for your custom ingress FQDN. Run the command: ```bash source ~/.env && nslookup $DOMAIN ``` - If the command returns the IP addresses of the load balancer that is performing TLS termination for your custom ingress domain name, you have successfully verified the DNS CNAME record. - If the command returns no results, the DNS CNAME record is not configured correctly. Check the DNS record for your custom ingress FQDN and ensure it points to your load balancer's DNS name. If the Openflow Agent connected successfully through your load balancer's DNS and you have verified the DNS CNAME record, a security policy or firewall is likely blocking traffic from your browser to the Openflow BYOC deployment. Work with your security team to allowlist your custom ingress FQDN. 
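The address comparisons behind the checks above can be sketched in Python. Two cases apply: confirming that your custom ingress FQDN resolves to the same addresses as your load balancer, and confirming that your target group still lists the addresses the Openflow-managed load balancer currently resolves to. This is a minimal sketch, not a Snowflake or AWS tool. The function names `resolve_ipv4` and `compare_addresses`, and every host name and address shown, are hypothetical; only the Python standard library is used.

```python
import socket


def resolve_ipv4(dns_name: str, port: int = 443) -> set[str]:
    """Return the IPv4 addresses that dns_name currently resolves to."""
    infos = socket.getaddrinfo(
        dns_name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) tuple.
    return {info[4][0] for info in infos}


def compare_addresses(expected: set[str], actual: set[str]) -> dict[str, list[str]]:
    """Report addresses that are missing from, or unexpected in, `actual`."""
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


# Hypothetical example: addresses the Openflow-managed load balancer resolves to
# versus the addresses registered in your target group.
resolved = {"10.0.1.5", "10.0.2.7"}
registered = {"10.0.1.5", "10.0.3.9"}
report = compare_addresses(resolved, registered)
print(report)  # {'missing': ['10.0.2.7'], 'unexpected': ['10.0.3.9']}
```

In practice, `expected` would come from `resolve_ipv4` called with the DNS name of the relevant load balancer. An empty `missing` list and an empty `unexpected` list mean that the two sets of addresses agree.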
--- title: Openflow BYOC - Set up encrypted EBS volumes source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-byoc-encrypted-volumes.md section: Loading & Unloading Data --- # Openflow BYOC - Set up encrypted EBS volumes This feature is not available in the People's Republic of China. Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/about-byoc) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up an Openflow BYOC deployment with encrypted Amazon Elastic Block Store (EBS) volumes using one of the following methods: - [](#label-openflow-byoc-encrypted-ebs-kms-key) - [](#label-openflow-byoc-encrypted-ebs-default-encryption) Both of these solutions provide encrypted EBS volumes that meet the following storage requirements of Openflow BYOC: - Root volume for the Openflow Agent EC2 instance - Root volumes for the EC2 instances in each EKS Cluster Node Group - Persistent volumes for Openflow's runtimes and supporting components This topic uses the following placeholders: - `$AWS_ACCOUNT_ID` represents the AWS Account ID of the account where Openflow is deployed. - `$AWS_REGION` represents the AWS Region of the account, for example `us-west-2`. - `$AWS_KMS_KEY_ARN` represents the Amazon Resource Name (ARN) of the AWS Key Management Service (AWS KMS) key that Openflow will use for encrypted EBS volumes. - `$DEPLOYMENT_KEY` represents the Openflow unique identifier applied to cloud resources created and managed by Openflow for a particular deployment. This is in the `DataPlaneKey` parameter of the CloudFormation template, also available in Openflow through the **View Details** menu option for the deployment. ## Prerequisites This topic assumes that you have completed the prerequisites for setting up Openflow BYOC.
For more information, see [](/user-guide/data-integration/openflow/setup-openflow-byoc). You must also have access to an AWS KMS key that Openflow will use for encrypted EBS volumes. ## Provide a specific AWS KMS Key for Encrypted EBS Volumes When uploading the CloudFormation template for your Openflow BYOC Deployment, you can provide the ARN for the AWS KMS key that Openflow uses for encrypted EBS volumes. Using this configuration, Openflow makes requests for encrypted EBS volumes, ensuring that all SCP policies are satisfied. Snowflake recommends this approach for most customers. This allows you to use different KMS keys for different applications, reducing the risk of a single key being compromised. To ensure that Openflow has the necessary permissions to use this key, perform the following tasks: 1. Ensure that the AWS KMS key grants permissions to the AWS Autoscaling Service Role. The Key Policy must include the following statement: ```json { "Sid": "Allow Autoscaling to use the key", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling" }, "Action": [ "kms:CreateGrant", "kms:Decrypt", "kms:Encrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:DescribeKey" ], "Resource": "*" } ``` 2. Enter the ARN of the AWS KMS key in the `EBSKMSKeyArn` parameter of the CloudFormation stack when uploading the template. For example, `arn:aws:kms:$AWS_REGION:$AWS_ACCOUNT_ID:key/1a1a11aa-aa1a-aaa1a-a1a1-000000000000`. Approximately 20 minutes after uploading the CloudFormation template, the Openflow BYOC Deployment creates a new IAM Role with the name `$DEPLOYMENT_KEY-eks-role`. 3. 
Add the following statement to the KMS key policy to grant permissions for Openflow to use the key: ```json { "Sid": "Allow Openflow Deployment to encrypt EBS volumes", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/$DEPLOYMENT_KEY-eks-role" }, "Action": [ "kms:Decrypt", "kms:Encrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:CreateGrant", "kms:DescribeKey" ], "Resource": "*" } ``` Openflow automatically detects the new permissions for the KMS key and continues the installation process. The Openflow BYOC deployment will become `Active` after approximately 20 minutes. ## Enable Encrypted EBS Volumes by default for your AWS Account AWS accounts can encrypt new EBS volumes by default by following the [AWS EBS encryption by default documentation](https://docs.aws.amazon.com/ebs/latest/userguide/encryption-by-default.html). With this configuration, Openflow makes requests for unencrypted EBS volumes, but the AWS API will return an encrypted EBS volume. The following steps ensure that Openflow has permissions to use the KMS key for these encrypted volumes. Whether you choose to use the AWS managed key `aws/ebs` or your own KMS key, you must attach an IAM Policy to the Openflow IAM Role `$DEPLOYMENT_KEY-eks-role` that grants the necessary permissions to use the key. 1. Create an IAM Policy to allow Openflow to use the KMS key by replacing `$AWS_KMS_KEY_ARN` with the ARN of the KMS key. ```json { "Sid": "Allow Openflow EKS Role to encrypt EBS volumes", "Effect": "Allow", "Action": [ "kms:Decrypt", "kms:Encrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:CreateGrant", "kms:DescribeKey" ], "Resource": "$AWS_KMS_KEY_ARN" } ``` 2. Ensure that the AWS KMS key grants permissions to the AWS Autoscaling Service Role. 
The Key Policy must include the following statement: ```json { "Sid": "Allow Autoscaling to use the key", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling" }, "Action": [ "kms:CreateGrant", "kms:Decrypt", "kms:Encrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:DescribeKey" ], "Resource": "*" } ``` 3. When uploading the Openflow BYOC CloudFormation template: - Leave the optional `EBSKMSKeyArn` parameter blank. - Set the `AdditionalEksRolePolicyArns` parameter to the ARN of the new IAM Policy created previously. For example, `arn:aws:iam::$AWS_ACCOUNT_ID:policy/openflow-kms-key-access-policy`. Approximately 20 minutes after uploading the CloudFormation template, the Openflow BYOC Deployment creates a new IAM Role with the name `$DEPLOYMENT_KEY-eks-role`. 4. Add the following statement to the KMS key policy to grant permissions for Openflow to use the key: ```json { "Sid": "Allow Openflow Deployment to encrypt EBS volumes", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/$DEPLOYMENT_KEY-eks-role" }, "Action": [ "kms:Decrypt", "kms:Encrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:CreateGrant", "kms:DescribeKey" ], "Resource": "*" } ``` Openflow automatically detects the new permissions for the KMS key and continues the installation process. The Openflow BYOC deployment will become `Active` after approximately 20 minutes. --- title: Openflow BYOC cost and scaling considerations source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/cost-byoc.md section: Loading & Unloading Data --- # Openflow BYOC cost and scaling considerations This feature is not available in the People's Republic of China. Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions). 
- [](/user-guide/data-integration/openflow/about-byoc) - [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) Snowflake Openflow BYOC has cost considerations in multiple areas, including infrastructure, compute, data ingestion and others. Scaling Openflow involves understanding these costs. The following sections describe Openflow BYOC costs in general, and provide a number of examples of scaling Openflow BYOC runtimes and associated costs. ## Openflow BYOC costs When using Openflow, you can incur the following types of costs:
## Openflow BYOC scaling

The runtimes and scaling behavior you choose are crucial for managing costs effectively. Openflow supports different runtime types, each with its own scaling characteristics.

### Runtime types and the associated costs

The following table illustrates the scaling behavior of various runtimes and their associated costs:

| Runtimes | Activity | Snowflake costs | Cloud costs |
| --- | --- | --- | --- |
| No runtimes | None | No cost | Compute and storage of Dataplane |
| 1 small runtime (1 vCPU)<br>(min=1, max=2) | Active for 1 hour<br>Runtime does not scale to 2 nodes. | 1 runtime x 1 node x 1 vCPU x 1 hour = 1 vCPU-hour<br>Total = 1 vCPU-hour | Compute and storage of Dataplane |
| 2 small runtimes (1 vCPU) (min/max=2)<br>1 large runtime (8 vCPU) (min/max=10) | Small: 2 nodes active for 1 hour<br>Large: 10 nodes active for 1 hour | 2 runtimes x 2 nodes x 1 vCPU x 1 hour = 4 vCPU-hours<br>1 runtime x 10 nodes x 8 vCPU x 1 hour = 80 vCPU-hours<br>Total = 84 vCPU-hours | Compute and storage of Dataplane |
| 1 medium runtime (4 vCPU)<br>(min=1, max=2) | First 20 minutes, 1 node is running<br>Scales to 2 nodes for the remaining 40 minutes of the hour<br>Total 1 hour | 20 minutes = 1/3 hour<br>1 runtime x 1 node x 4 vCPU x 1/3 hour = 4/3 vCPU-hours<br>1 runtime x 2 nodes x 4 vCPU x 2/3 hour = 16/3 vCPU-hours<br>Total = 6 2/3 vCPU-hours | Compute and storage of Dataplane |
| 1 medium runtime (4 vCPU)<br>(min/max=2) | First 30 minutes, 2 nodes are running<br>Suspends after the first 30 minutes. | 30 minutes = 1/2 hour<br>1 runtime x 2 nodes x 4 vCPU x 1/2 hour = 4 vCPU-hours<br>Total = 4 vCPU-hours | Compute and storage of Dataplane |

### Mapping runtimes to EC2 instance types

Choosing a runtime type (t-shirt size) results in the runtime pods being scheduled on the associated EC2 node group \{key\}-sm-group, \{key\}-md-group, or \{key\}-lg-group with resources described in the following table:

| Runtime type | vCPUs | Available memory (GB) | EC2 instance type | EC2 node group | EC2 node - CPUs | EC2 node - memory (GB) |
| ------------ | ----- | --------------------- | ----------------- | ---------------- | --------------- | ---------------------- |
| Small | 1 | 2 | m7i.xlarge | \{key\}-sm-group | 4 | 16 |
| Medium | 4 | 10 | m7i.4xlarge | \{key\}-md-group | 16 | 64 |
| Large | 8 | 20 | m7i.8xlarge | \{key\}-lg-group | 32 | 128 |

The type of runtime that you choose impacts the number of cores (vCPUs) consumed each second. Openflow scales the underlying EC2 node group when additional pods need to be scheduled, based on CPU consumption, up to the maximum node setting specified during runtime creation.

EKS node groups are configured with a minimum size of 0 nodes and a maximum of 50 nodes. The desired size is dynamically adjusted depending on the CPU and memory that the runtimes require.

Customers are charged by their cloud service provider for the underlying nodes that host their runtimes. The underlying EC2 instances are created when the first runtime of a respective size is scheduled.

### Examples for calculating Openflow BYOC runtime consumption
A user requests a BYOC deployment from Openflow and then installs the Openflow agent and deployment.

- The user has not created any runtimes. 0 vCPUs are allocated, so there is no Openflow software cost.
- The user is charged by their cloud service provider for the provisioned compute and storage of the Openflow BYOC deployment.
- Total Openflow consumption = 0 vCPU-hours

A user creates one small runtime with Min Nodes = 1 and Max Nodes = 2. The runtime stays at 1 node for 1 hour.

- 1 small runtime = 1 vCPU
- Total Openflow consumption = 1 vCPU-hour

A user creates 2 small runtimes with min/max of 2 nodes each, and one large runtime with min/max of 10 nodes. These runtimes are active for 1 hour.

- 2 small runtimes at 2 nodes = 2 runtimes x 2 nodes x 1 vCPU = 4 vCPUs
- 1 large runtime at 10 nodes = 1 runtime x 10 nodes x 8 vCPU = 80 vCPUs
- Total Openflow consumption = (4 vCPU + 80 vCPU) x 1 hour = 84 vCPU-hours

A user creates 1 medium runtime with 1 node. After 20 minutes, it scales to 2 nodes and remains at 2 nodes for the rest of the hour.

- 1 medium runtime = 4 vCPUs
- 20 minutes = 1/3 hour; 40 minutes = 2/3 hour
- (1 node x 4 vCPU x 1/3 hour) + (2 nodes x 4 vCPU x 2/3 hour)
- 4/3 vCPU-hours + 16/3 vCPU-hours
- Total Openflow consumption = 20/3 vCPU-hours, so approximately 6.67 vCPU-hours

A user creates 1 medium runtime with 2 nodes, then suspends it after 30 minutes.

- 1 medium runtime = 4 vCPU
- 30 minutes = 1/2 hour
- Total Openflow consumption = (2 nodes x 4 vCPU x 1/2 hour) = 4 vCPU-hours
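
The arithmetic in the examples above can be reproduced with a short Python sketch. It assumes only what the examples state: Openflow consumption is nodes x vCPUs per node x hours, summed over each period in which the node count is constant, with 1, 4, and 8 vCPUs for small, medium, and large runtimes. The names `VCPUS_PER_NODE` and `vcpu_hours` are illustrative and not part of any Openflow API, and the sketch does not cover cloud provider charges.

```python
from fractions import Fraction

# vCPUs per node for each runtime type, taken from the runtime table above.
VCPUS_PER_NODE = {"small": 1, "medium": 4, "large": 8}


def vcpu_hours(size: str, segments: list[tuple[int, int]]) -> Fraction:
    """Return the vCPU-hours consumed by one runtime.

    `segments` is a list of (nodes, minutes) pairs, one pair for each
    period during which the runtime ran with a constant number of nodes.
    """
    vcpus = VCPUS_PER_NODE[size]
    return sum(
        (Fraction(nodes * vcpus * minutes, 60) for nodes, minutes in segments),
        Fraction(0),
    )


# One small runtime at 1 node for 1 hour.
one_small = vcpu_hours("small", [(1, 60)])

# Two small runtimes at 2 nodes each plus one large runtime at 10 nodes, for 1 hour.
mixed = 2 * vcpu_hours("small", [(2, 60)]) + vcpu_hours("large", [(10, 60)])

# One medium runtime: 1 node for 20 minutes, then 2 nodes for 40 minutes.
scaled = vcpu_hours("medium", [(1, 20), (2, 40)])

# One medium runtime at 2 nodes, suspended after 30 minutes.
suspended = vcpu_hours("medium", [(2, 30)])

print(one_small, mixed, scaled, suspended)  # 1 84 20/3 4
```

Using `Fraction` keeps results such as 20/3 exact instead of rounding them to 6.67.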
--- title: Openflow Connector for Amazon Kinesis Data Streams source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kinesis/about.md section: Loading & Unloading Data --- # %kinesis% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/kinesis/setup) - [](/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning) - [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance) - [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot) ## About This topic describes the basic concepts of %kinesis%, including its workflow and limitations. You can use [Amazon Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/introduction.html) to collect and process large streams of data records in real time. Producers continually push data to Kinesis Data Streams, and consumers process the data in real time. A Kinesis data stream is a set of [shards](https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html#shard). Each shard has a sequence of data records. A data record is the unit of data stored in a Kinesis data stream. Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes. 
%kinesis% reads data from Kinesis streams and writes it into Snowflake tables using the [Snowpipe Streaming](/user-guide/snowpipe-streaming/snowpipe-streaming-high-performance-overview) architecture. Use this connector if you're looking to do the following: - Ingest real-time events from Amazon Kinesis into Snowflake for near real-time analytics - Ingest real-time events from Amazon Kinesis into Snowflake-managed Iceberg™ tables - Accelerate your ingestion even more by combining Openflow speed with the Interactive Tables feature - Use Single Message Transforms to enrich or filter data before it lands in Snowflake ## Limitations - A single connector supports ingestion from only one stream. - The connector does not support schema evolution for Apache Iceberg™ tables. - Autoscaling is not supported. The minimum and maximum node counts of the Openflow runtime where %kinesis% is deployed should remain constant. - The connector supports routing Kinesis traffic through Snowflake outbound AWS PrivateLink. DynamoDB traffic must use the public endpoint because Amazon DynamoDB doesn't support Private DNS. For more information, see [](#label-kinesis-configure-aws-privatelink). ### Limitations of fault tolerance with the connector Kinesis data streams can be configured with a retention period. If for any reason %kinesis% is unable to ingest data for longer than the retention period, records that expire during that time are not loaded. ### Supported data types and authentication methods By default, the connector is configured to work with the JSON data type and supports authentication using AWS credentials: an access key ID and a secret access key. The connector can be customized to work with other data types and authentication methods.
## Next steps - [](/user-guide/data-integration/openflow/connectors/kinesis/setup) --- title: Openflow Connector for MySQL: Data mapping source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/mysql/data-mapping.md section: Loading & Unloading Data --- # %mysql%: Data mapping This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/mysql/about) - [](/user-guide/data-integration/openflow/connectors/mysql/setup) This topic describes how MySQL data types are mapped to Snowflake data types. ## MySQL to Snowflake data type mapping The following table shows how MySQL data types are mapped to Snowflake data types when replicating data.
Any MySQL data types not listed in this table are mapped to TEXT by default. --- title: Openflow Connector for MySQL: Maintenance source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/mysql/maintenance.md section: Loading & Unloading Data --- # %mysql%: Maintenance This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/mysql/setup) - [](/user-guide/data-integration/openflow/connectors/mysql/data-mapping) This topic describes important maintenance considerations and best practices for maintaining the %mysql% such as reinstalling the connector or setting the starting binary log position for loading. These operations are often used in conjunction with [Incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/mysql/incremental-replication). ## Check the replication status of a table Interim failures, such as connection errors, do not prevent table replication. However, permanent failures, such as unsupported data types, prevent table replication. To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store: 1. 
In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. A table listing controller services displays. 2. Locate the row labeled **Table State Store**, click the **More** %sf-vertical-more-button% button on the right side of the row, and then choose **View State**. A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are: - **NEW**: The table is scheduled for replication but replication hasn't started. - **SNAPSHOT_REPLICATION**: The connector is copying existing data. This status displays until all records are stored in the destination table. - **INCREMENTAL_REPLICATION**: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails. - **FAILED**: Replication has permanently stopped due to an error. The Openflow runtime canvas doesn't display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message: ```text Replication state for table .. changed from to ``` If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see [Restart table replication](#label-of-mysql-restart-table-replication). ## Reinstall the connector This section provides instructions on how to reinstall the connector, and continue replicating data for the same tables without having to snapshot them again. It covers situations where the new connector is installed in the same runtime, as well as moved to a new runtime. 
For the connector to continue replicating from the same CDC stream position where it stopped before reinstallation, the source database must retain the binary log long enough to cover the time between when the prior connector is stopped and when the new connector is started. Make sure the `binlog_expire_logs_seconds` parameter of the MySQL server is high enough, and keep the reinstallation time to a minimum. The value of `binlog_expire_logs_seconds` must be longer than the time you expect the reinstallation to take. Typically, 86400 seconds (one day) is sufficient; however, a longer period might be appropriate to allow enough time to reinstall. ### Prerequisites Review and note connector parameter context values. If you're reinstalling the connector in the same runtime, you can reuse the existing context. If the new instance is located in a different runtime, you must re-enter all parameters. 1. Finish processing all in-flight FlowFiles in the existing connector, then stop the connector. 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. Select **Launch Openflow**. 4. In the **Openflow** pane select the **Runtimes** tab. 5. Select the runtime containing the connector. 6. Select the connector. 7. Stop the topmost processor **Set Tables for Replication** in the **Snapshot Load** group. 8. Stop the topmost processor **Read MySQL CDC Stream** in the **Incremental Load** group. 9. If you changed the value of the **Merge Task Schedule CRON** parameter, return it to `* * * * * ?`; otherwise, queues won't be emptied until the next scheduled run. Wait until all FlowFiles in the connector have been processed and all queues are empty. When all FlowFiles have been processed, the **Queued** value on the connector's processor group becomes zero. If there are any items left in the original connector's queues, there may be data gaps when the new connector starts. 10. Stop all processors and controller services in the connector.
The existing connector can remain in the runtime and doesn't interfere with the new instance, as long as it remains stopped. 2. Create a new instance of the connector. If you're using the same runtime as the original connector, you can choose to keep the existing parameter contexts and reuse the settings. 3. If you're installing into a different runtime or you deleted the previous parameter contexts, enter the configuration settings into the new parameter contexts, including the table names and patterns as described in [](/user-guide/data-integration/openflow/connectors/mysql/setup). 4. Navigate to the `MySQL Ingestion Parameters` context, and set the following parameters: - Set the `Ingestion Type` parameter to `incremental`. For more information and potential concerns, see [](#label-mysql-incremental-replication). - Set the `Starting Binlog Position` parameter to `Earliest`. For more information and potential concerns, see [](#label-mysql-connector-start-restart-incremental-load-from-earliest-available-binary-log-position). 5. Start the new connector. ### Usage notes The new connector uses the existing destination tables that were created by the original connector, but the connector creates new journal tables. ## Specify load from binary log position The %mysql% connector allows you to select the starting position from which MySQL binary logs are read. By default, the connector reads from the latest available position. Alternatively, you can choose the earliest position available on the source instance. Choosing to start from the earliest position is common when reinstalling the connector. This allows the new instance to catch up and continue replicating existing tables without having to snapshot each one again. Note that switching a running connector from the latest to the earliest position causes the entire available binary log to be re-read, re-processed, and re-applied to the destination table.
While the binary log is being re-read, the columns and data in affected destination tables can become out of sync with their sources until all events have been re-processed and merged. The following parameters, which control snapshot loads, are available in the `Ingestion Parameters` context:
To determine whether the connector has finished re-reading the binary log: 1. Navigate to the Openflow canvas. 2. Open the **Incremental Load** process group. 3. Right-click the topmost processor named **Read MySQL CDC Stream**, then select **View state**. 4. Compare the state entries: - **binlog.position.rewind**: the latest position the processor read before re-reading of the binary log started. - **binlog.position.dml**: the current latest position read by the processor. As long as this value is lower than the rewind value above, the processor is still re-reading the binary log. ### Usage notes - After a running connector is switched to read from the earliest position and starts running, the process can't be reconfigured or cancelled, and continues until the currently read position reaches the position from before it started. - Switching to the earliest position on a running connector will, for any tables being re-processed, finish their existing journals and create new journal tables. - If the binary log contains events from a previous table that was dropped and re-created in the source database, re-reading the stream re-processes all of those events into the current destination table. The connector can't distinguish between a previous and current source table if they share the same name. --- title: Openflow Connector for MySQL: Set up incremental replication without snapshots source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/mysql/incremental-replication.md section: Loading & Unloading Data --- # %mysql%: Set up incremental replication without snapshots This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/mysql/setup) - [](/user-guide/data-integration/openflow/connectors/mysql/data-mapping) You can configure the %mysql% connector to immediately replicate incremental changes for newly added tables, bypassing snapshots. Use incremental load to continue replication without snapshotting every table again when you reinstall the connector over previously replicated data. To enable incremental replication in a new connector instance: 1. Set up the connector as described in [](/user-guide/data-integration/openflow/connectors/mysql/setup). 2. In the `MySQL Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`. ## Enable incremental replication without snapshots To enable incremental replication on an existing connector: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. In the **Openflow** pane select the **Runtimes** tab. 4. Select the runtime containing the connector. 5. Select the connector. 6. In the `Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`. 7. Add new replication tables. These tables immediately switch to their incremental load. To return to replicating tables with the snapshot load, change **Ingestion Type** from `incremental` to `full`. ## Usage notes - Changing the value of **Ingestion Type** does not impact any tables that have begun replicating data. Tables currently in the snapshot phase continue until the snapshot load is complete.
- While **Ingestion Type** is set to `incremental`, new tables added to the list of replicated tables bypass the snapshot phase. This includes new tables added to the source database that match the `Included Table Regex` parameter. Ensure that the ingestion type is set to `incremental` to bypass the snapshot phase. Connectors should only remain in `incremental` mode as long as required as it bypasses snapshots. Once customer needs for incremental updates have been satisfied the connector should be returned to `full` mode. - For tables that bypass snapshot load, the connector creates a destination table in Snowflake, by executing `CREATE TABLE IF NOT EXISTS`, only if no destination table already exists. Tables going through the snapshot require that no destination table exist. --- title: Openflow Connector for Oracle: Configure the Oracle database source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb.md section: Loading & Unloading Data --- # %oracleofc%: Configure the Oracle database This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). 
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/about) - [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-snowflake) This topic describes how to set up the Oracle database for %oracleofc%. Your Oracle database setup depends on your organization's security policies and database architecture. For example, the setup differs depending on whether the tables reside in a Container Database (CDB), a Pluggable Database (PDB), multiple PDBs, or a combination of these. The steps provided in this topic are examples only. Modify them as required for your environment. As an Oracle database administrator, perform the following procedures on your source database: 1. [](#label-set-up-archived-redo-logs-retention-period) 2. [](#label-enable-xstream-and-supplemental-logging) 3. [](#label-create-xstream-administrator-user) 4. [](#label-granting-xstream-administrator-privileges) 5. [](#label-configure-xstream-server-connect-user) 6. [](#label-create-xstream-outbound-server) 7. [](#label-set-xstream-outbound-server-connect-user) 8. [](#label-set-xstream-outbound-server-capture-user) 9. (Optional) [](#label-configure-ssl-connections) The steps in this topic are written for a multi-tenant architecture with a Container Database (CDB) and one or more Pluggable Databases (PDB). If your Oracle database uses a single-tenant architecture, see [](#label-setup-xstream-single-tenant). ## Configure the retention period for archived redo logs You must enable `ARCHIVELOG` mode to ensure that change data is available for replication. If you use AWS RDS for Oracle, you must also configure the retention period for archived redo logs. Determine this period based on the volume of changes in the source database and your storage capacity. 
To set the retention period, for example to 24 hours, follow the procedures in the following table:
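The exact procedure depends on your environment. As a hedged illustration only: on AWS RDS for Oracle, archived redo log retention is generally set through the `rdsadmin.rdsadmin_util.set_configuration` procedure described in the AWS documentation. The sketch below assumes a 24-hour retention and a session connected as the RDS master user; confirm both against the AWS documentation for your engine version before running it.

```sql
-- Assumption: AWS RDS for Oracle, connected as the master user.
-- Retain archived redo logs for 24 hours.
BEGIN
  rdsadmin.rdsadmin_util.set_configuration(
    name  => 'archivelog retention hours',
    value => '24');
END;
/
COMMIT;

-- Print the current RDS configuration, including the retention setting.
SET SERVEROUTPUT ON;
EXEC rdsadmin.rdsadmin_util.show_configuration;

-- Confirm that the database runs in ARCHIVELOG mode.
SELECT LOG_MODE FROM V$DATABASE;
```

A longer retention period gives the connector more time to recover from an outage, at the cost of additional storage for the retained logs.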
## Enable XStream and supplemental logging XStream is included with Oracle Database and doesn't require any additional software. To enable and configure XStream replication to capture and stream change data, run the following commands: 1. Enable XStream replication: ```sql ALTER SYSTEM SET enable_goldengate_replication=TRUE SCOPE=BOTH; ALTER SYSTEM SET STREAMS_POOL_SIZE = 2560M; ``` Snowflake recommends setting the streams pool size to 2.5 GB. This allocation covers the following: - 1 GB for Capture - 1 GB for Apply - An additional 25% buffer To enable supplemental logging to ensure that the redo logs capture the information required for logical replication, run the following commands: 1. Confirm that the database is in ARCHIVELOG mode as shown in the following example: ```sql SELECT LOG_MODE, FORCE_LOGGING FROM V$DATABASE; ``` Snowflake recommends enabling force logging at the database or tablespace level. 2. Set the container to the root container and add supplemental logging to the database: ```sql ALTER SESSION SET CONTAINER = CDB$ROOT; ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS; ``` Alternatively, you can enable logging only on specific tables as shown in the following example: ```sql ALTER TABLE schema_name.table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS; ``` ## Create the XStream administrator user An XStream administrator user is required to manage XStream components, including the creation and alteration of outbound servers. You can either create a dedicated user for this purpose or use an existing user, provided that the necessary XStream administration privileges are granted (see the next section). The following example sets up a dedicated XStream administrator user in the root container of a CDB. The example assumes that the database also has a PDB containing tables to be replicated. Connect as SYSDBA or a user with appropriate privileges and run the following commands: ```sql -- Switch to the root container. 
ALTER SESSION SET CONTAINER = CDB$ROOT; -- Create a tablespace for the XStream administrator user. CREATE TABLESPACE xstream_adm_tbs DATAFILE '/path/to/your/cdb/xstream_adm_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED; -- Switch to the Pluggable Database (PDB) and create a tablespace there. ALTER SESSION SET CONTAINER = YOUR_PDB_NAME; CREATE TABLESPACE xstream_adm_tbs DATAFILE '/path/to/your/pdb/xstream_adm_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED; -- Switch back to the root container to create the common user. ALTER SESSION SET CONTAINER = CDB$ROOT; -- Create the XStream administrator user. -- Note 'c##' prefix indicates a common user in a CDB environment, and CONTAINER=ALL grants privileges across all containers. -- Replace "YOUR_XSTREAM_ADMIN_PASSWORD" with a strong, secure password. CREATE USER c##xstreamadmin IDENTIFIED BY "YOUR_XSTREAM_ADMIN_PASSWORD" DEFAULT TABLESPACE xstream_adm_tbs QUOTA UNLIMITED ON xstream_adm_tbs CONTAINER=ALL; ``` ## Grant XStream administrator privileges Connect as SYSDBA or a user with appropriate privileges and grant the required privileges to the XStream administrator user. 1. Grant the CREATE SESSION privilege to the XStream administrator: ```sql GRANT CREATE SESSION TO c##xstreamadmin CONTAINER=ALL; ``` 2. Grant XStream capture privileges using one of the following commands, depending on your Oracle Database version:
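The version-specific commands are not reproduced here. As one hedged illustration, on releases that provide the `DBMS_XSTREAM_AUTH` package (for example, Oracle Database 19c), the capture grant is commonly written as shown below. The sketch assumes the `c##xstreamadmin` user created earlier and a multi-tenant database; newer releases may use a different mechanism, so check the Oracle documentation for your version.

```sql
-- Assumption: the release provides DBMS_XSTREAM_AUTH (for example, 19c)
-- and c##xstreamadmin is the administrator created in the previous section.
BEGIN
  DBMS_XSTREAM_AUTH.GRANT_ADMIN_PRIVILEGE(
    grantee                 => 'c##xstreamadmin',
    privilege_type          => 'CAPTURE',
    grant_select_privileges => TRUE,
    container               => 'ALL');
END;
/

-- List the users that currently hold XStream administrator privileges.
SELECT * FROM DBA_XSTREAM_ADMINISTRATOR;
```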
## Configure XStream server connect user The Snowflake Openflow Connector uses a dedicated connect user to establish a connection to the XStream Outbound Server and receive change data. This user requires specific privileges to facilitate replication: - **Read from XStream Outbound Server**: The user must be able to access the change data stream from the configured XStream Outbound Server. - **Select from Data Dictionary Views**: The connect user needs SELECT access to various data dictionary views. This can be achieved by granting SELECT_CATALOG_ROLE or SELECT ANY DICTIONARY. If granting SELECT ANY DICTIONARY isn't desired due to company policy, the user specifically needs SELECT access to the following views: - ALL_USERS - ALL_TABLES - ALL_TAB_COLS - ALL_CONS_COLUMNS - ALL_CONSTRAINTS - ALL_INDEXES - ALL_IND_COLUMNS - V$DATABASE `ALL_INDEXES` and `ALL_IND_COLUMNS` are required so the connector can detect unique constraints and unique indexes as replication keys when a table has no primary key. For more information on the selection algorithm, see [](#label-oracle-replication-key-selection). - **Select from Source Tables**: The user must have SELECT privileges on all tables that are intended for replication. The following is an example of how to set up such a user in the root container of the CDB. The example assumes that the database also has a PDB containing tables to be replicated. ```sql -- Connect as SYSDBA or a user with appropriate privileges -- Switch to the root container. ALTER SESSION SET CONTAINER = CDB$ROOT; -- Create the connect user. -- Replace "YOUR_CAPTURE_USER_PASSWORD" with a strong, secure password. CREATE USER c##connectuser IDENTIFIED BY "YOUR_CAPTURE_USER_PASSWORD" CONTAINER=ALL; -- Grant necessary privileges to the connect user. 
-- You can choose to grant access to specific tables -- instead of SELECT ANY TABLE for more granular control, -- for example, GRANT SELECT ON schema.table TO c##connectuser; GRANT CREATE SESSION, SELECT_CATALOG_ROLE, SELECT ANY TABLE TO c##connectuser CONTAINER=ALL; ``` If your database is multi-tenant and the connector is connected to a CDB to replicate data from multiple PDBs, grant the connect user the additional privileges needed to switch between containers and read data dictionary information across all of them: ```sql ALTER USER c##connectuser SET CONTAINER_DATA = ALL CONTAINER = CURRENT; GRANT SET CONTAINER TO c##connectuser CONTAINER=ALL; ``` ## Create XStream Outbound Server The XStream Outbound Server captures changes from redo logs for consumption by the Openflow Connector. Define which schemas or tables to replicate. For more information, see [DBMS_XSTREAM_ADM.CREATE_OUTBOUND Documentation](https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_XSTREAM_ADM.html#GUID-A602ED86-0F5A-4A27-92A0-55D5ADC0AF0D). Important considerations for replication scope: - If a table isn't included in the XStream Outbound filtering rules command, it won't be replicated. - A table or schema included here must also be defined in the connector parameters for it to be replicated. You can include an entire schema in the server filtering rules and later, in the connector parameters, specify only certain tables within that schema for replication. The XStream Outbound Server can only be created from the root container. However, starting with Oracle Database version 23ai, it can also be created at the PDB level. To avoid significant CPU and network overhead, and to keep your queues from filling with irrelevant data, capture only what you need. The DBMS_XSTREAM_ADM.ADD_TABLE_RULES procedure lets you select only the specific tables you need. 
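As a minimal sketch of a table-level scope, the block below creates the outbound server for two specific tables only. It uses `DBMS_XSTREAM_ADM.CREATE_OUTBOUND` with explicit table names rather than `ADD_TABLE_RULES`, because that keeps the whole definition in one call. The table names `schema_name.table_a` and `schema_name.table_b` are hypothetical placeholders, and `XOUT1` and `YOUR_PDB_NAME` follow the naming used elsewhere in this topic.

```sql
-- Connect as a user with XStream admin privileges to the root container.
DECLARE
  tables  DBMS_UTILITY.UNCL_ARRAY;
  schemas DBMS_UTILITY.UNCL_ARRAY;
BEGIN
  -- Hypothetical table names: replace with the tables you want to replicate.
  tables(1)  := 'schema_name.table_a';
  tables(2)  := 'schema_name.table_b';
  -- No whole-schema capture in this sketch.
  schemas(1) := NULL;
  DBMS_XSTREAM_ADM.CREATE_OUTBOUND(
    server_name           => 'XOUT1',
    table_names           => tables,
    schema_names          => schemas,
    include_ddl           => TRUE,
    source_container_name => 'YOUR_PDB_NAME'
  );
END;
/
```

Tables listed here still have to be named in the connector parameters before they are replicated.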
The following examples show how to set up the XStream Outbound Server based on different replication needs. When you set up the XStream Outbound Server in your production environment, be selective about which changes you capture. Capturing everything can have serious consequences for your database's performance and resource usage. For information on how to configure XStream Outbound Server, see [Configuring XStream Out](https://docs.oracle.com/en/database/oracle/oracle-database/19/xstrm/configuring-xstream-out.html#GUID-A1C8430E-565B-4F66-8E00-495F283AAAFB). **Example 1:** Capture all tables from all schemas in the root container and all PDBs ```sql -- Connect as a user with XStream admin privileges to the root container. -- Ensure serveroutput is enabled to see messages from the PL/SQL block. SET SERVEROUTPUT ON; DECLARE tables DBMS_UTILITY.UNCL_ARRAY; schemas DBMS_UTILITY.UNCL_ARRAY; BEGIN -- To replicate all tables in all schemas across all containers, set both to NULL. tables(1) := NULL; schemas(1) := NULL; DBMS_XSTREAM_ADM.CREATE_OUTBOUND( server_name => 'XOUT1', table_names => tables, schema_names => schemas, include_ddl => TRUE ); DBMS_OUTPUT.PUT_LINE('XStream Outbound Server created.'); EXCEPTION WHEN OTHERS THEN DBMS_OUTPUT.PUT_LINE('Error creating XStream Outbound Server: ' || SQLERRM); RAISE; END; / ``` **Example 2:** Capture all tables from a single schema in a Pluggable Database (PDB) ```sql -- Connect as a user with XStream admin privileges to the root container. -- Ensure serveroutput is enabled to see messages from the PL/SQL block. SET SERVEROUTPUT ON; DECLARE tables DBMS_UTILITY.UNCL_ARRAY; schemas DBMS_UTILITY.UNCL_ARRAY; BEGIN -- To replicate all tables in a schema in a single PDB, set source_container_name. 
tables(1) := NULL; schemas(1) := 'schema_name'; DBMS_XSTREAM_ADM.CREATE_OUTBOUND( server_name => 'XOUT1', table_names => tables, schema_names => schemas, include_ddl => TRUE, source_container_name => 'YOUR_PDB_NAME' ); DBMS_OUTPUT.PUT_LINE('XStream Outbound Server created.'); EXCEPTION WHEN OTHERS THEN DBMS_OUTPUT.PUT_LINE('Error creating XStream Outbound Server: ' || SQLERRM); RAISE; END; / ``` ## Set up the XStream Outbound Server Connect User Set the connect user on the XStream Outbound Server. This ensures that the previously created connect user is associated with the XStream Outbound Server (XOUT1), allowing it to receive change data. The following example assumes that the connect user is c##connectuser. ```sql BEGIN DBMS_XSTREAM_ADM.ALTER_OUTBOUND( server_name => 'XOUT1', connect_user => 'c##connectuser'); END; / ``` ## Set up the XStream Outbound Server Capture User If you want the data to be captured by the same user that created the server (the administrator), skip this section. If you configured a separate capture user, configure the XStream Outbound Server to run as this user. This ensures that the dedicated capture user is associated with the XStream Outbound Server (XOUT1), allowing that user to capture change data. ```sql BEGIN DBMS_XSTREAM_ADM.ALTER_OUTBOUND( server_name => 'XOUT1', capture_user => 'yourcaptureuser'); END; / ``` ## Set up XStream for single-tenant databases The default architecture for Oracle 12c and later is a multi-tenant architecture with a Container Database (CDB) and one or more Pluggable Databases (PDB). If your Oracle database uses a single-tenant architecture, note the following differences in setting up XStream: - Do not use `ALTER SESSION SET CONTAINER` commands. In a single-tenant database, there is only one instance, so container switching doesn't apply. - Create only one `xstream_adm_tbs` tablespace. Do not create a second tablespace in a PDB. - Do not use the `C##` prefix on user names. 
For example, create `xstreamadmin` instead of `c##xstreamadmin` and `connectuser` instead of `c##connectuser`. The `C##` prefix is required only in multi-tenant environments. - Do not include `CONTAINER=ALL` or `container => 'ALL'` in any commands. These clauses grant privileges across multiple containers and don't apply in a single-tenant database. ## Configure SSL connections (optional) The %oracleofc% supports encrypted SSL connections to the Oracle database using the TCPS (TCP with SSL) protocol. When SSL is enabled, both the database connection and the XStream connection use encrypted communication. To use SSL, you must: 1. [](#label-enable-tcps-on-oracle-database) 2. [](#label-create-client-wallet) ### Enable TCPS on the Oracle database You must configure the Oracle database to accept connections using the TCPS protocol. Follow the procedure for your database environment. #### On-premises / OCI 1. Create an SSL server wallet with the server certificate. 2. Configure the `listener.ora` to include a TCPS endpoint (default port 2484). 3. Configure the `sqlnet.ora` to reference the server wallet. 4. Restart the listener. For more information, see [Configuring Transport Layer Security Encryption](https://docs.oracle.com/en/database/oracle/oracle-database/23/dbseg/configuring-transport-layer-security-encryption.html). #### AWS RDS (Standard) 1. Add the Oracle SSL option to the option group associated with the DB instance. 2. Specify the SSL port (for example, 2484). For more information, see [Oracle Secure Sockets Layer](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.Options.SSL.html). ### Create a client wallet After TCPS is enabled on the database, create an Oracle auto-login wallet (`cwallet.sso`) containing the server's trusted certificate. This wallet is provided to the connector so that it can verify the server during the SSL handshake. 1. Export the server certificate from the Oracle database server as a PEM file. 2. 
Use the Oracle `orapki` utility to create a client wallet and import the server certificate: ```bash orapki wallet create -wallet /path/to/client/wallet -pwd -auto_login orapki wallet add -wallet /path/to/client/wallet -pwd \ -trusted_cert -cert /path/to/server-cert.pem ``` 3. Copy the generated `cwallet.sso` file to a location accessible by the Openflow runtime. For AWS RDS, download the root certificate from AWS instead of exporting it from the database server. For more information, see [Connecting to an RDS for Oracle DB instance using SSL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.Options.SSL.Connecting.html). For more information, see [Using the orapki Utility to Manage PKI Elements](https://docs.oracle.com/en/database/oracle/oracle-database/23/dbseg/using-the-orapki-utility-to-manage-pki-elements.html). ## Next steps [Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector). --- title: Openflow Connector for Oracle: Data mapping source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/data-mapping.md section: Loading & Unloading Data --- # %oracleofc%: Data mapping This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. 
For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/about) - [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks) This topic describes how Oracle data types are mapped to Snowflake data types when replicating data. ## Oracle to Snowflake data type mapping The following table shows how Oracle data types are mapped to Snowflake data types when replicating data.
Any Oracle data types not listed in this table are mapped to TEXT by default. ## Next steps Review [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks) to set up the connector. --- title: Openflow Connector for Oracle: Enable and manage commercial terms source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms.md section: Loading & Unloading Data --- # %oracleofc%: Enable and manage commercial terms This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/about) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector) - [](/user-guide/data-integration/openflow/connectors/oracle/maintenance) This topic describes how to enable the %oracleofc% in the list of available connectors and manage the licensing lifecycle. This task must be performed by the organization administrator (ORGADMIN). 
Setting up the %oracleofc% is a two-stage process. First, enable Oracle XStream services to make the connector available for installation. Then, finalize the license configuration after the connector detects your source database inventory. ## Part 1: Enable service (pre-installation) By default, the %oracleofc% isn't displayed in the list of available connectors. You must accept the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/) terms to make it available for installation. This is required for all license models. 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Admin** %raa% **Terms**. 3. Locate the item **Oracle Connector Terms** in the list. 4. Select **Review & Enable**. After you complete these steps, the following changes take effect: - The %oracleofc% listing becomes visible in the list of available connectors. - A new tab titled **Openflow for Oracle** appears under **Admin** %raa% **Terms**. ## Part 2: License setup and lifecycle Complete the steps for the license model you selected during configuration: - [Option A: Embedded license (Snowflake-provided)](#label-oracle-embedded-license-setup) - [Option B: Independent license / BYOL](#label-oracle-byol-license-setup) ### Option A: Embedded license (Snowflake-provided) For this licensing model, you must activate the trial to enable the connector. Even if you install the connector, data replication doesn't start until this step is complete. #### Step 1: Start the trial (prerequisite) To start the trial: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Admin** %raa% **Terms**. 3. Select the **Openflow for Oracle** tab. 4. Locate the **Trial Status** card (status: "Ready to Activate"). 5. Select **Start Trial**. 6. Accept the terms to start the 60-day trial period. This action enables the captureChangeOracle processor, allowing it to connect to your database. 
#### Step 2: Configure connector After starting the trial, install and configure the connector. For more information, see [Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector). After the connector successfully connects to the source database, a subscription is automatically created and displayed in the **Openflow for Oracle** dashboard. #### Step 3: Verify inventory 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Admin** %raa% **Terms**. 3. Select the **Openflow for Oracle** tab. 4. Review the **Subscription Inventory** section. 5. Verify that the CPU core count matches your physical source database hardware. 6. If the core count is incorrect, update the runtime configuration. #### Step 4: Lifecycle management For more information about the licensing models and terms, see [Licensing models and critical constraints](#label-oracle-licensing-models). The following table describes the actions available at each stage of the embedded license lifecycle.
### Option B: Independent license / BYOL If you are using the independent license (Bring Your Own License), no prior trial activation is required. #### Step 1: Configure the connector To set up the connector with the independent/BYOL license, follow the steps in [Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector). #### Step 2: Verify inventory (recommended) Verify that Snowflake has correctly identified your database inventory. 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Admin** %raa% **Terms**. 3. Select the **Openflow for Oracle** tab. 4. Review the database inventory details. The **Start Trial** button doesn't appear for this license model, and the 36-month lifecycle rules don't apply. You are responsible for maintaining a valid Oracle license that includes XStream entitlements. --- title: Openflow Connector for Oracle: Maintenance source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/maintenance.md section: Loading & Unloading Data --- # %oracleofc%: Maintenance This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. 
For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/about) - [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) - [](/user-guide/data-integration/openflow/connectors/oracle/incremental-replication) This topic describes maintenance tasks for the %oracleofc%, such as reinstalling the connector or setting the starting redo log position. These operations are often used in conjunction with [Incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/oracle/incremental-replication). ## Check the replication status of a table Interim failures, such as connection errors, do not prevent table replication. However, permanent failures, such as unsupported data types, prevent table replication. To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store: 1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. A table listing controller services displays. 2. Locate the row labeled **Table State Store**, click the **More** %sf-vertical-more-button% button on the right side of the row, and then choose **View State**. A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are: - **NEW**: The table is scheduled for replication but replication hasn't started. - **SNAPSHOT_REPLICATION**: The connector is copying existing data. This status displays until all records are stored in the destination table. - **INCREMENTAL_REPLICATION**: The connector is actively replicating changes. 
This status displays after snapshot replication ends and continues to display until the table is either removed from replication or replication fails. - **FAILED**: Replication has permanently stopped due to an error. The Openflow runtime canvas displays only the current table status, not status changes. However, table status changes are recorded in logs when they occur. Look for the following log message: ```text Replication state for table .. changed from to ``` If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see [Restart table replication](#label-of-oracle-restart-table-replication). ## Reinstall the connector This section describes how to reinstall the connector and continue replicating data for the same tables without snapshotting them again. It covers installing the new connector in the same runtime as well as moving it to a new runtime. For the connector to continue replicating from the same CDC stream position where it stopped before reinstallation, the source database must retain the archived redo logs long enough to cover the time between stopping the prior connector and starting the new one. Ensure that the archived redo log retention period of the Oracle database is long enough, and keep the reinstallation time to a minimum. A retention period of 24 hours is typically sufficient, but a longer period might be appropriate to allow enough time to reinstall. For more information on configuring archived redo log retention, see [](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb). ### Prerequisites Review and note connector parameter context values. If you're reinstalling the connector in the same runtime, you can reuse the existing context. 
If the new instance is located in a different runtime, you must re-enter all parameters. 1. Finish processing all in-flight FlowFiles in the existing connector, then stop the connector. 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. Select **Launch Openflow**. 4. In the **Openflow** pane select the **Runtimes** tab. 5. Select the runtime containing the connector. 6. Select the connector. 7. Stop the topmost processor **Set Tables for Replication** in the **Snapshot Load** group. 8. Stop the topmost processor **Read Oracle CDC Stream** in the **Incremental Load** group. 9. If you changed the value of the **Merge Task Schedule CRON** parameter, return it to `* * * * * ?`, otherwise queues won't be emptied until the next scheduled run. Wait until all FlowFiles in the connector have been processed, and all queues are empty. When all FlowFiles have been processed, the **Queued** value on the connector's processor group becomes zero. If any items remain in the original connector's queues, data gaps might occur when the new connector starts. 10. Stop all processors and controller services in the connector. The existing connector can remain in the runtime and doesn't interfere with the new instance, as long as it remains stopped. 2. Create a new instance of the connector. If you're using the same runtime as the original connector, you can choose to keep the existing parameter contexts and reuse the settings. 3. If you're installing into a different runtime or you deleted the previous parameter contexts, enter the configuration settings into the new parameter contexts, including the table names and patterns as described in [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector). 4. Navigate to the `Oracle Ingestion Parameters` context, and set the following parameters: - Set the `Ingestion Type` parameter to `incremental`. 
For more information about the considerations, see [](#label-oracle-incremental-replication). - Set the `Starting Redo Log Position` parameter to `Earliest`. For more information and potential concerns, see [](#label-oracle-alter-xstream-outbound-server). 5. Start the new connector. ### Usage notes The new connector uses the existing destination tables that were created by the original connector, but it creates new journal tables. ## Alter XStream outbound server The connector regularly updates the XStream server with the latest SCN position it processed. If the connector is reinstalled and connects to the same XStream outbound server, it resumes reading from the SCN position where it left off. You can check this SCN with: ```sql SELECT PROCESSED_LOW_SCN FROM DBA_XSTREAM_OUTBOUND_PROGRESS WHERE SERVER_NAME = 'XOUT1'; ``` If you want to re-read data from an earlier position, you must first change the start SCN of the XStream server: ```sql BEGIN DBMS_XSTREAM_ADM.ALTER_OUTBOUND( server_name => 'XOUT1', start_scn => ); END; / ``` The value of `` must be a valid SCN within the range of available redo logs. You can check the lowest SCN that the start position can be reset to with: ```sql SELECT REQUIRED_CHECKPOINT_SCN FROM DBA_CAPTURE WHERE CLIENT_NAME = 'XOUT1'; ``` This is the lowest SCN for which the capture process requires redo information. ## Specify load from XStream position The %oracleofc% lets you select the starting position from which Oracle redo logs are read. By default, the connector reads from the latest available position. Alternatively, you can choose the earliest position available on the source instance. Choosing to start from the earliest position is common when reinstalling the connector. This allows the new instance to catch up and continue replicating existing tables without having to snapshot each one again. 
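Before choosing a starting position, it can help to see how much redo lies between the positions involved. The following query is a rough sketch that combines the two views already used in this topic with the current database SCN. It assumes the outbound server is named `XOUT1` and that you run it in the root container with access to `DBA_CAPTURE`, `DBA_XSTREAM_OUTBOUND_PROGRESS`, and `V$DATABASE`. How the connector maps its earliest and latest settings to specific SCNs is not stated here, so treat the output as orientation only.

```sql
-- Assumption: the outbound server is named XOUT1, as in the examples in this topic.
SELECT c.CAPTURE_NAME,
       c.REQUIRED_CHECKPOINT_SCN,  -- lowest SCN the capture process still requires
       p.PROCESSED_LOW_SCN,        -- position last reported as processed
       d.CURRENT_SCN               -- current SCN of the database
FROM   DBA_CAPTURE c
       JOIN DBA_XSTREAM_OUTBOUND_PROGRESS p
         ON p.SERVER_NAME = c.CLIENT_NAME
       CROSS JOIN V$DATABASE d
WHERE  c.CLIENT_NAME = 'XOUT1';
```

A large gap between `REQUIRED_CHECKPOINT_SCN` and `CURRENT_SCN` suggests that reading from the earliest position would re-process a correspondingly large amount of redo.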
Switching a running connector from the latest to the earliest position causes all available redo logs to be re-read, re-processed, and re-applied to the destination tables. While the redo logs are being re-read, the columns and data in affected destination tables can become out of sync with their sources until all events have been re-processed and merged. The following parameters are available in the `Ingestion Parameters` context:
To determine whether the connector finished re-reading the redo logs: 1. Navigate to the Openflow canvas. 2. Open the **Incremental Load** process group. 3. Right-click the topmost processor named **Read Oracle CDC Stream**, then select **View state**. 4. Compare the state entries: - **lcr.position.rewind**: the latest position the processor read before re-reading of the redo logs started. - **lcr.position.last**: the current latest position read by the processor. As long as this value is lower than the rewind value above, the processor is still re-reading the redo logs. ### Usage notes - After a running connector is switched to read from the earliest position and starts running, the process can't be reconfigured or cancelled. It continues until the currently read position reaches the position from before it started. - Switching to the earliest position on a running connector finishes the existing journals of any tables being re-processed and creates new journal tables for them. - If the redo log contains events from a previous table that was dropped and re-created in the source database, re-reading the stream re-processes all of those events into the current destination table. The connector can't distinguish between a previous and current source table if they share the same name. Schema changes (such as ALTER TABLE statements that add or drop columns) aren't supported while re-reading the redo logs from the earliest position. If any table's schema was altered between the earliest available SCN and the current position, that table should be removed from replication and re-added with a fresh snapshot instead. 
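When you plan to reset the start SCN as described in the section on altering the XStream outbound server, it can help to view both boundaries side by side. The following sketch combines the two queries from that section into one statement. It assumes the outbound server is named `XOUT1`, as in the examples in this topic, and the column aliases are illustrative only.

```sql
-- Sketch: compare the lowest SCN the capture process still requires
-- with the SCN the connector last reported as processed.
-- 'XOUT1' is the example server name used in this topic; replace it with your own.
SELECT
  (SELECT REQUIRED_CHECKPOINT_SCN
     FROM DBA_CAPTURE
    WHERE CLIENT_NAME = 'XOUT1') AS lowest_resettable_scn,  -- illustrative alias
  (SELECT PROCESSED_LOW_SCN
     FROM DBA_XSTREAM_OUTBOUND_PROGRESS
    WHERE SERVER_NAME = 'XOUT1') AS last_processed_scn      -- illustrative alias
FROM DUAL;
```

A start SCN passed to `DBMS_XSTREAM_ADM.ALTER_OUTBOUND` should not be lower than the first value returned by this query.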
--- title: Openflow Connector for Oracle: Set up incremental replication without snapshots source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/incremental-replication.md section: Loading & Unloading Data --- # %oracleofc%: Set up incremental replication without snapshots This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/about) - [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector) - [](/user-guide/data-integration/openflow/connectors/oracle/maintenance) This topic describes how to configure the %oracleofc% connector to start replicating incremental changes for newly added tables immediately, bypassing snapshots. This configuration is useful when you reinstall the connector over previously replicated data and want to continue replication without snapshotting every table again. 
You can enable incremental replication on either a new or an existing connector instance. ## Enable incremental replication without snapshots on a new connector To enable incremental replication on a new connector instance: 1. Set up the connector as described in [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector). 2. In the `Oracle Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`. ## Enable incremental replication without snapshots on an existing connector To enable incremental replication on an existing connector: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. In the **Openflow** pane select the **Runtimes** tab. 4. Select the runtime containing the connector. 5. Select the connector. 6. In the `Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`. 7. Add new replication tables. These tables immediately switch to their incremental load. To return to replicating tables with the snapshot load, change **Ingestion Type** from `incremental` to `full`. ## Usage notes - Changing the value of **Ingestion Type** does not impact any tables that have begun replicating data. Tables currently in the snapshot phase continue until the snapshot load is complete. - While **Ingestion Type** is set to `incremental`, new tables added to the list of replicated tables bypass the snapshot phase. This includes new tables added to the source database that match the `Included Table Regex` parameter. Ensure that the ingestion type is set to `incremental` to bypass the snapshot phase. Because `incremental` mode bypasses snapshots, keep the connector in this mode only as long as required. After the need for incremental-only updates has been met, return the connector to `full` mode. 
- For tables that bypass snapshot load, the connector creates a destination table in Snowflake, by executing `CREATE TABLE IF NOT EXISTS`, only if no destination table already exists. Tables going through the snapshot require that no destination table exist. --- title: Openflow Connector for Oracle: Set up Snowflake source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/setup-snowflake.md section: Loading & Unloading Data --- # %oracleofc%: Set up Snowflake This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/about) - [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector) This topic describes how to set up your Snowflake environment for the %oracleofc%. As a Snowflake administrator, perform the following tasks: 1. 
Create a destination database in Snowflake to store the replicated data: ```sql CREATE DATABASE ; ``` 2. Create a Snowflake [service user](#label-user-type-property): ```sql CREATE USER TYPE = SERVICE COMMENT='Service user for automated access of Openflow'; ``` 3. Create a Snowflake role for the connector and grant the required privileges: ```sql CREATE ROLE ; GRANT ROLE TO USER ; GRANT USAGE ON DATABASE TO ROLE ; GRANT CREATE SCHEMA ON DATABASE TO ROLE ; ``` Use this role to manage the connector's access to the Snowflake database. To create objects in the destination database, you must grant the [USAGE and CREATE SCHEMA privileges](#label-database-privileges) on the database to the role used to manage access. 4. Create a Snowflake warehouse for the connector and grant the required privileges: ```sql CREATE WAREHOUSE WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE; GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE ; ``` Snowflake recommends starting with an XSMALL warehouse size, then adjusting the size based on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with multi-cluster warehouses than with a larger warehouse size. For more information, see [multi-cluster warehouses](/user-guide/warehouses-multicluster). 5. Set up the public and private keys for key pair authentication: 1. Create a pair of secure keys (public and private). 2. Store the private key for the user in a file to supply to the connector's configuration. 3. Assign the public key to the Snowflake service user: ```sql ALTER USER SET RSA_PUBLIC_KEY = 'thekey'; ``` For more information, see [](/user-guide/key-pair-auth). ## Next steps [Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector). 
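As a recap of the setup tasks in this topic, the statements can be combined into one script. The following is a minimal sketch that assumes hypothetical object names (`OPENFLOW_ORACLE_DB`, `OPENFLOW_ORACLE_USER`, `OPENFLOW_ORACLE_ROLE`, and `OPENFLOW_ORACLE_WH`); substitute your own. The `IF NOT EXISTS` clauses and the two statements at the end are additions for convenience and are not required by the connector.

```sql
-- All object names below are hypothetical placeholders; replace them with your own.
CREATE DATABASE IF NOT EXISTS OPENFLOW_ORACLE_DB;

CREATE USER IF NOT EXISTS OPENFLOW_ORACLE_USER
  TYPE = SERVICE
  COMMENT = 'Service user for automated access of Openflow';

CREATE ROLE IF NOT EXISTS OPENFLOW_ORACLE_ROLE;
GRANT ROLE OPENFLOW_ORACLE_ROLE TO USER OPENFLOW_ORACLE_USER;
GRANT USAGE ON DATABASE OPENFLOW_ORACLE_DB TO ROLE OPENFLOW_ORACLE_ROLE;
GRANT CREATE SCHEMA ON DATABASE OPENFLOW_ORACLE_DB TO ROLE OPENFLOW_ORACLE_ROLE;

CREATE WAREHOUSE IF NOT EXISTS OPENFLOW_ORACLE_WH
  WITH WAREHOUSE_SIZE = 'XSMALL'
       AUTO_SUSPEND = 300
       AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE OPENFLOW_ORACLE_WH TO ROLE OPENFLOW_ORACLE_ROLE;

-- Optional checks: list the privileges held by the role and inspect the user.
SHOW GRANTS TO ROLE OPENFLOW_ORACLE_ROLE;
DESCRIBE USER OPENFLOW_ORACLE_USER;
```

After you assign the public key with `ALTER USER ... SET RSA_PUBLIC_KEY`, the `DESCRIBE USER` output should show a value for the `RSA_PUBLIC_KEY_FP` property, which is a quick way to confirm that the key was registered.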
--- title: Openflow Connector for PostgreSQL Maintenance source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/postgres/maintenance.md section: Loading & Unloading Data --- # %postgresql% Maintenance This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/postgres/setup) - [](/user-guide/data-integration/openflow/connectors/postgres/data-mapping) This topic describes important maintenance considerations and best practices for maintaining the %postgresql% when making changes to the source PostgreSQL database. In addition, this topic describes how to restart table replication and reinstall the connector. ## Check the replication status of a table Interim failures, such as connection errors, do not prevent table replication. However, permanent failures, such as unsupported data types, prevent table replication. To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store: 1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. A table listing controller services displays. 2. 
Locate the row labeled **Table State Store**, click the **More** %sf-vertical-more-button% button on the right side of the row, and then choose **View State**. A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are: - **NEW**: The table is scheduled for replication but replication hasn't started. - **SNAPSHOT_REPLICATION**: The connector is copying existing data. This status displays until all records are stored in the destination table. - **INCREMENTAL_REPLICATION**: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails. - **FAILED**: Replication has permanently stopped due to an error. The Openflow runtime canvas doesn't display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message: ```text Replication state for table .. changed from to ``` If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see [Restart table replication](#label-of-postgres-restart-table-replication). ## Restart table replication A table in FAILED state — for example, due to a missing primary key or unsupported schema change — does not restart automatically. If a table enters a FAILED state or you need to restart replication from scratch, use the following procedure to remove and re-add the table to replication. If the failure was caused by an issue in the source table such as a missing primary key, resolve that issue in the source database before continuing. 1. 
Remove the table from replication, using one of the following methods: - Add the table to the **Re-snapshot Table Exclusions** parameter to temporarily exclude it from replication. This is convenient when the table is matched by an **Included Table Regex** that you don't want to change. - In the Ingestion Parameters context, either remove the table from **Included Table Names** or modify the **Included Table Regex** so the table is no longer matched. 2. Verify the table has been removed: 1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. 2. In the table listing controller services, locate the **Table State Store** row, click the three vertical dots on the right side of the row, then choose **View State**. You must wait until the table's state is fully removed from this list before proceeding. Do not continue until this configuration change has completed. 3. Clean up the destination: Once the table's state shows as fully removed, manually [DROP](/sql-reference/sql/drop-table) the destination table in Snowflake. Note that the connector will not overwrite an existing destination table during the snapshot phase; if the table still exists, replication will fail again. Optionally, the journal table and stream can also be removed if they are no longer needed. 4. Re-add the table by reversing the change you made in the first step: either remove the table from **Re-snapshot Table Exclusions**, or add it back to **Included Table Names** or **Included Table Regex**. The connector then re-snapshots the table. 5. Verify the restart: Check the **Table State Store** using the instructions given previously. The state of the table should appear with the status NEW, then transition to SNAPSHOT_REPLICATION, and finally INCREMENTAL_REPLICATION. ## Upgrading PostgreSQL Upgrading the connector requires a different approach depending on whether PostgreSQL is being upgraded to the next minor or major version. 
Minor version upgrades - Are safe for data. - Require no special treatment. - Require stopping the connector for the duration of the upgrade to avoid reporting connectivity issues. - Continue replicating, after the upgrade, with no data loss. Major version upgrades - Require the PostgreSQL server to drop replication slots, including any used by the connector. - Cannot preserve or migrate replication slots to the new version. See also [](#label-postgres-upgrade-note). - Require all tables to restart replication from the snapshot phase. To perform a minor version upgrade, do the following: 1. Stop the connector, including all Processors and Controller Services. 2. Upgrade PostgreSQL. 3. Restart the connector. To perform a major version upgrade, do the following: 1. Remove all tables from replication in the connector by clearing the **Included Table Names** and **Included Table Regex** parameters. 2. Wait until all queues in the connector are empty. 3. Remove the destination tables by dropping or renaming them. 4. Stop the connector, including all Processors and Controller Services. 5. Open the **Incremental Load** group in the connector. 6. Right-click the top Processor in the group, **Read PostgreSQL CDC Stream**, and select **View state**. 7. Click **Clear state**. 8. Click **Close**. 9. Upgrade PostgreSQL. 10. Restart the connector. A new replication slot will be created. 11. Re-add all tables to begin replication. ### PostgreSQL 17 and later version upgrades PostgreSQL 17 improved upgrading such that it no longer requires dropping replication slots when upgrading to later versions such as 17.1 %ra% 18.0. Upgrading to PostgreSQL 17.0 or later from prior versions (16 and earlier) drops replication slots and should be treated as a major upgrade. Future versions of PostgreSQL may also improve the upgrade process further. ## Reinstall the connector This section describes how to reinstall the connector. 
It covers situations where the new connector is installed in the same runtime, or when it is moved to a new runtime. Reinstall is often used in conjunction with [Incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/postgres/incremental-replication). For the connector to continue replicating from the same CDC stream position where it stopped before reinstallation, the source database must retain the WAL long enough to cover the period between stopping the old connector and starting the new one. Ensure the `max_wal_size` parameter of the PostgreSQL server is high enough, depending on your traffic, and keep the reinstallation time to a minimum. ### Prerequisites Review and note connector parameter context values. If you're reinstalling the connector in the same runtime, you can reuse the existing context. If the new instance will be located in a different runtime, you will have to re-enter all parameters. To reinstall the connector: 1. Finish processing all in-flight FlowFiles in the existing connector, and then stop the connector. 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. Select **Launch Openflow**. 4. In the **Openflow** pane select the **Runtimes** tab. 5. Select the runtime containing the connector. 6. Select the connector. 7. Stop the topmost processor **Set Tables for Replication** in the **Snapshot Load** group. 8. Stop the topmost processor **Read PostgreSQL CDC Stream** in the **Incremental Load** group. 9. If you changed the value of the **Merge Task Schedule CRON** parameter, return it to `* * * * * ?`, otherwise queues won't be emptied until the next scheduled run. Wait until all FlowFiles in the connector have been processed, and all queues are empty. When all FlowFiles have been processed, the **Queued** value on the connector's processor group becomes zero. 
If there are any items left in the original connector's queues, there may be data gaps when the new connector starts. 10. Stop all processors and controller services in the connector. 2. Find and copy the name of the replication slot used by the original connector, by viewing the state of the topmost processor in the `Incremental Load` group with name `Read PostgreSQL CDC Stream`. The replication slot name is stored under the key `replication.slot.name`. Copy the value of the key to a text editor. 3. Create a new instance of the connector. If you're using the same runtime as the original connector, you can choose to keep the existing parameter contexts, and reuse the settings. The existing connector can remain in the runtime and doesn't interfere with the new instance, as long as it remains stopped. 4. If you're installing into a different runtime, or you deleted the previous parameter contexts, enter all the configuration settings into the new parameter contexts, including the table names and patterns as described in [](/user-guide/data-integration/openflow/connectors/postgres/setup). 5. Open the `PostgreSQL Ingestion Parameters` context, and set the `Ingestion Type` parameter to `incremental`. For more information on the concerns, see [](#label-postgres-incremental-replication). 6. Open the `PostgreSQL Source Parameters` context, and set the `Replication Slot Name` parameter to the value you copied earlier. 7. Start the new connector. ### Usage notes The new connector uses the existing destination tables that were created by the original connector, but creates new journal tables. --- title: Openflow Connector for PostgreSQL: Data mapping source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/postgres/data-mapping.md section: Loading & Unloading Data --- # %postgresql%: Data mapping This feature is not available in the People's Republic of China. 
Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/postgres/about) - [](/user-guide/data-integration/openflow/connectors/postgres/setup) This topic describes how PostgreSQL data types are mapped to Snowflake data types. ## PostgreSQL to Snowflake data type mapping The following table shows how PostgreSQL data types are mapped to Snowflake data types when replicating data.
Any PostgreSQL data types not listed in this table are mapped to TEXT by default. --- title: Openflow Connector for PostgreSQL: Set up incremental replication without snapshots source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/postgres/incremental-replication.md section: Loading & Unloading Data --- # %postgresql%: Set up incremental replication without snapshots This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/postgres/setup) - [](/user-guide/data-integration/openflow/connectors/postgres/data-mapping) You can configure the %postgresql% connector to immediately replicate incremental changes for newly added tables, bypassing snapshots. Use incremental load to continue replication without snapshotting every table again when you reinstall the connector over previously replicated data. To enable incremental replication in a new connector instance: 1. Set up the connector as described in [](/user-guide/data-integration/openflow/connectors/postgres/setup). 2. In the `PostgreSQL Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`. 
## Enable incremental replication without snapshots To enable incremental replication on an existing connector: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. In the **Openflow** pane select the **Runtimes** tab. 4. Select the runtime containing the connector. 5. Select the connector. 6. In the `Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`. 7. Add new replication tables. These tables immediately switch to their incremental load. To return to replicating tables with the snapshot load, change **Ingestion Type** from `incremental` to `full`. ## Usage notes - Changing the value of **Ingestion Type** does not impact any tables that have begun replicating data. Tables currently in the snapshot phase continue until the snapshot load is complete. - While **Ingestion Type** is set to `incremental`, new tables added to the list of replicated tables bypass the snapshot phase. This includes new tables added to the source database that match the `Included Table Regex` parameter. Ensure that the ingestion type is set to `incremental` to bypass the snapshot phase. Because `incremental` mode bypasses snapshots, keep the connector in this mode only as long as required. After the need for incremental-only updates has been met, return the connector to `full` mode. - For tables that bypass snapshot load, the connector creates a destination table in Snowflake, by executing `CREATE TABLE IF NOT EXISTS`, only if no destination table already exists. Tables going through the snapshot require that no destination table exist. --- title: Openflow Connector for Salesforce Bulk API: Configure the connector source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector.md section: Loading & Unloading Data --- # %salesforcebulkapiof%: Configure the connector This feature is not available in the People's Republic of China. 
Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/formula-fields) This topic describes the steps to configure the %salesforcebulkapiof%. ## Install the connector Follow these steps to install the %salesforcebulkapiof% in an Openflow runtime: 1. Navigate to the Openflow **Overview** page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find **Openflow connector for Salesforce Bulk API** and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down. The Openflow canvas appears with the connector process group added to it. ## Configure the connector To configure the connector, perform the following steps: 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in the table below.
## Verify the Salesforce connection Before enabling and starting the connector, Snowflake recommends verifying that the Salesforce authentication is properly configured. The **Verification** feature on controller services lets you test the connection without starting the full connector flow. The **JWT Bearer OAuth2 Access Token Provider** controller service depends on two other controller services that must be enabled first: the **Salesforce Private Key Service** and the **Web Client Service Provider**. 1. Double-click the connector process group to open it. 2. Right-click on an empty area of the canvas and select **Controller Services**. 3. Enable the **Salesforce Private Key Service** and the **Web Client Service Provider** services. 4. Locate the **JWT Bearer OAuth2 Access Token Provider** service in the list. 5. Click the **Verification** button for the service. A dialog opens where you can provide property overrides. You can ignore this and click **Verify** directly. 6. If everything is configured properly, the **Acquire token** step shows a green checkmark indicating success. This confirms the connector can authenticate with Salesforce and obtain an access token. You can proceed to the next step to run the connector. 7. If verification fails, review the error message and check the following: - The **OAuth2 Client ID** parameter matches the **Consumer Key** from the external client app in Salesforce. - The private key corresponds to the certificate uploaded to the external client app. - The **OAuth2 Subject** user is authorized for the external client app (see [](#salesforce-approve-client-app)). - The **OAuth2 Token Endpoint URL** uses the correct Salesforce instance hostname. - The **OAuth2 Audience** is set to the correct value: `https://login.salesforce.com` for production or `https://test.salesforce.com` for sandboxes. For detailed troubleshooting, see [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/troubleshoot). 
## Run the connector Follow these steps to start the connector and begin replicating data from Salesforce to Snowflake: 1. Right-click on an empty area in the canvas and select **Enable all Controller Services**. 2. Right-click on the connector process group and select **Start**. ## Manage object replication After the connector has been started and objects have been replicated, you can add new objects or remove existing objects from replication. ### Add new objects to replication To add a new object to replication, update the **Filter** parameter (or **Special Objects Filter** parameter, if applicable) with the new object names. You do not need to stop the connector. The new object is replicated at the next scheduled execution. For example, if the current **Filter** value is `Account, Opportunity` and you want to add the `Contact` object, change the value to `Account, Opportunity, Contact`. ### Remove objects from replication Removing an object from replication requires stopping the connector and cleaning up both the connector state and the destination table in Snowflake: 1. Stop all processors in the flow by right-clicking on the connector process group and selecting **Stop**. 2. Ensure that no in-flight FlowFiles are being processed. 3. Right-click on the canvas and select **Parameters**, then remove the object name from the **Filter** parameter (or the **Special Objects Filter** parameter, if applicable). 4. Right-click on the canvas and select **Disable all controller services**. 5. Go to **Controller services** and open the state of the controller service named **Salesforce Bulk Jobs State**. 6. Select the trash icon next to the object type you removed to delete its state entry. 7. Right-click on the canvas and select **Enable all controller services**, then start all processors to resume the connector. 8. If applicable, drop the corresponding table from the Snowflake destination database to clean up the previously replicated data. 
For example: ```sql DROP TABLE ..; ``` ## Next steps - To monitor and troubleshoot the connector, see [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/troubleshoot). --- title: Openflow Connector for Salesforce Bulk API: Salesforce formula fields source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/formula-fields.md section: Loading & Unloading Data --- # %salesforcebulkapiof%: Salesforce formula fields This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector) This topic describes how the %salesforcebulkapiof% translates Salesforce formula fields into Snowflake SQL views, including supported functions and limitations. ## How formula views work When **Enable Views Creation** is set to `true`, the connector performs the following for each object that has formula fields: 1. Retrieves the formula expressions from the Salesforce object metadata via the Describe API. 2. Parses each formula expression and translates it into equivalent Snowflake SQL. 3. Generates a `CREATE OR REPLACE VIEW` statement that combines non-formula columns from the base table with the translated formula expressions as computed columns. 4. Runs the DDL against Snowflake to create or update the view. 
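To make steps 2 through 4 more concrete, the following sketch shows roughly what a generated view could look like. It is an illustration only, not the connector's actual output: the formula field `LABEL__C`, its Salesforce formula `Name & " - " & Industry`, the trimmed-down `DEMO_ACCOUNT` table, and the exact shape of the `COALESCE` null handling are all assumptions. The `DEMO_` prefix is used so that the example does not touch real replicated objects; run it in a scratch schema.

```sql
-- Hypothetical, trimmed-down table standing in for a replicated Account object.
CREATE OR REPLACE TABLE DEMO_ACCOUNT (
  ID       VARCHAR,
  NAME     VARCHAR,
  INDUSTRY VARCHAR
);

-- Assumed Salesforce formula for the hypothetical field LABEL__C:
--   Name & " - " & Industry
-- Sketch of a view that combines the non-formula columns of the base table
-- with the translated expression as a computed column.
CREATE OR REPLACE VIEW DEMO_ACCOUNT_FORMULA_VW AS
SELECT
  ID,
  NAME,
  INDUSTRY,
  COALESCE(NAME, '') || ' - ' || COALESCE(INDUSTRY, '') AS LABEL__C
FROM DEMO_ACCOUNT;

-- Query the formula column alongside the replicated data.
SELECT ID, LABEL__C FROM DEMO_ACCOUNT_FORMULA_VW;
```

Because the formula is evaluated in the view at query time, the computed column always reflects the current rows in the base table.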
The resulting view is named `_FORMULA_VW`. For example, the `Account` object produces a view named `ACCOUNT_FORMULA_VW`. You can query this view to obtain formula field values alongside the replicated data. The view is automatically updated whenever the connector detects schema changes in the source object, ensuring that formula definitions stay in sync with Salesforce. ## Cross-object formula fields Salesforce formulas can reference fields from related objects using relationship traversal (for example, `Account.Owner.Name`). The connector supports these cross-object references by generating `LEFT JOIN` clauses in the view definition. Each relationship traversal produces a join to the corresponding related table in Snowflake. For cross-object formulas to work correctly, the related objects must also be replicated by the connector. The connector does not check whether the referenced tables exist in Snowflake at translation time. If a related object is not being synced, the generated `CREATE OR REPLACE VIEW` statement references a table that does not exist in Snowflake, and the view creation fails. To resolve this, ensure that all related objects referenced by formula fields are included in the **Filter** parameter. The view is automatically recreated on the next connector run after the referenced tables exist. ## Formula view column comments Each formula column in the generated view includes a SQL `COMMENT` annotation: - For successfully translated formulas, the comment contains the original Salesforce formula expression. - For formulas that could not be translated, the comment contains the failure reason code. You can inspect these comments by running `DESCRIBE VIEW ` in Snowflake. ## Supported formula functions The following Salesforce formula functions are translated into equivalent Snowflake SQL:
In addition to functions, the following operators are supported: - Arithmetic: `+`, `-`, `*`, `/`, `^` (exponentiation, translated to `POWER`) - Comparison: `=`, `==`, `!=`, `<>`, `<`, `<=`, `>`, `>=` - Logical: `AND`, `OR`, `&&`, `||` - String concatenation: `&` (translated to `||` with `COALESCE` null handling) - Unary: `-` (negation), `NOT` ## Unsupported formula constructs The following formula constructs are not yet supported. Support for additional functions and constructs will be added in future releases. When a formula uses any of these, the corresponding column in the view returns `NULL` and the column comment indicates the failure reason.
--- title: Openflow Connector for Salesforce Bulk API: Set up Salesforce source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce.md section: Loading & Unloading Data --- # %salesforcebulkapiof%: Set up Salesforce This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector) This topic describes the steps to set up Salesforce for the %salesforcebulkapiof%. The connector authenticates with Salesforce using the OAuth 2.0 JWT Bearer Flow. This requires creating a certificate key pair, configuring an external client app in Salesforce, and authorizing a user to use the app. Salesforce has deprecated Connected Apps in favor of External Client Apps. If you have an existing Connected App, Snowflake recommends creating a new External Client App instead. ## Create certificates You need a private key and public certificate to configure the external client app in Salesforce. The private key is used by the connector to sign JWT tokens, and the public certificate is uploaded to the external client app in Salesforce so that Salesforce can verify the signature. 1. Generate the private key. 
You are asked for a password to secure the private key. ```bash openssl genpkey -algorithm RSA -out private.key -aes256 ``` Record the password. You need it when configuring the connector parameters in Snowflake. 2. Create a self-signed certificate from the private key. ```bash openssl req -new -x509 -key private.key -out public.crt -days 365 ``` You can also generate a Certificate Signing Request (CSR) to have a certificate signed by your company CA. You are responsible for safeguarding and rotating the public key and private key files used for key-pair authentication according to the security policies of your organization. ## Create an external client app in Salesforce Create an external client app in Salesforce with JWT Bearer Flow. The connector requires this specific OAuth flow to authenticate. Using a different OAuth flow (such as Authorization Code Flow) causes `invalid_grant` errors. 1. Log in to Salesforce as an administrator. 2. Go to **Setup** %ra% **Apps** %ra% **App Manager**, and then select **New External Client App**. 3. Fill in the required fields: - **External Client App Name**: For example, `Openflow connector for Salesforce Bulk API`. - **Contact Email**: For example, `salesforceadmin@mycompany.com`. 4. In the **API (Enable OAuth Settings)** section, select the **Enable OAuth** checkbox. 5. Provide a valid **Callback URL** (for example, `https://www.google.com/`). The callback URL is required by Salesforce, but it is not used by the JWT Bearer Flow. You can provide any valid URL. 6. Provide the desired **OAuth Scopes** for the application. The following scopes are required for the connector to operate properly: - Manage user data via APIs (`api`) - Perform requests at any time (`refresh_token`, `offline_access`) 7. In **Flow Enablement**, select the **Enable JWT Bearer Flow** checkbox and upload the `public.crt` file created in the previous step. You must select **Enable JWT Bearer Flow** specifically. 
Do not enable other flows unless you have a specific reason to do so. The certificate you upload here must correspond to the private key (`private.key`) that you configure in the connector parameters. 8. Click **Create** to complete the application creation process. 9. Go to the **Settings** tab, expand the **OAuth Settings** section, and click **Consumer Key and Secret** to retrieve the credentials of your application. 10. Record the values for the **Consumer Key** and the **Consumer Secret** for use when configuring the connector in Snowflake. The **Consumer Key** is used as the **OAuth2 Client ID** parameter in the connector configuration. ## Approve the client app for a user The connector interacts with Salesforce APIs on behalf of a specific user (the OAuth2 Subject configured in the connector parameters). You must authorize this user to use the external client app by assigning the appropriate profiles or permission sets. If this step is not completed, the connector receives a permission error when attempting to authenticate, even if the JWT Bearer Flow is configured correctly. 1. Go to the **Policies** tab of the client application. 2. Click **Edit**. 3. Expand the **OAuth Policies** section and change **Permitted Users** to **Admin approved users are pre-authorized**. 4. Expand the **App Policies** section and select the profiles or permission sets that are assigned to the Salesforce user you want the connector to use. For example, if the user has the `System Administrator` profile, select that profile. The user specified as the **OAuth2 Subject** in the connector configuration must belong to at least one of the profiles or permission sets selected here. If the user is not authorized, you receive a permission error when verifying or running the connector. 5. Click **Save**. 
## Verify credentials match Before proceeding to the Snowflake setup, confirm that the following credentials all belong to the same external client app and key pair: - The **Consumer Key** (Client ID) was retrieved from the external client app you just created. - The **private key** (`private.key`) corresponds to the **certificate** (`public.crt`) uploaded to the same external client app. - The **OAuth2 Subject** (user) is authorized for this external client app through the profile or permission set assignment. If you have created multiple external client apps or experimented with different configurations, mixing credentials from different apps or key pairs is a common source of `invalid_grant` errors. When in doubt, create a new external client app with a fresh certificate and key pair. ## Next steps Perform the Snowflake setup tasks: [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake) --- title: Openflow Connector for Salesforce Bulk API: Set up Snowflake source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake.md section: Loading & Unloading Data --- # %salesforcebulkapiof%: Set up Snowflake This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). 
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce) - [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector) This topic describes the steps to set up Snowflake for the %salesforcebulkapiof%. ## Prerequisites Before you begin, ensure you have completed the following: - Install Openflow (either BYOC or SPCS). For more information, see [](/user-guide/data-integration/openflow/about). - Create an Openflow deployment. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) or [](/user-guide/data-integration/openflow/setup-openflow-byoc). - Create an Openflow runtime. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-spcs-create-runtime) or [](/user-guide/data-integration/openflow/setup-openflow-byoc). - Review the known limitations of the connector in [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about). ## Create a key pair Create a key pair that will be used by the service account user in the connector to interact with the database. This step is only required if you are deploying the connector in Openflow BYOC. It is NOT needed when deploying the connector in Openflow SPCS. 1. Generate a private key. The example below shows how to generate an unencrypted private key. ```bash openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt ``` The content of the `rsa_key.p8` file will look like this: ```text -----BEGIN PRIVATE KEY----- MIIE6T... -----END PRIVATE KEY----- ``` 2. Generate the public key by referencing the private key. ```bash openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub ``` The content of the `rsa_key.pub` file will look like this: ```text -----BEGIN PUBLIC KEY----- MIIBIjANBgkqh... 
-----END PUBLIC KEY----- ``` Copy the contents of this file (without the `-----BEGIN PUBLIC KEY-----` and `-----END PUBLIC KEY-----` headers) to use when creating the user in the next section. ## Create objects and grant privileges Create a service account, role, database, schema, and warehouse for the connector, and grant the appropriate permissions. 1. Use a role with `ACCOUNTADMIN` privileges to set the role: ```sql USE ROLE ACCOUNTADMIN; ``` 2. Create the destination Snowflake database, if it does not exist: ```sql CREATE DATABASE IF NOT EXISTS ; ``` 3. Create the destination schema in the database, if it does not exist: ```sql CREATE SCHEMA IF NOT EXISTS .; ``` 4. Create the role used by the Openflow connector: ```sql CREATE ROLE IF NOT EXISTS ; ``` 5. Grant the privileges to the role to use the database: ```sql GRANT USAGE ON DATABASE TO ROLE ; GRANT USAGE ON SCHEMA . TO ROLE ; GRANT CREATE TABLE ON SCHEMA . TO ROLE ; ``` 6. Create a warehouse for the connector (or use an existing one) and grant usage privileges to the connector role: ```sql -- Create a warehouse (skip if you wish to use an existing warehouse) CREATE OR REPLACE WAREHOUSE MY_WAREHOUSE WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE; GRANT USAGE, OPERATE ON WAREHOUSE MY_WAREHOUSE TO ROLE ; ``` 7. Create the service user and assign the role and public key: ```sql -- Create a service user that the connector will use to interact with Snowflake -- Set default role to -- Assign the public key generated with openssl in the previous step (only for BYOC) CREATE OR REPLACE USER TYPE = SERVICE DEFAULT_ROLE = RSA_PUBLIC_KEY = ''; -- Grant the role to the user GRANT ROLE TO USER ; ``` ## Create a network rule (Openflow Snowflake Deployment only) If you are deploying the connector in a runtime that is in an Openflow Snowflake Deployment, you must create a network rule and external access integration and set them on the runtime. 
```sql USE ROLE SECURITYADMIN; CREATE NETWORK RULE MY_OPENFLOW_SALESFORCE_NETWORK_RULE TYPE = HOST_PORT MODE = EGRESS VALUE_LIST = (':443'); CREATE EXTERNAL ACCESS INTEGRATION MY_OPENFLOW_SALESFORCE_EAI ALLOWED_NETWORK_RULES = (MY_OPENFLOW_SALESFORCE_NETWORK_RULE) ENABLED = TRUE COMMENT = 'External Access Integration to connect to Salesforce'; GRANT USAGE ON INTEGRATION MY_OPENFLOW_SALESFORCE_EAI TO ROLE ; ``` ## Next steps Configure the connector in Openflow: [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector) --- title: Openflow Connector for SQL Server: Data mapping source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sql-server/data-mapping.md section: Loading & Unloading Data --- # %sqlserver%: Data mapping This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/sql-server/about) - [](/user-guide/data-integration/openflow/connectors/sql-server/setup) This topic describes how the SQL Server data types are mapped to Snowflake data types. ## SQL Server to Snowflake data type mapping The following table shows how SQL Server data types are mapped to Snowflake data types when replicating data.
Any SQL Server data types not listed in this table are mapped to TEXT by default. --- title: Openflow Connector for SQL Server: Maintenance source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sql-server/maintenance.md section: Loading & Unloading Data --- # %sqlserver%: Maintenance This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/sql-server/setup) - [](/user-guide/data-integration/openflow/connectors/sql-server/data-mapping) This topic describes maintenance considerations and best practices for the %sqlserver%, such as reinstalling the connector or setting the change tracking starting position. These operations are often used in conjunction with [Incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/sql-server/incremental-replication). ## Check the replication status of a table Interim failures, such as connection errors, do not prevent table replication. However, permanent failures, such as unsupported data types, prevent table replication. To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store: 1. 
In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. A table listing controller services displays. 2. Locate the row labeled **Table State Store**, click the **More** %sf-vertical-more-button% button on the right side of the row, and then choose **View State**. A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are: - **NEW**: The table is scheduled for replication but replication hasn't started. - **SNAPSHOT_REPLICATION**: The connector is copying existing data. This status displays until all records are stored in the destination table. - **INCREMENTAL_REPLICATION**: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails. - **FAILED**: Replication has permanently stopped due to an error. The Openflow runtime canvas doesn't display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message: ```text Replication state for table .. changed from to ``` If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see [Restart table replication](#label-of-sql-server-restart-table-replication). ## Reinstall the connector This section provides instructions on how to reinstall the connector, and continue replicating data for the same tables without having to snapshot them again. It covers situations where the new connector is installed in the same runtime, as well as moved to a new runtime. ### Prerequisites Review and note connector parameter context values. If you reinstall the connector in the same runtime, you can reuse the existing context. 
If the new instance is located in a different runtime, you must re-enter all parameters. 1. Finish processing all in-flight FlowFiles in the existing connector, then stop the connector. 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. Select **Launch Openflow**. 4. In the **Openflow** pane select the **Runtimes** tab. 5. Select the runtime containing the connector. 6. Select the connector. 7. Stop the topmost processor **Set Tables for Replication** in the **Snapshot Load** group. 8. Stop the topmost processor **Read SQLServer Change Tracking tables** in the **Incremental Load** group. 9. If you changed the value of the **Merge Task Schedule CRON** parameter, return it to `* * * * * ?`, otherwise queues won't be emptied until the next scheduled run. Wait until all FlowFiles in the connector have been processed, and all queues are empty. When all FlowFiles have been processed, the **Queued** value on the connector's processor group becomes zero. If there are any items left in the original connector's queues, there may be data gaps when the new connector starts. 10. Stop all processors and controller services in the connector. The existing connector can remain in the runtime and doesn't interfere with the new instance, as long as it remains stopped. 2. Create a new instance of the connector. If you use the same runtime as the original connector, you can choose to keep the existing parameter contexts and reuse the settings. 3. If you install into a different runtime or you deleted the previous parameter contexts, enter the configuration settings into the new parameter contexts, including the table names and patterns as described in [](/user-guide/data-integration/openflow/connectors/sql-server/setup). 4. Navigate to the `SQLServer Ingestion Parameters` context, and set the following parameters: - Set the `Ingestion Type` parameter to `incremental`. 
For information, see [](#label-sql-server-incremental-replication). - Set the `Starting Change Tracking Position` parameter to `Earliest`. For information, see [](#label-sql-server-connector-start-restart-incremental-load-from-earliest-available-change-tracking-position). 5. Start the new connector. ### Usage notes The new connector uses the existing destination tables created by the original connector, but creates new journal tables. ## Specify load from change tracking table position The %sqlserver% connector lets you select the starting position where change tracking tables are read. By default, the connector reads from the latest available position. Alternatively, you can choose the earliest position available on the source instance. Choosing to start from the earliest position is common when reinstalling the connector. This allows the new instance to catch up and continue replicating existing tables without having to snapshot each again. Switching a running connector from latest to earliest position causes the contents of change tracking tables to be re-read, re-processed, and re-applied to the destination table. While the change tracking tables are being re-read, the data in affected destination tables can become out of sync with their sources until all events have been re-processed and merged. The following parameters are available in the `Ingestion Parameters` context:
To determine whether the connector finished re-reading the change tracking tables: 1. Navigate to the Openflow canvas. 2. Open the **Incremental Load** process group. 3. Right-click the topmost processor named **Read SQLServer Change Tracking tables**, then select **View state**. 4. Check the state entries for every table with keys starting with `position.`. If a value is `0/0` then the connector has not yet finished re-reading the changes for this table. ### Usage notes - After you switch a running connector to read from the earliest positions and start it, you can't reconfigure or cancel the process, and it will continue until the currently-read positions reach the latest values. - Switching to the earliest position on a running connector will, for any tables being re-processed, finish their existing journals, and create new journal tables. --- title: Openflow Connector for SQL Server: Set up incremental replication without snapshots source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sql-server/incremental-replication.md section: Loading & Unloading Data --- # %sqlserver%: Set up incremental replication without snapshots This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). 
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/sql-server/setup)
- [](/user-guide/data-integration/openflow/connectors/sql-server/data-mapping)

You can configure the %sqlserver% connector to immediately replicate incremental changes for newly added tables, bypassing snapshots. Use incremental load to continue replication without snapshotting every table again when you reinstall the connector over previously replicated data.

To enable incremental replication in a new connector instance:

1. Set up the connector as described in [](/user-guide/data-integration/openflow/connectors/sql-server/setup).
2. In the `SQLServer Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`.

## Enable incremental replication without snapshots

To enable incremental replication on an existing connector:

1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. In the **Openflow** pane, select the **Runtimes** tab.
4. Select the runtime containing the connector.
5. Select the connector.
6. In the `Ingestion Parameters` context, specify `Ingestion Type` = `incremental`.
7. Add new replication tables. These tables start their incremental load immediately.

To return to replicating tables with the snapshot load, change **Ingestion Type** from `incremental` to `full`.

## Usage notes

- Changing the value of **Ingestion Type** does not affect any tables that have already begun replicating data. Tables currently in the snapshot phase continue until the snapshot load is complete.
- While **Ingestion Type** is set to `incremental`, new tables added to the list of replicated tables bypass the snapshot phase. This includes new tables added to the source database that match the `Included Table Regex` parameter.
Ensure that the ingestion type is set to `incremental` to bypass the snapshot phase. Leave the connector in `incremental` mode only as long as required, because new tables bypass snapshots while it is set. When you no longer need to bypass snapshots, return the connector to `full` mode.
- For tables that bypass snapshot load, the connector creates a destination table in Snowflake by executing `CREATE TABLE IF NOT EXISTS`, so a table is created only if no destination table already exists. Tables that go through snapshot load require that no destination table exists.

---
title: Openflow connectors
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/about-openflow-connectors.md
section: Loading & Unloading Data
---

# Openflow connectors

This feature is not available in the People's Republic of China.

Openflow is available to all accounts in AWS [](#label-na-general-regions).

The connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
- [](/user-guide/data-integration/openflow/manage)

Openflow connectors are curated, versioned Apache NiFi flow definitions built using open-source and proprietary NiFi components. These connectors follow a strict set of design patterns to ensure performance, fault tolerance, and ease of configuration.

Review the details of the following connectors available in Openflow:
--- title: Openflow Snowflake Deployment cost and scaling considerations source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/cost-spcs.md section: Loading & Unloading Data --- # Openflow Snowflake Deployment cost and scaling considerations This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/about-spcs) - [](/user-guide/data-integration/openflow/setup-openflow-spcs) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) When running %ofsfspcs% you must be aware of the cost considerations associated with multiple Snowflake components, including, but not limited to the following cost categories: - Compute pool costs - Snowpark Container Services infrastructure - Data Ingestion - Telemetry Data Ingestion - Other costs not explicitly mentioned in this topic Using and scaling Openflow involves understanding these costs. The following sections describe Openflow costs in general, and provide a number of examples of scaling Openflow runtimes and associated costs. ## %ofsfspcs% costs When using %ofsfspcs%, you can incur costs from multiple Snowflake components that Openflow uses. These cost categories are described in the following sections. However, your actual costs may vary based on your specific environment. See [](#label-openflow-spcs-consumption-examples) for examples of different cost consumption scenarios. ### Openflow compute pool costs This cost category is shown as **Openflow Compute Snowflake** on your Snowflake bill. 
The total costs for running Openflow are based on the number and types of instances used by [Snowpark Container Services compute pools](/developer-guide/snowpark-container-services/working-with-compute-pool) in your Snowflake account.

Openflow uses compute pools for two different purposes:

- Openflow Management Services

  Openflow Management Services run as part of an Openflow deployment. They use a compute pool to manage the Openflow deployment. This compute pool begins running as soon as you create a deployment and continues to run as long as the deployment is active. It continues to run and incur costs even if no runtimes are running.

- Openflow runtimes

  Openflow uses compute pools to run the Openflow runtimes. The number of compute pools required and the number of nodes within each compute pool are scaled based on the number of runtimes that are currently running. When all runtimes associated with a compute pool are stopped, that compute pool is scaled down to 0 nodes. No costs are incurred for a runtime compute pool when it is not in use.

Credits are billed per second with a 5-minute minimum. For information on the rate per Snowpark Container Services Compute Instance Family per hour, refer to Table 1(d) in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).

The following views in the [](/sql-reference/account-usage) schema provide additional details on Openflow compute costs:

- [METERING_DAILY_HISTORY](/sql-reference/account-usage/metering_daily_history)
- [METERING_HISTORY](/sql-reference/account-usage/metering_history)

Compute pool costs related to Openflow appear under *SERVICE_TYPE* as *OPENFLOW_COMPUTE_SNOWFLAKE*.

The [OPENFLOW_USAGE_HISTORY](/sql-reference/account-usage/openflow_usage_history) view currently does not contain records for the *OPENFLOW_COMPUTE_SNOWFLAKE* service type.
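
To make the per-second billing with a 5-minute minimum concrete, here is a minimal Python sketch of the arithmetic. It is an approximation under stated assumptions, not Snowflake's billing logic: it treats the minimum as applying once per continuous run, assumes every node runs for the whole period, and uses a hypothetical rate. The function name `estimate_compute_pool_credits` and the value of `example_rate` are illustrative only. Take the real rate for the instance family from the Snowflake Service Consumption Table.

```python
MINIMUM_BILLED_SECONDS = 5 * 60  # the 5-minute minimum described above


def estimate_compute_pool_credits(running_seconds, credits_per_node_hour, node_count=1):
    """Estimate credits for one continuous run of a compute pool.

    credits_per_node_hour is a placeholder for the published rate of the
    instance family; this sketch does not know any real rates.
    """
    if running_seconds <= 0:
        return 0.0  # a pool scaled down to 0 nodes is not billed
    # Per-second billing, but never less than the minimum for a run.
    billed_seconds = max(running_seconds, MINIMUM_BILLED_SECONDS)
    return node_count * credits_per_node_hour * billed_seconds / 3600


example_rate = 1.0  # hypothetical credits per node-hour, for illustration only

# A 90-second run is billed as 300 seconds.
short_run = estimate_compute_pool_credits(90, example_rate)

# Two nodes running for two hours are billed for the actual duration.
long_run = estimate_compute_pool_credits(2 * 3600, example_rate, node_count=2)
```

Compare estimates like these with the actual values reported by the metering views listed above.
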
For more information on compute costs in Snowflake, see [](/user-guide/cost-exploring-compute).

### Snowpark Container Services infrastructure costs

In addition to compute pool costs, there are costs associated with additional Snowpark Container Services infrastructure, including storage and data transfer. For additional information, see [](/developer-guide/snowpark-container-services/accounts-orgs-usage-views).

### Data ingestion costs

Costs are incurred when loading data into Snowflake using services such as Snowpipe or Snowpipe Streaming. These costs are based on the volume of data ingested and appear on your Snowflake bill under the line items of the respective ingestion services.

Additionally, some connectors may require a warehouse and will incur warehouse costs. For example, database CDC connectors require a warehouse for both the initial snapshots and ongoing incremental Change Data Capture (CDC).

### Telemetry data ingestion costs

When using an event table to store telemetry data for Openflow, Snowflake charges for sending logs and metrics to Openflow deployments. There are also charges for sending runtime telemetry data to your event table within Snowflake. The rate for credits per GB of telemetry data is specified in Table 5 in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). This item is referred to as Telemetry Data Ingest.

## Reducing Openflow credit consumption

If you have runtimes that are not actively in use, you can suspend them to reduce costs. Suspending a runtime stops credit consumption for the associated runtime compute pool. When a runtime is suspended, its compute pool scales down to 0 nodes and no longer incurs charges.

## Openflow Snowflake Deployment costs associated with runtimes and scaling behavior

How you choose to configure and scale runtimes is important for managing costs effectively.
Openflow supports different runtime types, each with its own scaling characteristics and associated costs.

### Mapping runtimes to Snowflake compute pools

The runtime type you choose determines the runtime pods that are scheduled on the associated compute pool. Using a larger runtime type will result in a larger compute pool being used, which will incur higher costs. The runtime sizes and their scaling behavior are described in the following table:

| Runtime type | vCPUs | Available memory (GB) | Snowflake Compute Pool instance family | Snowflake Compute Pool | Instance Family - vCPUs | Instance Family - memory (GB) |
| ------------ | ----- | --------------------- | -------------------------------------- | -------------------------- | ----------------------- | ----------------------------- |
| Small | 1 | 2 | CPU_X64_S | INTERNAL_OPENFLOW_0_SMALL | 4 | 16 |
| Medium | 4 | 10 | CPU_X64_SL | INTERNAL_OPENFLOW_0_MEDIUM | 16 | 64 |
| Large | 8 | 20 | CPU_X64_L | INTERNAL_OPENFLOW_0_LARGE | 32 | 128 |

Openflow scales the underlying Snowflake Compute Pools when additional compute pool nodes need to be scheduled, based on CPU consumption, and up to the maximum node setting set during runtime creation.

Compute pools are configured with a minimum size of 0 nodes and a maximum of 50 nodes. The required size is dynamically adjusted depending on the CPU and memory requirements of the runtimes. If there are no resource demands, for example, if the runtime is not running, a compute pool scales down to 0 nodes after 600 seconds (10 minutes).

### Examples for calculating Openflow Snowflake Deployment consumption
You created an Openflow Snowflake Deployment and have not created any runtimes. The deployment runs for one hour.

- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- Total Openflow consumption = 1 CPU_X64_S instance-hour

You created one small runtime with Min Nodes = 1 and Max Nodes = 2. The runtime stays at 1 node for 1 hour.

- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- The INTERNAL_OPENFLOW_0_SMALL Compute Pool is running with 1 CPU_X64_S instance
- Total Openflow consumption = 2 CPU_X64_S instance-hours

You created two small runtimes with min/max of two nodes each, and one large runtime with min/max of 10 nodes. These runtimes are active for one hour.

- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- Two small runtimes at two nodes = INTERNAL_OPENFLOW_0_SMALL Compute Pool is running with 2 CPU_X64_S instances = 2 CPU_X64_S instance-hours
- One large runtime at 10 nodes = INTERNAL_OPENFLOW_0_LARGE Compute Pool is running with 4 CPU_X64_L instances = 4 CPU_X64_L instance-hours
- Total Openflow consumption = 3 CPU_X64_S instance-hours + 4 CPU_X64_L instance-hours

You created one medium runtime with one node. After 20 minutes, it scales to two nodes. After another 20 minutes, it scales back down to one node and runs for another 20 minutes.

- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- One medium runtime scaling between one and two nodes = INTERNAL_OPENFLOW_0_MEDIUM Compute Pool is running with 1 CPU_X64_SL instance = 1 CPU_X64_SL instance-hour
- Total Openflow consumption = 1 CPU_X64_S instance-hour + 1 CPU_X64_SL instance-hour
You created one medium runtime with two nodes, then suspended it after 30 minutes.

- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- One medium runtime at two nodes = INTERNAL_OPENFLOW_0_MEDIUM Compute Pool is running with 1 CPU_X64_SL instance
- 30 minutes = 1/2 hour
- Total Openflow consumption = 1 CPU_X64_S instance-hour + 1/2 CPU_X64_SL instance-hour
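The arithmetic in these examples can be reproduced with a few lines of Python. The following is a minimal sketch, not an official calculator: the function names are hypothetical, and it assumes that the 5-minute billing minimum applies to each continuous run of an instance. It reports instance-hours only; converting to credits requires the per-family rates from the Snowflake Service Consumption Table, which are not reproduced here.

```python
from collections import defaultdict

FIVE_MINUTES = 300  # seconds; the billing minimum stated in this topic


def billed_seconds(run_seconds):
    """Apply per-second billing with a 5-minute minimum to one instance run.

    Assumption: the minimum applies to each continuous run of an instance.
    """
    if run_seconds <= 0:
        return 0
    return max(run_seconds, FIVE_MINUTES)


def instance_hours(segments):
    """Sum instance-hours per instance family.

    segments: iterable of (instance_family, instance_count, run_seconds).
    """
    totals = defaultdict(float)
    for family, count, run_seconds in segments:
        totals[family] += count * billed_seconds(run_seconds) / 3600
    return dict(totals)


# Third example: control pool, two small runtimes, one large runtime, one hour.
example_three = instance_hours([
    ("CPU_X64_S", 1, 3600),  # Openflow_Control_Pool_0
    ("CPU_X64_S", 2, 3600),  # INTERNAL_OPENFLOW_0_SMALL
    ("CPU_X64_L", 4, 3600),  # INTERNAL_OPENFLOW_0_LARGE
])
print(example_three)

# Last example: medium runtime suspended after 30 minutes.
example_five = instance_hours([
    ("CPU_X64_S", 1, 3600),   # control pool keeps running for the full hour
    ("CPU_X64_SL", 1, 1800),  # INTERNAL_OPENFLOW_0_MEDIUM
])
print(example_five)
```

The sketch takes instance counts as input rather than deriving them from runtime sizes, because the number of instances a compute pool needs depends on scheduling details that this topic does not specify.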
--- title: PackageFlowFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/packageflowfile.md section: Loading & Unloading Data --- # PackageFlowFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description This processor will package FlowFile attributes and content into an output FlowFile that can be exported from NiFi and imported back into NiFi, preserving the original attributes and content. ## Tags attributes, flowfile, flowfile-stream, flowfile-stream-v3, package ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## Use Cases Involving Other Components

| Send FlowFile content and attributes from one NiFi instance to another NiFi instance. |
| ------------------------------------------------------------------------------------- |
| Export FlowFile content and attributes from NiFi to external storage and reimport. |

## See also

- [org.apache.nifi.processors.standard.MergeContent](/user-guide/data-integration/openflow/processors/mergecontent)
- [org.apache.nifi.processors.standard.UnpackContent](/user-guide/data-integration/openflow/processors/unpackcontent)

--- title: PaginatedJsonQueryElasticsearch 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/paginatedjsonqueryelasticsearch.md section: Loading & Unloading Data ---

# PaginatedJsonQueryElasticsearch 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-elasticsearch-restapi-nar ## Description A processor that allows the user to run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. It will use the flowfile's content for the query unless the QUERY attribute is populated. Search After/Point in Time queries must include a valid "sort" field. ## Tags elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, json, page, query, read, scroll ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.elasticsearch.ConsumeElasticsearch](/user-guide/data-integration/openflow/processors/consumeelasticsearch) - [org.apache.nifi.processors.elasticsearch.JsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/jsonqueryelasticsearch) - [org.apache.nifi.processors.elasticsearch.SearchElasticsearch](/user-guide/data-integration/openflow/processors/searchelasticsearch) --- title: ParquetIcebergWriter source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/parqueticebergwriter.md section: Loading & Unloading Data --- # ParquetIcebergWriter This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides record serialization for Apache Iceberg using Apache Parquet formatting ## Tags iceberg, openflow, parquet, record ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: ParseEvtx 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/parseevtx.md section: Loading & Unloading Data --- # ParseEvtx 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-evtx-nar ## Description Parses the contents of a Windows Event Log file (evtx) and writes the resulting XML to the FlowFile ## Tags event, evtx, file, logs, message, windows ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ParseExcelCellReference 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/parseexcelcellreference.md section: Loading & Unloading Data --- # ParseExcelCellReference 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-office-nar ## Description Processor responsible for parsing Excel cell reference formula. ## Tags cell, excel, parse, spreadsheet, xls, xlsx ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ParseSyslog 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/parsesyslog.md section: Loading & Unloading Data --- # ParseSyslog 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Attempts to parse the contents of a Syslog message in accordance with the RFC5424 and RFC3164 formats and adds attributes to the FlowFile for each of the parts of the Syslog message. Note: Be mindful that RFC3164 is informational and a wide range of different implementations are present in the wild. If messages fail parsing, consider using RFC5424 or a generic parsing processor such as ExtractGrok. ## Tags attributes, event, logs, message, syslog, system ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.ListenSyslog](/user-guide/data-integration/openflow/processors/listensyslog) - [org.apache.nifi.processors.standard.PutSyslog](/user-guide/data-integration/openflow/processors/putsyslog) --- title: ParseSyslog5424 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/parsesyslog5424.md section: Loading & Unloading Data --- # ParseSyslog5424 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Attempts to parse the contents of a well-formed Syslog message in accordance with the RFC5424 format and adds attributes to the FlowFile for each of the parts of the Syslog message, including Structured Data. Structured Data will be written to attributes as one attribute per item id + parameter; see [https://tools.ietf.org/html/rfc5424](https://tools.ietf.org/html/rfc5424). Note: ParseSyslog5424 follows the specification more closely than ParseSyslog. If your Syslog producer does not follow the spec closely, for example with regard to using '-' for missing header entries, those logs will fail with this parser, where they would not fail with ParseSyslog. ## Tags attributes, event, logs, message, syslog, syslog5424, system ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.ListenSyslog](/user-guide/data-integration/openflow/processors/listensyslog) - [org.apache.nifi.processors.standard.ParseSyslog](/user-guide/data-integration/openflow/processors/parsesyslog) - [org.apache.nifi.processors.standard.PutSyslog](/user-guide/data-integration/openflow/processors/putsyslog) --- title: PartitionRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/partitionrecord.md section: Loading & Unloading Data --- # PartitionRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Splits, or partitions, record-oriented data based on the configured fields in the data. One or more properties must be added. The name of the property is the name of an attribute to add. The value of the property is a RecordPath to evaluate against each Record. Two records will go to the same outbound FlowFile only if they have the same value for each of the given RecordPaths. Because we know that all records in a given output FlowFile have the same value for the fields that are specified by the RecordPath, an attribute is added for each field. See Additional Details on the Usage page for more information and examples. ## Tags bin, group, organize, partition, record, recordpath, rpath, segment, split ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## Use cases

| Separate records into separate FlowFiles so that all of the records in a FlowFile have the same value for a given field or set of fields. |
| ----------------------------------------------------------------------------------------------------------------------------------------- |
| Separate records based on whether or not they adhere to specific criteria. |

## See also

- [org.apache.nifi.processors.standard.ConvertRecord](/user-guide/data-integration/openflow/processors/convertrecord)
- [org.apache.nifi.processors.standard.QueryRecord](/user-guide/data-integration/openflow/processors/queryrecord)
- [org.apache.nifi.processors.standard.SplitRecord](/user-guide/data-integration/openflow/processors/splitrecord)
- [org.apache.nifi.processors.standard.UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord)

--- title: PEMEncodedSSLContextProvider source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/pemencodedsslcontextprovider.md section: Loading & Unloading Data ---

# PEMEncodedSSLContextProvider

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description SSLContext Provider configurable using PEM Private Key and Certificate files. Supports PKCS1 and PKCS8 encoding for Private Keys as well as X.509 encoding for Certificates. ## Tags Certificate, ECDSA, Ed25519, Key, PEM, PKCS1, PKCS8, RSA, SSL, TLS, X.509 ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

--- title: Performance tuning of the Openflow Connector for Amazon Kinesis Data Streams source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning.md section: Loading & Unloading Data ---

# Performance tuning of the Openflow Connector for Amazon Kinesis Data Streams

This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/kinesis/about)
- [](/user-guide/data-integration/openflow/connectors/kinesis/setup)
- [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance)
- [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot)

When configuring the Openflow Connector for Kinesis for optimal performance, consider the following key factors that impact ingestion throughput and latency.

## Flowfile size

For optimal performance, flowfiles should be in the range 1-10 MB rather than containing individual small messages. Larger flowfiles reduce processing overhead and improve throughput by minimizing the number of individual file operations. Default settings should yield flowfiles in an acceptable size range.
Small flowfiles are expected when throughput is low. If you observe small flowfiles with high throughput, contact [Snowflake Support](/user-guide/contacting-support) for assistance. ## Network and infrastructure ### Network latency Lower latency between Kinesis and Openflow improves overall performance. It's highly advised that your Kinesis stream and Openflow are located in the same cloud service provider (CSP) region. ### Node size recommendations The following table provides configuration recommendations based on expected workload characteristics. Throughput values are relative and depend heavily on the source system configuration, topic and stream sizes, data format, and other factors.
## Performance optimization best practices ### Tuning Max Records Per Request When the ConsumeKinesis processor uses the **SHARED_THROUGHPUT** consumer type, the **Max Records Per Request** property controls the maximum number of records that the processor retrieves from Kinesis in a single request. If ingestion throughput is low and you don't see an obvious bottleneck in Openflow, Snowflake, or the network, this value might be too low for your workload. For most workloads, start by setting **Max Records Per Request** so that each request retrieves about 1 MB of data. Estimate the value by dividing 1 MB by your average Kinesis record size. The following table shows example starting values for common average record sizes:
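The estimate described above can also be computed directly. The following Python sketch is illustrative only: the function name is hypothetical, 1 MB is treated as 1,000,000 bytes, and the upper bound of 10,000 records is taken from the Amazon Kinesis GetRecords API limit rather than from this topic, so the processor may enforce a different cap.

```python
ONE_MB = 1_000_000            # bytes; assumes decimal megabytes
KINESIS_MAX_RECORDS = 10_000  # GetRecords API limit (assumed to apply here)


def starting_max_records(avg_record_bytes, target_bytes=ONE_MB):
    """Estimate a starting Max Records Per Request value.

    Divides the target request size by the average record size and
    clamps the result to the range 1..KINESIS_MAX_RECORDS.
    """
    if avg_record_bytes <= 0:
        raise ValueError("avg_record_bytes must be positive")
    estimate = target_bytes // avg_record_bytes
    return max(1, min(KINESIS_MAX_RECORDS, estimate))


# Print a starting value for a few hypothetical average record sizes.
for size in (200, 1_000, 5_000, 50_000):
    print(f"{size} bytes -> {starting_max_records(size)} records")
```

Treat the result as a starting point only, and adjust it based on the consumer lag and throughput you observe.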
After changing the value, monitor consumer lag, throughput, and runtime resource usage. Increase the value gradually if Kinesis consumption remains the bottleneck. ### Adjusting processor concurrent tasks To optimize processor performance, you can adjust the number of concurrent tasks for both ConsumeKinesis and PublishSnowpipeStreaming processors. Concurrent tasks allow processors to run multiple threads simultaneously, improving throughput for high-volume scenarios. To adjust concurrent tasks for a processor, perform the following tasks: 1. Right-click on the processor in the Openflow canvas. 2. Select **Configure** from the context menu. 3. Navigate to the **Scheduling** tab. 4. In the **Concurrent tasks** field, enter the preferred number of concurrent tasks. 5. Select **Apply** to save the configuration. #### Recommended concurrent task settings
#### Important considerations

- **Memory usage**: Each concurrent task consumes additional memory. Monitor JVM heap usage when increasing concurrent tasks.
- **Start conservatively**: Begin with lower values and gradually increase while monitoring performance metrics.

## Troubleshoot common performance bottlenecks

### High consumer lag or Snowflake ingestion bottlenecks

If Kinesis consumer lag is increasing or Snowflake ingestion is slow, then perform the following tasks:

1. Verify network connectivity and bandwidth between Openflow and Kinesis.
2. Observe if the queue in front of the PublishSnowpipeStreaming processor increases.
   1. If yes, consider adding more concurrent tasks for the PublishSnowpipeStreaming processor in the range limitations provided in [Adjusting processor concurrent tasks](#adjusting-processor-concurrent-tasks).
   2. If not, consider adding more concurrent tasks for the ConsumeKinesis processor in the range limitations provided in [Adjusting processor concurrent tasks](#adjusting-processor-concurrent-tasks).
3. Consider using a bigger node type.
4. Consider increasing the number of nodes for the runtime. To do this, stop the connectors in the runtime, change the minimum and maximum node counts, and then start the connectors again.

### Memory pressure

If experiencing memory-related issues:

1. Reduce the batch sizes to lower the memory footprint. This can be done by changing the File Fragment Size and File Fragment Count parameters in the PublishSnowpipeStreaming processor.
2. Reduce the number of concurrent tasks for the ConsumeKinesis processor.
3. Consider using a bigger node type.

### Network latency issues

If experiencing high latency:

1. Verify network configuration between Openflow and external systems.
2. Consider deploying Openflow in the same region as your Kinesis stream.
3.
If working with low throughput, consider lowering the Client Lag settings in the PublishSnowpipeStreaming processor and Max Uncommitted Time in the ConsumeKinesis processor. --- title: Performance Tuning of the Openflow Connector for Kafka source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kafka/performance-tuning.md section: Loading & Unloading Data --- # Performance Tuning of the Openflow Connector for Kafka This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic provides guidance for optimizing the performance of the [](/user-guide/data-integration/openflow/connectors/kafka/about) to achieve optimal throughput and minimize latency when ingesting data into Snowflake. ## Performance considerations When configuring the Openflow Connector for Kafka for optimal performance, consider the following key factors that impact ingestion throughput and latency: ### Kafka configuration #### Partition count More partitions allow for higher parallelism but require careful coordination with consumer configuration. Excessive partitions can cause several issues: increased memory usage, slower leader elections during failures, and significant metadata management overhead on brokers. 
#### Compression Message compression can reduce network bandwidth but increases CPU overhead. ### Flowfile optimization #### Flowfile size For optimal performance, flowfiles should be in the range 1-10 MB rather than containing individual small messages. Larger flowfiles reduce processing overhead and improve throughput by minimizing the number of individual file operations. Default settings should yield flowfiles in an acceptable size range. Small flowfiles are expected when throughput is low. If you observe small flowfiles with high throughput, contact [Snowflake Support](/user-guide/contacting-support) for assistance. ### Network and infrastructure #### Network latency Lower latency between Kafka brokers and Openflow improves overall performance. Snowflake recommends deploying Kafka brokers and Openflow in the same CSP region. #### Node size recommendations The following table provides configuration recommendations based on expected workload characteristics:
### Performance optimization best practices #### Adjusting processor concurrent tasks To optimize processor performance, you can adjust the number of concurrent tasks for both [ConsumeKafka](/user-guide/data-integration/openflow/processors/consumekafka) and PublishSnowpipeStreaming processors. Concurrent tasks allow processors to run multiple threads simultaneously, improving throughput for high-volume scenarios. To adjust concurrent tasks for a processor, perform the following tasks: 1. Right-click on the processor in the Openflow canvas. 2. Select Configure from the context menu. 3. Navigate to the Scheduling tab. 4. In the Concurrent tasks field, enter the preferred number of concurrent tasks. 5. Select Apply to save the configuration. #### Recommended concurrent task settings The following table provides recommended concurrent task settings for different node sizes:
#### Important considerations
- **Memory usage**: Each concurrent task consumes additional memory. Monitor JVM heap usage when increasing concurrent tasks.
- **Kafka partitions**: For ConsumeKafka, the number of concurrent tasks multiplied by the number of runtime nodes should not exceed the total number of Kafka partitions across all topics.
- **Start conservatively**: Begin with lower values and gradually increase while monitoring performance metrics.
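The Kafka partition constraint above is simple arithmetic, and checking it before changing the **Concurrent tasks** setting can be useful. The following Python sketch uses hypothetical helper names and is not part of Openflow. It assumes you already know the total partition count across all consumed topics and the number of runtime nodes.

```python
def within_partition_limit(concurrent_tasks, runtime_nodes, total_partitions):
    """Return True if tasks x nodes does not exceed the total partition count."""
    return concurrent_tasks * runtime_nodes <= total_partitions


def max_concurrent_tasks_per_node(total_partitions, runtime_nodes):
    """Largest per-node concurrent task count that satisfies the constraint.

    Returns at least 1, because a processor needs one task to run; when there
    are fewer partitions than nodes, even 1 task per node exceeds the limit.
    """
    if runtime_nodes < 1 or total_partitions < 1:
        raise ValueError("runtime_nodes and total_partitions must be >= 1")
    return max(1, total_partitions // runtime_nodes)


# Hypothetical workload: 24 partitions in total, consumed by a 3-node runtime.
limit = max_concurrent_tasks_per_node(total_partitions=24, runtime_nodes=3)
print(limit)
print(within_partition_limit(limit, 3, 24))
print(within_partition_limit(limit + 1, 3, 24))
```

The result is an upper bound only. The recommended settings for your node size and the memory considerations above still apply.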
#### Troubleshooting performance issues: Common performance bottlenecks

##### High consumer lag or Snowflake ingestion bottlenecks

If Kafka consumer lag is increasing or Snowflake ingestion is slow, then perform the following tasks:

1. Verify network connectivity and bandwidth between Openflow and Kafka brokers.
2. Observe if the queue in front of the PublishSnowpipeStreaming processor increases.
   1. If yes, consider adding more concurrent tasks for the PublishSnowpipeStreaming processor in the range limitations provided in [](#label-openflow-kafka-adjust-concurrent-tasks).
   2. If not, consider adding more concurrent tasks for the ConsumeKafka processor in the range limitations provided in [](#label-openflow-kafka-adjust-concurrent-tasks).
3. Consider using a bigger node type.
4. Consider increasing the max number of nodes for the runtime.

##### Memory pressure

If experiencing memory-related issues:

1. Reduce the batch sizes to lower the memory footprint.
2. Reduce the number of concurrent tasks for the ConsumeKafka processor.
3. Consider upgrading to a bigger node type.

##### Network latency issues

If experiencing high latency:

1. Verify network configuration between Openflow and external systems.
2. Consider deploying Openflow closer to your Kafka cluster.
3. If working with low throughput, consider lowering the Client Lag settings in the PublishSnowpipeStreaming processor and Max Uncommitted Time in the ConsumeKafka processor.

--- title: PerformSnowflakeCortexOCR 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/performsnowflakecortexocr.md section: Loading & Unloading Data ---

# PerformSnowflakeCortexOCR 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowflake-processors-nar ## Description Performs Optical Character Recognition (OCR) on PDF documents using Snowflake Cortex ML functions. Documents must be staged in a Snowflake internal stage with server-side encryption enabled. The processor extracts text content from PDFs and can output the results either as FlowFile content or as an attribute. ## Tags ai, cortex, document, ml, ocr, openflow, pdf, snowflake ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also

- [com.snowflake.openflow.runtime.processors.snowflake.PutSnowflakeInternalStageFile](/user-guide/data-integration/openflow/processors/putsnowflakeinternalstagefile)

---
title: PickTablesForReplication 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/picktablesforreplication.md
section: Loading & Unloading Data
---

# PickTablesForReplication 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar

## Description

Accepts a list of fully qualified table names and determines whether each table:

- is new (is not replicated, but was added in the source)
- is existing (is replicated and exists in the source)
- is stale (is replicated but no longer exists in the source)

Configuration is passed as a FlowFile attribute. The processor generates a separate FlowFile for each source table.

## Tags

snowflake, state, table

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
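The new, existing, and stale categories in the Description above amount to set comparisons between the tables listed in the source and the tables already replicated. The following is a minimal sketch of that idea, not the processor's implementation; the function name `classify_tables` and the sample table names are hypothetical.

```python
def classify_tables(source_tables, replicated_tables):
    """Classify fully qualified table names as new, existing, or stale.

    Hypothetical helper: both arguments are iterables of table names.
    """
    source = set(source_tables)
    replicated = set(replicated_tables)
    return {
        # In the source, but not replicated yet.
        "new": sorted(source - replicated),
        # Replicated and still present in the source.
        "existing": sorted(source & replicated),
        # Replicated, but no longer present in the source.
        "stale": sorted(replicated - source),
    }


result = classify_tables(
    source_tables=["db.public.orders", "db.public.customers"],
    replicated_tables=["db.public.customers", "db.public.legacy"],
)
print(result)
```

Sorting the names only makes the output deterministic; the comparison itself does not depend on order.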
---
title: PolarisIcebergCatalog
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/polarisicebergcatalog.md
section: Loading & Unloading Data
---

# PolarisIcebergCatalog

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides Apache Iceberg integration with Apache Polaris Catalog access over REST HTTP.

## Tags

catalog, iceberg, openflow, polaris

## Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: PromptAnthropicAI 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptanthropicai.md
section: Loading & Unloading Data
---

# PromptAnthropicAI 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-anthropic-nar

## Description

Sends a prompt to Anthropic, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction or may include an image. Use dynamic properties to enable beta features in the Anthropic endpoint.

## Tags

ai, anthropic, chat, image, openflow, prompt, text

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
---
title: PromptAzureOpenAI 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptazureopenai.md
section: Loading & Unloading Data
---

# PromptAzureOpenAI 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-openai-nar

## Description

Sends a prompt to Azure's OpenAI service, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction or may include images. For images, either a URL may be provided or the contents of the FlowFile may be used, depending on the provided configuration.

## Tags

ai, azure, chat, image, openai, openflow, prompt, text

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## See also

- [com.snowflake.openflow.runtime.processors.openai.CreateAzureOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createazureopenaiembeddings)
- [com.snowflake.openflow.runtime.processors.openai.PromptOpenAI](/user-guide/data-integration/openflow/processors/promptopenai)

---
title: PromptLLM 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptllm.md
section: Loading & Unloading Data
---

# PromptLLM 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-llm-processors-nar

## Description

Sends a user-defined prompt to a Large Language Model (LLM) for a response.

## Tags

ai, llm, openflow, prompt, text processing

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
---
title: PromptOpenAI 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptopenai.md
section: Loading & Unloading Data
---

# PromptOpenAI 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-openai-nar

## Description

Sends a prompt to OpenAI, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction or may include images. For images, either a URL may be provided or the contents of the FlowFile may be used, depending on the provided configuration.

## Tags

ai, chat, image, openai, openflow, prompt, text

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## See also

- [com.snowflake.openflow.runtime.processors.openai.CreateOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createopenaiembeddings)
- [com.snowflake.openflow.runtime.processors.openai.PromptAzureOpenAI](/user-guide/data-integration/openflow/processors/promptazureopenai)

---
title: PromptSnowflakeCortex 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptsnowflakecortex.md
section: Loading & Unloading Data
---

# PromptSnowflakeCortex 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-snowflake-processors-nar

## Description

Sends a prompt to Snowflake Cortex, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of text only.

## Tags

ai, chat, cortex, openflow, prompt, snowflake, text

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
---
title: PromptVertexAI 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptvertexai.md
section: Loading & Unloading Data
---

# PromptVertexAI 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-vertexai-nar

## Description

Sends a prompt to VertexAI, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction or may include multimedia.

## Tags

ai, chat, cloud, gcp, google, image, openflow, pdf, prompt, text, video

## Input Requirement

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
---
title: PropertiesFileLookupService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/propertiesfilelookupservice.md
section: Loading & Unloading Data
---

# PropertiesFileLookupService

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

A reloadable lookup service backed by a properties file.

## Tags

cache, enrich, join, key, lookup, properties, reloadable, value

## Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management

This component does not store state.

## Restricted

## Restrictions
## System Resource Considerations

This component does not specify system resource considerations.

---
title: ProtobufReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/protobufreader.md
section: Loading & Unloading Data
---

# ProtobufReader

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Parses a Protocol Buffers message from binary format.

## Tags

parser, protobuf, reader, record

## Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: Publish Data from Snowflake to SAP® BDC Connect for Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/publish-data.md
section: Loading & Unloading Data
---

# Publish Data from Snowflake to SAP® BDC Connect for Snowflake

- [](/user-guide/data-integration/zero-copy/sap-sql/setup)
- [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products)
- [](/user-guide/data-integration/zero-copy/sap-sql/security)

This topic describes how to publish Snowflake data back to SAP® BDC Connect for Snowflake by creating a share, granting access to databases, schemas, and tables, and associating the share with a Zerocopy Connector.

The connector must be in the `CONNECTED` state and have `SHARE_BACK` enabled before you associate a share. See [](/user-guide/data-integration/zero-copy/sap-sql/setup) for details.

## Enable Share Back

Before publishing data to SAP® BDC Connect for Snowflake, enable share back on the connector:

```sql
ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector SET SHARE_BACK = TRUE;
```

The role used to create the share must have the `CREATE SHARE` privilege on the account. For the full list of required privileges, see [](/user-guide/data-integration/zero-copy/sap-sql/security).

## Grant Access to Snowflake Objects

To publish Snowflake data to SAP® BDC Connect for Snowflake, you first create a Snowflake share and grant access to the databases, schemas, and tables you want to publish. For more information about creating and managing shares, see [](/sql-reference/sql/create-share).

- Only Iceberg V3 tables with copy-on-write enabled can be shared with a Zerocopy Connector. For more information, see [](#label-tables-iceberg-row-level-deletes).
- Iceberg tables must use Snowflake as the catalog ([Snowflake-managed Iceberg tables](#label-tables-iceberg-snowflake-as-catalog)).
  To specify this when creating a table, use `CATALOG = 'SNOWFLAKE'` and `STORAGE_SERIALIZATION_POLICY = 'COMPATIBLE'`. Alternatively, you can set both properties at the database or schema level so that all tables automatically inherit them. For more information, see [](/sql-reference/sql/create-iceberg-table-snowflake).
- Each shared data product should map to a single dedicated database.

To create an Iceberg table that can be shared with a Zerocopy Connector:

```sql
CREATE ICEBERG TABLE my_publish_db.my_schema.my_table (
  id STRING,
  name STRING,
  value NUMBER
)
ICEBERG_VERSION = 3
-- The following parameters can be omitted if they have been set at the parent schema, database, or account level.
CATALOG = 'SNOWFLAKE'
ICEBERG_MERGE_ON_READ_BEHAVIOR = 'DISABLED'
STORAGE_SERIALIZATION_POLICY = 'COMPATIBLE';
```

### Create a Share

To create a share, the role must have the `CREATE SHARE` privilege on the account. For the full list of required privileges, see [](/user-guide/data-integration/zero-copy/sap-sql/security).
Create a share using [CREATE SHARE](/sql-reference/sql/create-share):

```sql
CREATE SHARE IF NOT EXISTS my_share;
```

### Grant Access to the Share

Grant `USAGE` on the database:

```sql
GRANT USAGE ON DATABASE my_publish_db TO SHARE my_share;
```

Grant `USAGE` on the schema:

```sql
GRANT USAGE ON SCHEMA my_publish_db.my_schema TO SHARE my_share;
```

Grant `SELECT` on a specific table:

```sql
GRANT SELECT ON TABLE my_publish_db.my_schema.my_table TO SHARE my_share;
```

### Associate the Share with the Connector

After granting access, associate the share with the Zerocopy Connector:

```sql
ALTER ZEROCOPY CONNECTOR my_db.my_schema.my_sap_connector ADD SHARE my_share;
```

To view the shares associated with a Zerocopy Connector, use `DESC ZEROCOPY CONNECTOR`:

```sql
DESC ZEROCOPY CONNECTOR my_db.my_schema.my_sap_connector;
```

## Revoke Access

To disassociate a share from the Zerocopy Connector:

```sql
ALTER ZEROCOPY CONNECTOR my_db.my_schema.my_sap_connector REMOVE SHARE my_share;
```

To revoke access to a previously granted object from the share:

```sql
REVOKE USAGE ON DATABASE my_publish_db FROM SHARE my_share;
REVOKE USAGE ON SCHEMA my_publish_db.my_schema FROM SHARE my_share;
REVOKE SELECT ON TABLE my_publish_db.my_schema.my_table FROM SHARE my_share;
```

## Publish a Data Product to SAP® BDC Connect for Snowflake

After granting access to Snowflake objects, publish the data product to SAP® BDC by calling the `SYSTEM$SAP_PUBLISH_DATA_PRODUCT` function. This makes the data product discoverable and accessible from the SAP® BDC side.

The `OPERATE` privilege on the connector is required to call `SYSTEM$SAP_PUBLISH_DATA_PRODUCT`.
```sql
SELECT SYSTEM$SAP_PUBLISH_DATA_PRODUCT(
  '',
  '',
  '',
  ''
);
```

For example:

```sql
SELECT SYSTEM$SAP_PUBLISH_DATA_PRODUCT(
  'my_db.my_schema.my_sap_connector',
  'my_share',
  '{
    "title": "Airline Data Product",
    "shortDescription": "Airline dimension data from Snowflake.",
    "description": "Contains airline identifiers and attributes published from Snowflake to SAP BDC."
  }',
  '{
    "csnInteropEffective": "1.2",
    "$version": "2.0",
    "meta": {
      "document": {
        "version": "1.2.3",
        "doc": "This is a minimal CSN example document."
      }
    },
    "definitions": {
      "AirlineService": {
        "kind": "service",
        "doc": "This is describing the service that exposes the CDS entities through an API."
      },
      "AirlineService.Airline": {
        "kind": "entity",
        "doc": "Human readable description of the entity, in **markdown**.",
        "@EndUserText.label": "Airline",
        "@ObjectModel.modelingPattern": { "#": "ANALYTICAL_DIMENSION" },
        "elements": {
          "AirlineID": {
            "doc": "Human readable description of the element, in **markdown**.",
            "key": true,
            "type": "cds.UUID"
          }
        }
      }
    }
  }'
);
```
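When the two JSON arguments are generated by a script rather than typed by hand, the main risk is quoting: the JSON text is embedded in single-quoted SQL string literals. The following is a minimal sketch of assembling the statement text, assuming the four-argument order shown in the example above. The helper names `sql_literal` and `build_publish_statement` and the parameter names `metadata` and `csn_document` are hypothetical, and the backslash doubling assumes that Snowflake interprets backslash escape sequences inside single-quoted string constants.

```python
import json


def sql_literal(text):
    """Quote text as a single-quoted SQL string literal.

    Backslashes are doubled first (assumption: they are treated as escape
    characters in single-quoted constants), then single quotes are doubled.
    """
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def build_publish_statement(connector_name, share_name, metadata, csn_document):
    """Return the SELECT statement text; argument order follows the example above."""
    args = [
        sql_literal(connector_name),
        sql_literal(share_name),
        sql_literal(json.dumps(metadata)),      # data product title and descriptions
        sql_literal(json.dumps(csn_document)),  # CSN document describing the entities
    ]
    return "SELECT SYSTEM$SAP_PUBLISH_DATA_PRODUCT(\n  " + ",\n  ".join(args) + "\n);"


statement = build_publish_statement(
    "my_db.my_schema.my_sap_connector",
    "my_share",
    {"title": "O'Hare Airline Data", "shortDescription": "Airline dimension data."},
    {"csnInteropEffective": "1.2", "$version": "2.0"},
)
print(statement)
```

If the statement is executed from application code, passing the four values as bind parameters, where the client supports them, avoids manual quoting altogether.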
If the function fails to resolve `connector_name` or `snowflake_share_name`, verify that the names use the correct case. Snowflake identifiers are case-sensitive when quoted. For more information, see [](/sql-reference/identifiers-syntax).

---
title: PublishAMQP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishamqp.md
section: Loading & Unloading Data
---

# PublishAMQP 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-amqp-nar

## Description

Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange. In a typical AMQP exchange model, the message that is sent to the AMQP Exchange is routed based on the 'Routing Key' to its final destination in the queue (the binding). If, because of a misconfiguration, the binding between the Exchange, Routing Key, and Queue is not set up, the message has no final destination and is returned (that is, the data does not make it to the queue). When that happens, a message to that effect appears in both the app log and a bulletin, and the FlowFile is routed to the 'failure' relationship.

## Tags

amqp, message, publish, put, rabbit, send

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
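The routing behavior in the Description above can be pictured with a small in-memory model. This is a sketch only: it models a direct-style exchange as a dictionary from routing key to bound queues, does not use an AMQP client, and the names `publish` and `bindings` are hypothetical. The `"success"` outcome is an assumption; the Description names only the 'failure' relationship.

```python
def publish(bindings, routing_key, body):
    """Deliver body to every queue bound to routing_key.

    bindings maps a routing key to a list of queues (plain Python lists here).
    Returns "success" if at least one binding matched, otherwise "failure",
    mirroring a returned (unroutable) message.
    """
    queues = bindings.get(routing_key, [])
    if not queues:
        # No binding for this key: the message has no final destination.
        return "failure"
    for queue in queues:
        queue.append(body)
    return "success"


orders_queue = []
bindings = {"orders.created": [orders_queue]}

routed = publish(bindings, "orders.created", b'{"id": 1}')
unrouted = publish(bindings, "orders.craeted", b'{"id": 2}')  # misspelled key
print(routed, unrouted, orders_queue)
```

The second call shows the misconfiguration case: the key matches no binding, so nothing reaches the queue.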
---
title: PublishChangeDataSnowpipeStreaming 2026.4.28.15
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishchangedatasnowpipestreaming.md
section: Loading & Unloading Data
---

# PublishChangeDataSnowpipeStreaming 2026.4.28.15

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-snowpipe-streaming-2-processors-nar

## Description

Publishes change data records formatted as Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming High Availability. The processor supports **Concurrency Group** serialization so FlowFiles that share the same group are not processed against the channel concurrently. After data is transferred, the processor waits for the streaming channel to report committed offset tokens (according to **Offset Tracking Resolution** and **Offset Tracking Timeout**) before routing FlowFiles to **success**, **invalid**, or **failure**. It can run when the incoming connection has no FlowFiles so that pending batches finish polling.

## Tags

CDC, Change Data Capture, NDJSON, Preview, Snowflake, Snowpipe Streaming

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
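The **Concurrency Group** idea from the Description above, that FlowFiles sharing a group are never processed at the same time, can be sketched as a selection step. This is an illustration under stated assumptions, not the processor's scheduling logic: FlowFiles are modeled as dictionaries with a hypothetical `group` key, and `select_runnable` is a hypothetical helper.

```python
def select_runnable(flowfiles, busy_groups):
    """Split queued FlowFiles into those that can start now and those that must wait.

    A FlowFile may start only if no FlowFile from its group is already in
    flight (busy_groups) or was already selected in this pass.
    """
    claimed = set(busy_groups)
    runnable, deferred = [], []
    for flowfile in flowfiles:
        group = flowfile["group"]
        if group in claimed:
            deferred.append(flowfile)  # same group is busy: keep order, wait
        else:
            claimed.add(group)
            runnable.append(flowfile)
    return runnable, deferred


queued = [
    {"id": 1, "group": "orders"},
    {"id": 2, "group": "orders"},
    {"id": 3, "group": "customers"},
    {"id": 4, "group": "inventory"},
]
runnable, deferred = select_runnable(queued, busy_groups={"inventory"})
print([f["id"] for f in runnable], [f["id"] for f in deferred])
```

Deferring rather than dropping keeps per-group ordering intact, which is typically what change data needs.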
---
title: PublishGCPubSub 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishgcpubsub.md
section: Loading & Unloading Data
---

# PublishGCPubSub 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-gcp-nar

## Description

Publishes the content of the incoming FlowFile to the configured Google Cloud PubSub topic. The processor supports dynamic properties. If any dynamic properties are present, they are sent along with the message as 'attributes'.

## Tags

gcp, google, google-cloud, message, publish, pubsub

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## See also

- [org.apache.nifi.processors.gcp.pubsub.ConsumeGCPubSub](/user-guide/data-integration/openflow/processors/consumegcpubsub)

---
title: PublishJMS 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishjms.md
section: Loading & Unloading Data
---

# PublishJMS 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-jms-processors-nar

## Description

Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as JMS BytesMessage or TextMessage. FlowFile attributes will be added as JMS headers and/or properties to the outgoing JMS message.

## Tags

jms, message, publish, put, send

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Restrictions
## Relationships
## See also

- [org.apache.nifi.jms.processors.ConsumeJMS](/user-guide/data-integration/openflow/processors/consumejms)

---
title: PublishKafka 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishkafka.md
section: Loading & Unloading Data
---

# PublishKafka 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-kafka-nar

## Description

Sends the contents of a FlowFile as either a message or as individual records to Apache Kafka using the Kafka Producer API. The messages to send may be individual FlowFiles, may be delimited using a user-specified delimiter (such as a new-line), or may be record-oriented data that can be read by the configured Record Reader. The complementary NiFi processor for fetching messages is ConsumeKafka.

## Tags

apache, avro, csv, json, kafka, logs, message, openflow, pubsub, put, record, send

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
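The delimiter-based mode mentioned in the Description, where one FlowFile carries several messages separated by a user-specified delimiter, can be illustrated with a short sketch. It shows only the splitting step and does not call the Kafka Producer API; `split_messages` is a hypothetical helper, and dropping empty segments (for example after a trailing newline) is an assumption rather than documented processor behavior.

```python
from typing import List, Optional


def split_messages(content: bytes, delimiter: Optional[bytes] = None) -> List[bytes]:
    """Split FlowFile content into individual message payloads.

    With no delimiter, the whole content is treated as one message.
    Empty segments are dropped (assumption, see the lead-in).
    """
    if not delimiter:
        return [content]
    return [part for part in content.split(delimiter) if part]


flowfile_content = b'{"id": 1}\n{"id": 2}\n{"id": 3}\n'
whole = split_messages(flowfile_content)
parts = split_messages(flowfile_content, delimiter=b"\n")
print(len(whole), parts)
```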
## See also

- [com.snowflake.openflow.runtime.processors.kafka.ConsumeKafka](/user-guide/data-integration/openflow/processors/consumekafka)

---
title: PublishMQTT 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishmqtt.md
section: Loading & Unloading Data
---

# PublishMQTT 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-mqtt-nar

## Description

Publishes a message to an MQTT topic.

## Tags

IOT, MQTT, publish

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## See also

- [org.apache.nifi.processors.mqtt.ConsumeMQTT](/user-guide/data-integration/openflow/processors/consumemqtt)

---
title: PublishSlack 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishslack.md
section: Loading & Unloading Data
---

# PublishSlack 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-slack-nar

## Description

Posts a message to the specified Slack channel. The message can be either a user-defined message that makes use of Expression Language or the contents of the FlowFile. When sending a user-defined message, the contents of the FlowFile can optionally be uploaded as a file attachment.

## Tags

chat.postMessage, conversation, publish, send, slack, social media, team, text, unstructured, upload, write

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## Use cases

| Send specific text as a message to Slack, optionally including the FlowFile's contents as an attached file. |
| --- |
| Send the contents of the FlowFile as a message to Slack. |

## Use Cases Involving Other Components

| Respond to a Slack message in a thread. |
| --- |

## See also

- [org.apache.nifi.processors.slack.ConsumeSlack](/user-guide/data-integration/openflow/processors/consumeslack)
- [org.apache.nifi.processors.slack.ListenSlack](/user-guide/data-integration/openflow/processors/listenslack)

---
title: PublishSnowpipeStreaming 2026.4.28.15
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishsnowpipestreaming.md
section: Loading & Unloading Data
---

# PublishSnowpipeStreaming 2026.4.28.15

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-snowpipe-streaming-2-processors-nar

## Description

Publishes records formatted as Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming High Availability. After data is transferred, the processor waits for the streaming channel to report committed offset tokens (according to **Offset Tracking Resolution** and **Offset Tracking Timeout**) before routing FlowFiles to **success**, **invalid**, or **failure**. It can run when the incoming connection has no FlowFiles so that pending batches finish polling.

## Tags

NDJSON, Preview, Snowflake, Snowpipe Streaming

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
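The Description above expects records formatted as Newline Delimited JSON (NDJSON): one complete JSON object per line. The following sketch shows one way to produce and read that format with the Python standard library; the helper names `to_ndjson` and `from_ndjson` and the sample records are hypothetical, and nothing here calls Snowpipe Streaming.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as NDJSON: one compact JSON object per line."""
    # json.dumps escapes newlines inside string values, so each record
    # always stays on a single physical line.
    return "".join(json.dumps(record, separators=(",", ":")) + "\n" for record in records)


def from_ndjson(text):
    """Parse NDJSON text back into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = [
    {"id": 1, "note": "first"},
    {"id": 2, "note": "line one\nline two"},
]
payload = to_ndjson(records)
print(payload, end="")
```

Each line can be parsed on its own, which is what allows a consumer to validate or reject individual records without re-reading the whole payload.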
---
title: PutAzureBlobStorage_v12 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazureblobstorage_v12.md
section: Loading & Unloading Data
---

# PutAzureBlobStorage_v12 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-azure-nar

## Description

Puts content into a blob on Azure Blob Storage. The processor uses Azure Blob Storage client library v12.

## Tags

azure, blob, cloud, microsoft, storage

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## See also

- [org.apache.nifi.processors.azure.storage.CopyAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/copyazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.DeleteAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/deleteazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.FetchAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/fetchazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.ListAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/listazureblobstorage_v12)

---
title: PutAzureCosmosDBRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazurecosmosdbrecord.md
section: Loading & Unloading Data
---

# PutAzureCosmosDBRecord 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description This processor is a record-aware processor for inserting data into Cosmos DB with Core SQL API. It uses a configured record reader and schema to read an incoming record set from the body of a Flowfile and then inserts those records into a configured Cosmos DB Container. ## Tags azure, cosmos, insert, put, record ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: PutAzureDataExplorer 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazuredataexplorer.md section: Loading & Unloading Data --- # PutAzureDataExplorer 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Acts as an Azure Data Explorer sink which sends FlowFiles to the provided endpoint. Data can be sent through queued ingestion or streaming ingestion to the Azure Data Explorer cluster. ## Tags ADX, Azure, Data, Explorer, Kusto ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: PutAzureDataLakeStorage 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazuredatalakestorage.md section: Loading & Unloading Data --- # PutAzureDataLakeStorage 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Writes the contents of a FlowFile as a file on Azure Data Lake Storage Gen 2 ## Tags adlsgen2, azure, cloud, datalake, microsoft, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.azure.storage.DeleteAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/deleteazuredatalakestorage) - [org.apache.nifi.processors.azure.storage.FetchAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/fetchazuredatalakestorage) - [org.apache.nifi.processors.azure.storage.ListAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/listazuredatalakestorage) --- title: PutAzureEventHub 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazureeventhub.md section: Loading & Unloading Data --- # PutAzureEventHub 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Send FlowFile contents to Azure Event Hubs ## Tags azure, cloud, eventhub, events, microsoft, streaming, streams ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: PutAzureQueueStorage_v12 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazurequeuestorage_v12.md section: Loading & Unloading Data --- # PutAzureQueueStorage_v12 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Writes the content of the incoming FlowFiles to the configured Azure Queue Storage. ## Tags azure, cloud, enqueue, microsoft, queue, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [org.apache.nifi.processors.azure.storage.queue.GetAzureQueueStorage_v12](/user-guide/data-integration/openflow/processors/getazurequeuestorage_v12) --- title: PutBigQuery 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putbigquery.md section: Loading & Unloading Data --- # PutBigQuery 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-gcp-nar ## Description Writes the contents of a FlowFile to a Google BigQuery table. The processor is record based so the schema that is used is driven by the RecordReader. Attributes that are not matched to the target schema are skipped. Exactly once delivery semantics are achieved via stream offsets. ## Tags bigquery, bq, google, google cloud ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: PutBoxFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putboxfile.md section: Loading & Unloading Data --- # PutBoxFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-box-nar ## Description Puts content to a Box folder. ## Tags box, put, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile) - [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile) --- title: PutCloudWatchMetric 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putcloudwatchmetric.md section: Loading & Unloading Data --- # PutCloudWatchMetric 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Publishes metrics to Amazon CloudWatch. A metric can be either a single value or a StatisticSet consisting of minimum, maximum, sum, and sample count. ## Tags amazon, aws, cloudwatch, metrics, publish, put ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
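To illustrate the difference between a single value and a StatisticSet mentioned in the description, the following is a minimal sketch that collapses several samples into the four aggregate fields. The class and function names (`StatisticSet`, `summarize`) and the sample data are illustrative assumptions, not part of the processor or of any AWS SDK.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StatisticSet:
    """Hypothetical container for the four aggregate fields named in the description."""
    minimum: float
    maximum: float
    total: float        # the "sum" field; renamed to avoid shadowing the built-in sum()
    sample_count: int


def summarize(samples: Iterable[float]) -> StatisticSet:
    """Collapse individual samples into one pre-aggregated statistic set."""
    values = list(samples)
    if not values:
        raise ValueError("at least one sample is required")
    return StatisticSet(min(values), max(values), sum(values), len(values))


# Three hypothetical latency samples become one aggregated data point.
latencies_ms = [120.0, 80.0, 100.0]
stats = summarize(latencies_ms)
```

Sending one aggregated set instead of every sample reduces the number of data points published, at the cost of losing the individual values.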
## Relationships
--- title: PutDatabaseRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdatabaserecord.md section: Loading & Unloading Data --- # PutDatabaseRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL statements and executed as a single transaction. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is routed to success. The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute. IMPORTANT: If the Statement Type is UPDATE, then the incoming records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing (if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys exist). ## Tags database, delete, insert, jdbc, put, record, sql, update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
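The description above says that records are translated to SQL statements and executed as a single transaction. The following is a rough sketch of that all-or-nothing behavior using Python's built-in `sqlite3` module. It is not the processor's implementation: the `insert_records` helper, the `users` table, and the sample rows are assumptions made for illustration, and the table name is interpolated directly because it is trusted here.

```python
import sqlite3


def insert_records(conn, table, records):
    """Insert all records in one transaction; keep none of them if any statement fails."""
    if not records:
        return True
    columns = list(records[0])
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.executemany(sql, [tuple(r[c] for c in columns) for r in records])
        return True    # comparable to routing the flow file to success
    except sqlite3.Error:
        return False   # comparable to routing the flow file to failure


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

ok = insert_records(conn, "users", [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}])
# The last record of this batch violates the primary key, so the whole batch is rolled back.
bad = insert_records(conn, "users", [{"id": 3, "name": "Max"}, {"id": 1, "name": "Dup"}])
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

After both calls, only the two rows from the first batch remain, which is the property the single-transaction design is meant to guarantee.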
## Relationships
## Writes attributes
## Use cases | Insert records into a database | | ------------------------------ | --- title: PutDatabricksSQL 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdatabrickssql.md section: Loading & Unloading Data --- # PutDatabricksSQL 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-databricks-processors-nar ## Description Submits a SQL execution using the Databricks REST API, then writes the JSON response to the FlowFile content. For high-performance SELECT or INSERT queries, use ExecuteSQL instead. ## Tags databricks, openflow, sql ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: PutDBFSFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdbfsfile.md section: Loading & Unloading Data --- # PutDBFSFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-databricks-processors-nar ## Description Write FlowFile content to DBFS. ## Tags databricks, dbfs, openflow ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: PutDistributedMapCache 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdistributedmapcache.md section: Loading & Unloading Data --- # PutDistributedMapCache 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Gets the content of a FlowFile and puts it to a distributed map cache, using a cache key computed from FlowFile attributes. If the cache already contains the entry and the cache update strategy is 'keep original', the entry is not replaced. ## Tags cache, distributed, map, put ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
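The 'keep original' behavior described above can be sketched with an ordinary dictionary standing in for the distributed cache. This is only an illustration under stated assumptions: `cache_key`, `put_entry`, the `"replace"` strategy label, and the `{tenant}/{filename}` template are hypothetical, and Python's `str.format_map` is used here as a stand-in for however the key is actually computed from attributes.

```python
def cache_key(attributes, template):
    """Build a key from FlowFile-style attributes (format_map is an illustrative stand-in)."""
    return template.format_map(attributes)


def put_entry(cache, key, value, strategy="replace"):
    """Write value under key and return True, unless 'keep original' protects an existing entry."""
    if strategy == "keep original" and key in cache:
        return False
    cache[key] = value
    return True


cache = {}
key = cache_key({"tenant": "acme", "filename": "a.csv"}, "{tenant}/{filename}")

first = put_entry(cache, key, b"v1", strategy="keep original")   # key is new, so it is stored
second = put_entry(cache, key, b"v2", strategy="keep original")  # key exists, so v1 is kept
third = put_entry(cache, key, b"v3", strategy="replace")         # overwrites unconditionally
```

Returning a boolean makes it easy for a caller to tell whether the content was cached or an earlier entry was preserved.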
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.FetchDistributedMapCache](/user-guide/data-integration/openflow/processors/fetchdistributedmapcache) --- title: PutDropbox 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdropbox.md section: Loading & Unloading Data --- # PutDropbox 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-dropbox-processors-nar ## Description Puts content to a Dropbox folder. ## Tags dropbox, put, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.dropbox.FetchDropbox](/user-guide/data-integration/openflow/processors/fetchdropbox) - [org.apache.nifi.processors.dropbox.ListDropbox](/user-guide/data-integration/openflow/processors/listdropbox) --- title: PutDynamoDB 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdynamodb.md section: Loading & Unloading Data --- # PutDynamoDB 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Puts a document into DynamoDB based on hash and range key. The table can have either hash and range keys or a hash key alone. Currently, the supported key types are string and number, and the value can be a JSON document. When the table uses both hash and range keys, both keys are required for the operation. The FlowFile content must be JSON. FlowFile content is mapped to the specified Json Document attribute in the DynamoDB item. ## Tags AWS, Amazon, DynamoDB, Insert, Put ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
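As a sketch of the mapping the description outlines (key values plus the JSON content stored under one document attribute), the snippet below builds a plain dictionary shaped like an item. It does not call DynamoDB. The `build_item` helper, its parameter names, and the `order_id`, `line_no`, and `document` names are assumptions chosen for illustration.

```python
import json


def build_item(content, hash_key, hash_value, range_key=None, range_value=None,
               document_attribute="document"):
    """Combine key values with the parsed JSON content into one item-shaped dict."""
    for value in (hash_value, range_value):
        # Only string and number keys are supported; bool is excluded explicitly
        # because it is a subclass of int in Python.
        if value is not None and (isinstance(value, bool)
                                  or not isinstance(value, (str, int, float))):
            raise TypeError("key values must be strings or numbers")
    if range_key is not None and range_value is None:
        raise ValueError("a range key value is required when the table has a range key")
    item = {hash_key: hash_value, document_attribute: json.loads(content)}
    if range_key is not None:
        item[range_key] = range_value
    return item


# Hypothetical table with a string hash key and a numeric range key.
item = build_item('{"status": "shipped"}', "order_id", "A-17", "line_no", 2)
```

Because `json.loads` raises a `ValueError` subclass on malformed input, non-JSON content is rejected before any item is built.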
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.dynamodb.DeleteDynamoDB](/user-guide/data-integration/openflow/processors/deletedynamodb) - [org.apache.nifi.processors.aws.dynamodb.GetDynamoDB](/user-guide/data-integration/openflow/processors/getdynamodb) - [org.apache.nifi.processors.aws.dynamodb.PutDynamoDBRecord](/user-guide/data-integration/openflow/processors/putdynamodbrecord) --- title: PutDynamoDBRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdynamodbrecord.md section: Loading & Unloading Data --- # PutDynamoDBRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Inserts items into DynamoDB based on record-oriented data. The record fields are mapped into DynamoDB item fields, including partition and sort keys if set. Depending on the number of records, the processor might execute the insert in multiple chunks to work within DynamoDB's limit on batch writes. This might result in partially processed FlowFiles, in which case the FlowFile is transferred to the "unprocessed" relationship with the necessary attribute to retry later without duplicating the already executed inserts. ## Tags AWS, Amazon, DynamoDB, Insert, Put, Record ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
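The chunk-and-resume behavior in the description can be sketched as follows. DynamoDB's BatchWriteItem operation accepts at most 25 put or delete requests per call, which is why the records are split. Everything else here is an assumption for illustration: `write_in_chunks`, the `write_batch` callback, and the returned resume index stand in for whatever attribute the processor actually records.

```python
BATCH_LIMIT = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def write_in_chunks(records, write_batch, start_index=0):
    """Write records in chunks of BATCH_LIMIT, starting at start_index.

    Returns the index of the first record that has not been written, which equals
    len(records) when everything succeeded. A caller can pass that index back in
    to resume without repeating chunks that already went through.
    """
    position = start_index
    while position < len(records):
        chunk = records[position:position + BATCH_LIMIT]
        try:
            write_batch(chunk)
        except Exception:
            return position  # partially processed: remember where to resume
        position += len(chunk)
    return position


records = [{"id": i} for i in range(60)]
written = []
calls = {"n": 0}


def flaky_writer(chunk):
    """Hypothetical writer that fails on its third call to simulate throttling."""
    calls["n"] += 1
    if calls["n"] == 3:
        raise RuntimeError("simulated throttling")
    written.extend(chunk)


resume_at = write_in_chunks(records, flaky_writer)                  # stops before the third chunk
finished_at = write_in_chunks(records, written.extend, resume_at)   # retry only the remainder
```

Tracking a single resume index is enough in this sketch because chunks are written strictly in order.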
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.dynamodb.DeleteDynamoDB](/user-guide/data-integration/openflow/processors/deletedynamodb) - [org.apache.nifi.processors.aws.dynamodb.GetDynamoDB](/user-guide/data-integration/openflow/processors/getdynamodb) - [org.apache.nifi.processors.aws.dynamodb.PutDynamoDB](/user-guide/data-integration/openflow/processors/putdynamodb) --- title: PutElasticsearchJson 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putelasticsearchjson.md section: Loading & Unloading Data --- # PutElasticsearchJson 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-elasticsearch-restapi-nar ## Description An Elasticsearch put processor that uses the official Elastic REST client libraries. Each FlowFile is treated as a document to be sent to the Elasticsearch _bulk API. Multiple FlowFiles can be batched together into each Request sent to Elasticsearch. ## Tags elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, index, json, put ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
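To show what batching several documents into one _bulk request looks like, here is a minimal sketch that assembles the newline-delimited JSON body the Elasticsearch _bulk API expects: one action line followed by one source line per document, with a trailing newline. It only builds the string and sends nothing. The `build_bulk_body` helper, the `logs` index, and the sample documents are illustrative assumptions, not the processor's code.

```python
import json


def build_bulk_body(index, documents, operation="index"):
    """Return an NDJSON _bulk body for (doc_id, doc) pairs.

    Only 'index' and 'create' are handled, since both take a source line;
    other bulk operations have different line layouts.
    """
    if operation not in ("index", "create"):
        raise ValueError("this sketch supports only 'index' and 'create'")
    lines = []
    for doc_id, doc in documents:
        lines.append(json.dumps({operation: {"_index": index, "_id": doc_id}}))  # action line
        lines.append(json.dumps(doc))                                            # source line
    return "\n".join(lines) + "\n"  # the body must end with a newline


# Two hypothetical documents, each of which would come from one FlowFile.
body = build_bulk_body("logs", [("1", {"msg": "started"}), ("2", {"msg": "stopped"})])
parsed = [json.loads(line) for line in body.splitlines()]
```

Batching this way trades one HTTP round trip per document for a single request carrying many documents.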
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.elasticsearch.PutElasticsearchRecord](/user-guide/data-integration/openflow/processors/putelasticsearchrecord) --- title: PutElasticsearchRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putelasticsearchrecord.md section: Loading & Unloading Data --- # PutElasticsearchRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-elasticsearch-restapi-nar ## Description A record-aware Elasticsearch put processor that uses the official Elastic REST client libraries. Each Record within the FlowFile is converted into a document to be sent to the Elasticsearch _bulk API. Multiple documents can be batched into each Request sent to Elasticsearch. Each document's Bulk operation can be configured using Record Path expressions. ## Tags elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, index, json, put, record ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.elasticsearch.PutElasticsearchJson](/user-guide/data-integration/openflow/processors/putelasticsearchjson) --- title: PutEmail 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putemail.md section: Loading & Unloading Data --- # PutEmail 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Sends an e-mail to configured recipients for each incoming FlowFile ## Tags email, notify, put, smtp ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties true ## Properties
## Relationships
--- title: PutFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putfile.md section: Loading & Unloading Data --- # PutFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Writes the contents of a FlowFile to the local file system ## Tags archive, copy, files, filesystem, local, put ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Restrictions
## Relationships
## See also - [org.apache.nifi.processors.standard.FetchFile](/user-guide/data-integration/openflow/processors/fetchfile) - [org.apache.nifi.processors.standard.GetFile](/user-guide/data-integration/openflow/processors/getfile) --- title: PutFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putftp.md section: Loading & Unloading Data --- # PutFTP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Sends FlowFiles to an FTP Server ## Tags archive, copy, egress, files, ftp, put, remote ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [org.apache.nifi.processors.standard.GetFTP](/user-guide/data-integration/openflow/processors/getftp) --- title: PutGCSObject 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putgcsobject.md section: Loading & Unloading Data --- # PutGCSObject 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-gcp-nar ## Description Writes the contents of a FlowFile as an object in Google Cloud Storage. ## Tags archive, gcs, google, google cloud, put ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.gcp.storage.DeleteGCSObject](/user-guide/data-integration/openflow/processors/deletegcsobject) - [org.apache.nifi.processors.gcp.storage.FetchGCSObject](/user-guide/data-integration/openflow/processors/fetchgcsobject) - [org.apache.nifi.processors.gcp.storage.ListGCSBucket](/user-guide/data-integration/openflow/processors/listgcsbucket) --- title: PutGoogleDrive 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putgoogledrive.md section: Loading & Unloading Data --- # PutGoogleDrive 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-gcp-nar ## Description Writes the contents of a FlowFile as a file in Google Drive. ## Tags drive, google, put, storage ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.gcp.drive.FetchGoogleDrive](/user-guide/data-integration/openflow/processors/fetchgoogledrive) - [org.apache.nifi.processors.gcp.drive.ListGoogleDrive](/user-guide/data-integration/openflow/processors/listgoogledrive) --- title: PutGridFS 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putgridfs.md section: Loading & Unloading Data --- # PutGridFS 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-mongodb-nar ## Description Writes a file to a GridFS bucket. ## Tags file, gridfs, mongo, put, store ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: PutHubSpot 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/puthubspot.md section: Loading & Unloading Data --- # PutHubSpot 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-hubspot-processors-nar ## Description Upsert a HubSpot object. ## Tags Preview, hubspot ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotObject](/user-guide/data-integration/openflow/processors/gethubspotobject) - [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotSchema](/user-guide/data-integration/openflow/processors/gethubspotschema) - [com.snowflake.openflow.runtime.processors.hubspot.ListArchivedHubSpotData](/user-guide/data-integration/openflow/processors/listarchivedhubspotdata) - [com.snowflake.openflow.runtime.processors.hubspot.ListHubSpotObjects](/user-guide/data-integration/openflow/processors/listhubspotobjects) --- title: PutIcebergTable 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/puticebergtable.md section: Loading & Unloading Data --- # PutIcebergTable 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-iceberg-processors-nar ## Description Store records in Iceberg using configurable Catalog for managing namespaces and tables. ## Tags analytics, iceberg, openflow, parquet, polaris, s3 ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: PutKinesisFirehose 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putkinesisfirehose.md section: Loading & Unloading Data --- # PutKinesisFirehose 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Sends the contents to a specified Amazon Kinesis Firehose. In order to send data to firehose, the firehose delivery stream name has to be specified. ## Tags amazon, aws, firehose, kinesis, put, stream ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: PutKinesisStream 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putkinesisstream.md section: Loading & Unloading Data --- # PutKinesisStream 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Sends the contents to a specified Amazon Kinesis stream. To send data to Kinesis, the stream name must be specified. ## Tags amazon, aws, kinesis, put, stream ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.kinesis.stream.ConsumeKinesisStream](/user-guide/data-integration/openflow/processors/consumekinesisstream) --- title: PutLambda 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putlambda.md section: Loading & Unloading Data --- # PutLambda 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Sends the contents to a specified Amazon Lambda Function. The AWS credentials used for authentication must have permission to execute the Lambda function (lambda:InvokeFunction). The FlowFile content must be JSON. ## Tags amazon, aws, lambda, put ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
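Because the PutLambda description requires the FlowFile content to be JSON, it can help to validate payloads before they reach the processor. The following is a minimal sketch of such a check; the function name `is_valid_json` is hypothetical and is not part of Openflow or NiFi.

```python
import json


def is_valid_json(content: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 and parse as JSON.

    Hypothetical pre-flight check; it only mirrors the stated
    requirement that the FlowFile content must be JSON.
    """
    try:
        json.loads(content.decode("utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
```

A check like this could run in an upstream step so that malformed payloads are caught before the Lambda function is invoked.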
--- title: PutMongo 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putmongo.md section: Loading & Unloading Data --- # PutMongo 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-mongodb-nar ## Description Writes the contents of a FlowFile to MongoDB ## Tags insert, mongodb, put, update, write ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: PutMongoBulkOperations 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putmongobulkoperations.md section: Loading & Unloading Data --- # PutMongoBulkOperations 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-mongodb-nar ## Description Writes the contents of a FlowFile to MongoDB as bulk-update ## Tags bulk, insert, mongodb, put, update, write ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: PutMongoRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putmongorecord.md section: Loading & Unloading Data --- # PutMongoRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-mongodb-nar ## Description This processor is a record-aware processor for inserting/upserting data into MongoDB. It uses a configured record reader and schema to read an incoming record set from the body of a flowfile and then inserts/upserts batches of those records into a configured MongoDB collection. This processor does not support deletes. The number of documents to insert/upsert at a time is controlled by the "Batch Size" configuration property. This value should be set to a reasonable size to ensure that MongoDB is not overloaded with too many operations at once. ## Tags insert, mongodb, put, record, update, upsert ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
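The PutMongoRecord description says that records are inserted or upserted in groups controlled by the "Batch Size" property. The sketch below illustrates only the general batching idea, in plain Python with no MongoDB client involved; the helper name `batches` is hypothetical and does not reflect the processor's internal implementation.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            # A full batch is handed off as one unit of work.
            yield batch
            batch = []
    if batch:
        # The final batch may be smaller than batch_size.
        yield batch
```

With five records and a batch size of 2, this produces three groups, the last holding a single record. A smaller batch size means more round trips, while a larger one sends more operations to the database at once.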
--- title: PutRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putrecord.md section: Loading & Unloading Data --- # PutRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description The PutRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file, and sends them to a destination specified by a Record Destination Service (i.e. record sink). ## Tags put, record, sink ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: PutRedisHashRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putredishashrecord.md section: Loading & Unloading Data --- # PutRedisHashRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-redis-nar ## Description Puts record field data into Redis using a specified hash value, which is determined by a RecordPath to a field in each record containing the hash value. The record fields and values are stored as key/value pairs associated by the hash value. NOTE: Neither the evaluated hash value nor any of the field values can be null. If the hash value is null, the FlowFile will be routed to failure. For each of the field values, if the value is null, that field will not be set in Redis. ## Tags hash, put, record, redis ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
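The null-handling rules in the PutRedisHashRecord description can be sketched as a small mapping function. This is an illustration only: it uses a plain top-level field name in place of a RecordPath, assumes that all non-null fields (including the hash field itself) are stored as strings, and the name `to_redis_hash` is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def to_redis_hash(
    record: Dict[str, Any], hash_field: str
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map one record to (hash key, field/value pairs).

    Returns None when the hash value is null, which corresponds to
    the case the description routes to failure.
    """
    hash_value = record.get(hash_field)
    if hash_value is None:
        return None
    # Null field values are skipped rather than written.
    fields = {
        name: str(value)
        for name, value in record.items()
        if value is not None
    }
    return str(hash_value), fields
```

For a record `{"id": "u1", "name": "Ada", "email": None}` keyed on `id`, the result would be the key `"u1"` with the fields `id` and `name`; `email` is omitted because it is null.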
--- title: PutS3Object 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/puts3object.md section: Loading & Unloading Data --- # PutS3Object 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Writes the contents of a FlowFile as an S3 Object to an Amazon S3 Bucket. ## Tags AWS, Amazon, Archive, Put, S3 ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.s3.CopyS3Object](/user-guide/data-integration/openflow/processors/copys3object) - [org.apache.nifi.processors.aws.s3.DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object) - [org.apache.nifi.processors.aws.s3.FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object) - [org.apache.nifi.processors.aws.s3.GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata) - [org.apache.nifi.processors.aws.s3.GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags) - [org.apache.nifi.processors.aws.s3.ListS3](/user-guide/data-integration/openflow/processors/lists3) - [org.apache.nifi.processors.aws.s3.TagS3Object](/user-guide/data-integration/openflow/processors/tags3object) --- title: PutSalesforceObject 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsalesforceobject.md section: Loading & Unloading Data --- # PutSalesforceObject 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-salesforce-nar ## Description Creates new records for the specified Salesforce sObject. The type of the Salesforce object must be set in the input FlowFile's 'objectType' attribute. This processor cannot update existing records. ## Tags put, salesforce, sobject ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.salesforce.QuerySalesforceObject](/user-guide/data-integration/openflow/processors/querysalesforceobject) --- title: PutSFTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsftp.md section: Loading & Unloading Data --- # PutSFTP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Sends FlowFiles to an SFTP Server ## Tags archive, copy, egress, files, put, remote, sftp ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [org.apache.nifi.processors.standard.GetSFTP](/user-guide/data-integration/openflow/processors/getsftp) --- title: PutSmbFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsmbfile.md section: Loading & Unloading Data --- # PutSmbFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-smb-nar ## Description Writes the contents of a FlowFile to a Samba network location. Use this processor instead of a CIFS mount if share access control is important. Configure the Hostname, Share, and Directory accordingly: `\\[Hostname]\[Share]\[path\to\Directory]` ## Tags samba, smb, cifs, files, put ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [org.apache.nifi.processors.smb.FetchSmb](/user-guide/data-integration/openflow/processors/fetchsmb) - [org.apache.nifi.processors.smb.GetSmbFile](/user-guide/data-integration/openflow/processors/getsmbfile) - [org.apache.nifi.processors.smb.ListSmb](/user-guide/data-integration/openflow/processors/listsmb) --- title: PutSnowflakeInternalStageFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsnowflakeinternalstagefile.md section: Loading & Unloading Data --- # PutSnowflakeInternalStageFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowflake-processors-nar ## Description Puts files into a Snowflake internal stage. The internal stage must be created in the Snowflake account beforehand. ## Tags connection, database, jdbc, openflow, snowflake, snowpipe ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: PutSnowpipeStreaming 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsnowpipestreaming.md section: Loading & Unloading Data --- # PutSnowpipeStreaming 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowpipe-processors-nar ## Description Streams records into a Snowflake table. The table must be created in the Snowflake account beforehand. ## Tags connection, database, jdbc, openflow, snowflake, snowpipe streaming ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Use cases | Write record-oriented data to a Snowflake table as fast as possible, accepting the possibility of occasional duplicates. | | ----------------------------------------------------------------------------------------------------------------------- | --- title: PutSnowpipeStreaming2 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsnowpipestreaming2.md section: Loading & Unloading Data --- # PutSnowpipeStreaming2 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowpipe-streaming-2-processors-nar ## Description Send Records formatted as Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming Version 2. ## Tags NDJSON, Preview, Snowflake, Snowpipe Streaming ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
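PutSnowpipeStreaming2 expects records formatted as Newline Delimited JSON (NDJSON), meaning one complete JSON value per line. The sketch below shows one way such a payload could be produced upstream; the helper name `to_ndjson` and the sample records are hypothetical.

```python
import json
from typing import Any, Iterable


def to_ndjson(records: Iterable[Any]) -> str:
    """Serialize each record as compact JSON on its own line.

    json.dumps escapes newlines inside string values, so each
    record always occupies exactly one line of output.
    """
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n"
        for record in records
    )
```

For example, `to_ndjson([{"id": 1}, {"id": 2}])` yields two lines, `{"id":1}` and `{"id":2}`, each terminated by a newline.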
--- title: PutSNS 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsns.md section: Loading & Unloading Data --- # PutSNS 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service ## Tags amazon, aws, publish, pubsub, put, sns, topic ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [org.apache.nifi.processors.aws.sqs.GetSQS](/user-guide/data-integration/openflow/processors/getsqs) - [org.apache.nifi.processors.aws.sqs.PutSQS](/user-guide/data-integration/openflow/processors/putsqs) --- title: PutSplunk 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsplunk.md section: Loading & Unloading Data --- # PutSplunk 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-splunk-nar ## Description Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter, and send each message to Splunk. If a Message Delimiter is not provided then the content of the FlowFile will be sent directly to Splunk as if it were a single message. ## Tags logs, splunk, tcp, udp ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
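The Message Delimiter behavior described for PutSplunk can be sketched as follows: with a delimiter, the content is split into individual messages, and without one the whole content is treated as a single message. This is an illustration, not the processor's code; the name `split_messages` is hypothetical, and dropping empty segments (for example after a trailing delimiter) is an assumption.

```python
from typing import List, Optional


def split_messages(
    content: bytes, delimiter: Optional[bytes] = None
) -> List[bytes]:
    """Split FlowFile content into messages on the delimiter.

    With no delimiter, the whole content is one message.
    """
    if not delimiter:
        return [content]
    # Assumption: empty segments are not sent as messages.
    return [part for part in content.split(delimiter) if part]
```

Given the content `b"a\nb\n"`, a newline delimiter yields the two messages `b"a"` and `b"b"`, while omitting the delimiter yields the original bytes as one message.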
--- title: PutSplunkHTTP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsplunkhttp.md section: Loading & Unloading Data --- # PutSplunkHTTP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-splunk-nar ## Description Sends flow file content to the specified Splunk server over HTTP or HTTPS. Supports HEC Index Acknowledgement. ## Tags http, logs, splunk ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus](/user-guide/data-integration/openflow/processors/querysplunkindexingstatus) --- title: PutSQL 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsql.md section: Loading & Unloading Data --- # PutSQL 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use ? as a placeholder for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. ## Tags database, insert, put, rdbms, relational, sql, update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
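The attribute naming convention in the PutSQL description (a `sql.args.N.type` and `sql.args.N.value` pair for each `?` parameter) can be illustrated with a small helper that builds the attribute map for a parameterized statement. This is a sketch under stated assumptions: the helper name `sql_args_attributes` and the sample table are hypothetical, and only two JDBC type codes are included (`INTEGER` = 4 and `VARCHAR` = 12, as defined in `java.sql.Types`).

```python
from typing import Any, Dict, List, Tuple

# Numeric codes from java.sql.Types; only a small subset is shown.
JDBC_TYPES = {"INTEGER": 4, "VARCHAR": 12}


def sql_args_attributes(params: List[Tuple[str, Any]]) -> Dict[str, str]:
    """Build FlowFile attributes for positional SQL parameters.

    params is a list of (JDBC type name, value) pairs in the same
    order as the ? placeholders in the statement.
    """
    attributes: Dict[str, str] = {}
    # N starts at 1 because the convention uses positive integers.
    for n, (type_name, value) in enumerate(params, start=1):
        attributes[f"sql.args.{n}.type"] = str(JDBC_TYPES[type_name])
        attributes[f"sql.args.{n}.value"] = str(value)
    return attributes


# Hypothetical statement that would be the FlowFile content.
statement = "INSERT INTO users (id, name) VALUES (?, ?)"
attributes = sql_args_attributes([("INTEGER", 42), ("VARCHAR", "Ada")])
```

Here `attributes` contains `sql.args.1.type` = `4`, `sql.args.1.value` = `42`, `sql.args.2.type` = `12`, and `sql.args.2.value` = `Ada`, matching the two placeholders in order.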
--- title: PutSQS 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsqs.md section: Loading & Unloading Data --- # PutSQS 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Publishes a message to an Amazon Simple Queuing Service Queue ## Tags AWS, Amazon, Publish, Put, Queue, SQS ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [org.apache.nifi.processors.aws.sqs.DeleteSQS](/user-guide/data-integration/openflow/processors/deletesqs) - [org.apache.nifi.processors.aws.sqs.GetSQS](/user-guide/data-integration/openflow/processors/getsqs) --- title: PutSyslog 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsyslog.md section: Loading & Unloading Data --- # PutSyslog 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Sends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the "Message ___" properties of the processor, which can use expression language to generate messages from incoming FlowFiles. The properties are used to construct messages of the form: (<PRIORITY>)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The constructed messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.S'Z'" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If a message is constructed that does not form a valid Syslog message according to the above description, then it is routed to the invalid relationship. Valid messages are sent to the Syslog server; successes are routed to the success relationship and failures are routed to the failure relationship. ## Tags logs, put, syslog, tcp, udp ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
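The message layout given in the PutSyslog description, (<PRIORITY>)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) with an optional version, can be sketched as a simple string builder. This is an illustration of the layout only, not the processor's implementation; the function name `build_syslog_message` and the sample values are hypothetical. The priority value follows the usual Syslog convention of facility multiplied by 8 plus severity.

```python
from typing import Optional


def build_syslog_message(
    priority: int,
    timestamp: str,
    hostname: str,
    body: str,
    version: Optional[int] = None,
) -> str:
    """Assemble <PRIORITY>[VERSION ]TIMESTAMP HOSTNAME BODY."""
    # The version, when present, is followed by a single space.
    version_part = f"{version} " if version is not None else ""
    return f"<{priority}>{version_part}{timestamp} {hostname} {body}"


# Facility 4 (security/authorization) and severity 2 (critical).
priority = 4 * 8 + 2
rfc5424_style = build_syslog_message(
    priority, "2025-10-09T21:00:00.0Z", "host1", "disk failure", version=1
)
rfc3164_style = build_syslog_message(
    priority, "Oct 9 21:00:00", "host1", "disk failure"
)
```

The first call produces `<34>1 2025-10-09T21:00:00.0Z host1 disk failure`, and the second, without a version, produces `<34>Oct 9 21:00:00 host1 disk failure`.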
## See also - [org.apache.nifi.processors.standard.ListenSyslog](/user-guide/data-integration/openflow/processors/listensyslog) - [org.apache.nifi.processors.standard.ParseSyslog](/user-guide/data-integration/openflow/processors/parsesyslog) --- title: PutTCP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/puttcp.md section: Loading & Unloading Data --- # PutTCP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Sends serialized FlowFiles or Records over TCP to a configurable destination with optional support for TLS ## Tags egress, put, remote, tcp ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.ListenTCP](/user-guide/data-integration/openflow/processors/listentcp) - [org.apache.nifi.processors.standard.PutUDP](/user-guide/data-integration/openflow/processors/putudp) --- title: PutUDP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putudp.md section: Loading & Unloading Data --- # PutUDP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server. The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size. ## Tags egress, put, remote, udp ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
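Since PutUDP marks a FlowFile as failed when its content exceeds the maximum transport size, a size check upstream can avoid sending oversized content. The sketch below uses the theoretical maximum UDP payload over IPv4 without IP options (65,535 bytes minus a 20-byte IP header and an 8-byte UDP header); the actual limit depends on the platform, as the description notes, and the name `fits_in_datagram` is hypothetical.

```python
# Theoretical maximum UDP payload over IPv4 with no IP options:
# 65535 total length - 20 byte IP header - 8 byte UDP header.
MAX_UDP_PAYLOAD_IPV4 = 65535 - 20 - 8


def fits_in_datagram(
    content: bytes, limit: int = MAX_UDP_PAYLOAD_IPV4
) -> bool:
    """Return True if the content fits in one datagram of this limit."""
    return len(content) <= limit
```

Content that fails this check could be split or routed elsewhere before it reaches the processor. A lower `limit` can be passed if the platform's effective maximum is smaller.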
## See also - [org.apache.nifi.processors.standard.ListenUDP](/user-guide/data-integration/openflow/processors/listenudp) - [org.apache.nifi.processors.standard.PutTCP](/user-guide/data-integration/openflow/processors/puttcp) --- title: PutUnityCatalogFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putunitycatalogfile.md section: Loading & Unloading Data --- # PutUnityCatalogFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-databricks-processors-nar ## Description Write FlowFile content with max size of 5 GiB to Unity Catalog. ## Tags databricks, openflow, unity catalog ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: PutVectaraDocument 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putvectaradocument.md section: Loading & Unloading Data --- # PutVectaraDocument 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-vectara-processors-nar ## Description Generate and upload a JSON document to Vectara's upload endpoint. The input text can be JSON Object, JSON Array, or JSONL format. ## Tags ai, llm, openflow, rag, vectara ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Use Cases Involving Other Components | Publish a PDF file to a Vectara corpus. | | --------------------------------------- | ## See also - [com.snowflake.openflow.runtime.processors.vectara.PutVectaraFile](/user-guide/data-integration/openflow/processors/putvectarafile) --- title: PutVectaraFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putvectarafile.md section: Loading & Unloading Data --- # PutVectaraFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-vectara-processors-nar ## Description Uploads FlowFile content to Vectara's index endpoint. Document filter attributes and metadata attributes can be set by referencing FlowFile attributes. ## Tags ai, llm, openflow, rag, vectara ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [com.snowflake.openflow.runtime.processors.vectara.PutVectaraDocument](/user-guide/data-integration/openflow/processors/putvectaradocument) --- title: PutWebSocket 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putwebsocket.md section: Loading & Unloading Data --- # PutWebSocket 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-websocket-processors-nar ## Description Sends messages to a WebSocket remote endpoint using a WebSocket session that is established by either ListenWebSocket or ConnectWebSocket. ## Tags WebSocket, publish, send ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: PutZendeskTicket 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putzendeskticket.md section: Loading & Unloading Data --- # PutZendeskTicket 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-zendesk-nar ## Description Create Zendesk tickets using the Zendesk API. ## Tags zendesk, ticket ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: QueryAzureDataExplorer 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/queryazuredataexplorer.md section: Loading & Unloading Data --- # QueryAzureDataExplorer 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-azure-nar ## Description Queries Azure Data Explorer and streams JSON results to output FlowFiles. ## Tags ADX, Azure, Data, Explorer, Kusto ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: QueryDatabaseTable 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querydatabasetable.md section: Loading & Unloading Data --- # QueryDatabaseTable 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. Query result will be converted to Avro format. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage flow file attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected. ## Tags database, jdbc, query, select, sql ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
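The incremental fetch described in the Description above can be illustrated with a minimal Python sketch. It is not the processor's implementation: it uses an in-memory SQLite database, and the `orders` table, the `fetch_new_rows` helper, and the `state` dict (standing in for the processor's stored maxima) are all hypothetical names chosen for the example.

```python
import sqlite3


def fetch_new_rows(conn, table, max_value_column, state):
    """Return rows whose max-value column exceeds the last-seen maximum.

    `state` is a plain dict standing in for the processor's stored state
    (an assumption of this sketch, not the real state mechanism).
    """
    last_max = state.get(max_value_column)
    if last_max is None:
        # First run: no maximum recorded yet, so fetch every row.
        sql = f"SELECT * FROM {table} ORDER BY {max_value_column}"
        params = ()
    else:
        # Later runs: fetch only rows beyond the previously seen maximum.
        sql = (
            f"SELECT * FROM {table} "
            f"WHERE {max_value_column} > ? ORDER BY {max_value_column}"
        )
        params = (last_max,)
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
    if rows:
        # Remember the new maximum for the next run.
        state[max_value_column] = max(r[max_value_column] for r in rows)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (id, item) VALUES (?, ?)", [(1, "a"), (2, "b")])

state = {}
first = fetch_new_rows(conn, "orders", "id", state)   # both existing rows
conn.execute("INSERT INTO orders (id, item) VALUES (3, 'c')")
second = fetch_new_rows(conn, "orders", "id", state)  # only the new row
third = fetch_new_rows(conn, "orders", "id", state)   # nothing new
```

In this sketch, `len(rows)` plays the role that the `querydbtable.row.count` attribute plays on the output FlowFile.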
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.ExecuteSQL](/user-guide/data-integration/openflow/processors/executesql) - [org.apache.nifi.processors.standard.GenerateTableFetch](/user-guide/data-integration/openflow/processors/generatetablefetch) --- title: QueryDatabaseTableRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querydatabasetablerecord.md section: Loading & Unloading Data --- # QueryDatabaseTableRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. Query result will be converted to the format specified by the record writer. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage flow file attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected. ## Tags database, jdbc, query, record, select, sql ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## Use cases | Retrieve all rows from a database table. | | -------------------------------------------------------------------------------------------------------------- | | Perform an incremental load of a single database table, fetching only new rows as they are added to the table. | ## Use Cases Involving Other Components | Perform an incremental load of multiple database tables, fetching only new rows as they are added to the tables. | | ---------------------------------------------------------------------------------------------------------------- | ## See also - [org.apache.nifi.processors.standard.ExecuteSQL](/user-guide/data-integration/openflow/processors/executesql) - [org.apache.nifi.processors.standard.GenerateTableFetch](/user-guide/data-integration/openflow/processors/generatetablefetch) --- title: QueryMilvus 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querymilvus.md section: Loading & Unloading Data --- # QueryMilvus 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-milvus-processors-nar ## Description Queries a given collection in a Milvus database using vectors. Results of query are added to current record under the results record path for each vector searched. ## Tags chatbot, embeddings, gen ai, genai, generative ai, llm, metadata, milvus, openflow, publish, query, search, text, vector ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [com.snowflake.openflow.runtime.processors.milvus.UpsertMilvus](/user-guide/data-integration/openflow/processors/upsertmilvus) --- title: QueryPinecone 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querypinecone.md section: Loading & Unloading Data --- # QueryPinecone 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-pinecone-nar ## Description Queries Pinecone for vectors that are similar to the input vector, or retrieves a vector by ID. ## Tags chatbot, gen ai, generative ai, llm, openflow, pinecone, query, similarity, vector ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Use Cases Involving Other Components | Query Pinecone for vectors that are similar to some input text | | -------------------------------------------------------------- | --- title: QueryRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/queryrecord.md section: Loading & Unloading Data --- # QueryRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Evaluates one or more SQL queries against the contents of a FlowFile. The result of the SQL query then becomes the content of the output FlowFile. This can be used, for example, for field-specific filtering, transformation, and row-level filtering. Columns can be renamed, simple calculations and aggregations performed, etc. The Processor is configured with a Record Reader Controller Service and a Record Writer service so as to allow flexibility in incoming and outgoing data formats. The Processor must be configured with at least one user-defined property. The name of the Property is the Relationship to route data to, and the value of the Property is a SQL SELECT statement that is used to specify how input data should be transformed/filtered. The SQL statement must be valid ANSI SQL and is powered by Apache Calcite. If the transformation fails, the original FlowFile is routed to the 'failure' relationship. Otherwise, the data selected will be routed to the associated relationship. If the Record Writer chooses to inherit the schema from the Record, it is important to note that the schema that is inherited will be from the ResultSet, rather than the input Record. This allows a single instance of the QueryRecord processor to have multiple queries, each of which returns a different set of columns and aggregations. As a result, though, the schema that is derived will have no schema name, so it is important that the configured Record Writer not attempt to write the Schema Name as an attribute if inheriting the Schema from the Record. See the Processor Usage documentation for more information. ## Tags aggregate, avro, calcite, csv, etl, filter, json, logs, modify, query, record, route, select, sql, text, transform, update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
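As a rough illustration of the Description above, the following Python sketch runs several named SELECT statements over one set of records and collects each result under the name of its query, the way each user-defined property names a relationship. It makes several assumptions: SQLite stands in for Apache Calcite, the records are plain dicts rather than the output of a Record Reader, `query_records` is a hypothetical helper, and the table name `FLOWFILE` follows the convention used in NiFi's QueryRecord examples.

```python
import sqlite3


def query_records(records, queries):
    """Run each named SELECT over the records; return {relationship: rows}.

    Assumes `records` is a non-empty list of dicts sharing the same keys.
    """
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    columns = list(records[0].keys())
    col_list = ", ".join(columns)
    # Load the incoming records into a table the queries can refer to.
    conn.execute(f"CREATE TABLE FLOWFILE ({col_list})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO FLOWFILE ({col_list}) VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in records],
    )
    routed = {}
    for relationship, sql in queries.items():
        # Each query's result set has its own columns, independent of the input.
        routed[relationship] = [dict(row) for row in conn.execute(sql)]
    conn.close()
    return routed


records = [
    {"name": "a", "amount": 5},
    {"name": "b", "amount": 50},
    {"name": "c", "amount": 500},
]
queries = {
    "large": "SELECT name, amount FROM FLOWFILE WHERE amount >= 50 ORDER BY amount",
    "totals": "SELECT COUNT(*) AS n, SUM(amount) AS total FROM FLOWFILE",
}
routed = query_records(records, queries)
```

The two result sets have different columns, which mirrors why the inherited schema comes from the ResultSet rather than from the input Record.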
## Relationships
## Writes attributes
## Use cases | Filter out records based on the values of the records' fields | | ---------------------------------------------------------------------------------------- | | Keep only specific records | | Keep only specific fields in a Record, where the names of the fields to keep are known | | Route record-oriented data for processing based on its contents | --- title: QuerySalesforceObject 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querysalesforceobject.md section: Loading & Unloading Data --- # QuerySalesforceObject 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-salesforce-nar ## Description Retrieves records from a Salesforce sObject. Users can add arbitrary filter conditions by setting the 'Custom WHERE Condition' property. The processor can also run a custom query, although record processing is not supported in that case. Supports incremental retrieval: users can define a field in the 'Age Field' property that will be used to determine when the record was created. When this property is set the processor will retrieve new records. Incremental loading and record-based processing are only supported in property-based queries. It's also possible to define an initial cutoff value for the age, filtering out all older records even for the first run. In case of 'Property Based Query' this processor should run on the Primary Node only. FlowFile attribute 'record.count' indicates how many records were retrieved and written to the output. The processor can accept an optional input FlowFile and reference the FlowFile attributes in the query. When 'Include Deleted Records' is true, the processor will include deleted records (soft-deletes) in the results by using the 'queryAll' API. The 'IsDeleted' field will be automatically included in the results when querying deleted records. ## Tags query, salesforce, sobject, soql ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.salesforce.PutSalesforceObject](/user-guide/data-integration/openflow/processors/putsalesforceobject) --- title: QuerySplunkIndexingStatus 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querysplunkindexingstatus.md section: Loading & Unloading Data --- # QuerySplunkIndexingStatus 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-splunk-nar ## Description Queries Splunk server in order to acquire the status of indexing acknowledgement. ## Tags acknowledgement, http, logs, splunk ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [org.apache.nifi.processors.splunk.PutSplunkHTTP](/user-guide/data-integration/openflow/processors/putsplunkhttp) --- title: ReaderLookup source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/readerlookup.md section: Loading & Unloading Data --- # ReaderLookup This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a RecordReaderFactory that can be used to dynamically select another RecordReaderFactory. This will allow multiple RecordReaderFactories to be defined and registered, and then selected dynamically at runtime by referencing a FlowFile attribute in the Service to Use property. ## Tags lookup, parse, reader, record, row ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
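The dynamic selection described in the Description above can be pictured with a small Python sketch. It is only an analogy: the `READERS` registry, the `read_records` helper, and the `reader.name` attribute are hypothetical names, and ordinary CSV and JSON parsing functions stand in for registered RecordReaderFactory services.

```python
import csv
import io
import json


def read_csv(content):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(content)))


def read_json(content):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(content)


# Registered readers, keyed by the name used to select them at runtime.
READERS = {"csv": read_csv, "json": read_json}


def read_records(content, attributes, service_to_use="reader.name"):
    """Pick a reader by looking up an attribute value, then parse the content."""
    name = attributes.get(service_to_use)
    if name not in READERS:
        raise KeyError(f"no reader registered for {name!r}")
    return READERS[name](content)


csv_records = read_records("a,b\n1,2\n", {"reader.name": "csv"})
json_records = read_records('[{"a": 1}]', {"reader.name": "json"})
```

The same flow can therefore handle mixed input formats, as long as each FlowFile carries an attribute that names the reader to use.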
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: RecordSetWriterLookup source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/recordsetwriterlookup.md section: Loading & Unloading Data --- # RecordSetWriterLookup This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a RecordSetWriterFactory that can be used to dynamically select another RecordSetWriterFactory. This allows multiple RecordSetWriterFactories to be defined and registered, and then selected dynamically at runtime by tagging FlowFiles with attributes and referencing those attributes in the Service to Use property. ## Tags lookup, record, recordset, result, row, serializer, set, writer ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: RecordSinkServiceLookup source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/recordsinkservicelookup.md section: Loading & Unloading Data --- # RecordSinkServiceLookup This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a RecordSinkService that can be used to dynamically select another RecordSinkService. This service requires an attribute named 'record.sink.name' to be passed in when asking for a connection, and will throw an exception if the attribute is missing. The value of 'record.sink.name' will be used to select the RecordSinkService that has been registered with that name. This will allow multiple RecordSinkServices to be defined and registered, and then selected dynamically at runtime by tagging flow files with the appropriate 'record.sink.name' attribute. Note that this controller service is not intended for use in reporting tasks that employ RecordSinkService instances, such as QueryNiFiReportingTask. ## Tags lookup, record, sink ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: RedisConnectionPoolService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/redisconnectionpoolservice.md section: Loading & Unloading Data --- # RedisConnectionPoolService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A service that provides connections to Redis. ## Tags cache, redis ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: RedisDistributedMapCacheClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/redisdistributedmapcacheclientservice.md section: Loading & Unloading Data --- # RedisDistributedMapCacheClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description An implementation of DistributedMapCacheClient that uses Redis as the backing cache. This service relies on the WATCH, MULTI, and EXEC commands in Redis, which are not fully supported when Redis is clustered. As a result, this service can only be used with a Redis Connection Pool that is configured for standalone or sentinel mode. Sentinel mode can be used to provide high-availability configurations. ## Tags cache, distributed, map, redis ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: RemoveFieldRecordReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/removefieldrecordreader.md section: Loading & Unloading Data --- # RemoveFieldRecordReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A wrapper for a RecordReaderFactory that supports filtering out specified fields from NiFi Records. It allows users to specify a list of field names that should be ignored when reading records from the record reader returned from the wrapped RecordReaderFactory. The ignored record fields are specified as dynamic properties. At least one dynamic property must be set. The dynamic property name is used as a description of the field to remove, and the dynamic property value is a RecordPath that identifies the field to be removed. Nested paths are supported. Record paths targeting the root path ("/") are not allowed and will result in a validation error. This service should be used when all of the following criteria are met: - your delegate RecordReaderFactory is configured to infer the schema from the data - you do not have or do not want to define a static schema for the data you're reading - the fields you set to be ignored should not be serialized to the NiFi content repository for security or performance reasons If any of the above criteria are not met, consider using the RecordFieldRemover processor instead. NOTE: The RecordReader returned by this implementation is hardcoded to drop unknown fields rather than ignoring them. Even when the RecordReader's nextRecord(coerceTypes, dropUnknownFields) method is called with dropUnknownFields set to false, the RecordReader will still drop unknown fields. ## Tags delete, field, filter, reader, record, remove ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: RemoveRecordField 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/removerecordfield.md section: Loading & Unloading Data --- # RemoveRecordField 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Modifies the contents of a FlowFile that contains Record-oriented data (i.e. data that can be read via a RecordReader and written by a RecordWriter) by removing selected fields. This Processor requires that at least one user-defined Property be added. The name of the property is ignored by the processor, but could be a meaningful identifier for the user. The value of the property should indicate a RecordPath that determines the field to be removed. The processor executes the removal in the order in which these properties are added to the processor. Set the "Record Writer" to "Inherit Record Schema" in order to use the updated Record Schema modified when removing Fields. ## Tags avro, csv, delete, freeform, generic, json, record, remove, schema, text, update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
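To make the field removal in the Description above concrete, here is a minimal Python sketch that removes fields from dict-based records using simple slash-separated paths, applied in the order given. It is an approximation only: real RecordPath expressions support far more than the plain child steps handled here, and `remove_field` and `remove_fields` are hypothetical helpers.

```python
def remove_field(record, record_path):
    """Remove the field addressed by a simple '/a/b/c' path, if present."""
    *parents, leaf = record_path.strip("/").split("/")
    node = record
    for name in parents:
        node = node.get(name)
        if not isinstance(node, dict):
            return  # The path does not exist in this record; nothing to remove.
    node.pop(leaf, None)


def remove_fields(records, paths):
    """Apply each path, in the order given, to every record (in place)."""
    for record in records:
        for path in paths:
            remove_field(record, path)
    return records


records = [
    {"id": 1, "ssn": "000-00-0000", "address": {"city": "Oslo", "zip": "0150"}},
]
cleaned = remove_fields(records, ["/ssn", "/address/zip"])
```

Here the paths play the role of the property values, while the property names would only serve as labels for the user.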
## Relationships
## Writes attributes
## Use cases | Remove one or more fields from a Record, where the names of the fields to remove are known. | | ------------------------------------------------------------------------------------------- | ## See also - [org.apache.nifi.processors.standard.UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord) --- title: RenameRecordField 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/renamerecordfield.md section: Loading & Unloading Data --- # RenameRecordField 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Renames one or more fields in each Record of a FlowFile. This Processor requires that at least one user-defined Property be added. The name of the Property should indicate a RecordPath that determines the field that should be updated. The value of the Property is the new name to assign to the Record Field that matches the RecordPath. The property value may use Expression Language to reference FlowFile attributes as well as the variables *field.name*, *field.value*, *field.type*, and *record.index*. ## Tags avro, csv, field, generic, json, log, logs, record, rename, schema, update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
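The renaming rule in the Description above can be sketched in Python as follows. This is a simplified stand-in, not the processor's logic: it handles only top-level fields, `rename_fields` is a hypothetical helper, and a `{field_name}` placeholder in a Python format string approximates an Expression Language reference to *field.name*.

```python
def rename_fields(record, renames):
    """Return a copy of `record` with top-level fields renamed.

    `renames` maps a simple path such as '/userId' to a new-name template;
    '{field_name}' in the template is replaced with the current field name.
    """
    targets = {path.lstrip("/"): template for path, template in renames.items()}
    renamed = {}
    for name, value in record.items():
        template = targets.get(name)
        # Fields without a matching path keep their original name.
        new_name = template.format(field_name=name) if template is not None else name
        renamed[new_name] = value
    return renamed


result = rename_fields(
    {"userId": 7, "city": "Oslo", "zip": "0150"},
    {"/userId": "user_id", "/city": "{field_name}_name"},
)
```

The first mapping assigns a fixed new name, and the second derives the new name from the current one.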
## Relationships
## Writes attributes
## Use cases | Rename a field in each Record to a specific, known name. | | ---------------------------------------------------------------------------------------- | | Rename a field in each Record to a name that is derived from a FlowFile attribute. | | Rename a field in each Record to a new name that is derived from the current field name. | ## See also - [org.apache.nifi.processors.standard.RemoveRecordField](/user-guide/data-integration/openflow/processors/removerecordfield) - [org.apache.nifi.processors.standard.UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord) --- title: ReplaceText 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/replacetext.md section: Loading & Unloading Data --- # ReplaceText 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Updates the content of a FlowFile by searching for some textual value in the FlowFile content (via Regular Expression/regex, or literal value) and replacing the section of the content that matches with some alternate value. It can also be used to append or prepend text to the contents of a FlowFile. ## Tags Change, Modify, Regex, Regular Expression, Replace, Text, Update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
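The kinds of replacement named in the Description above (regex, literal, append, prepend) can be illustrated with plain Python string operations. The function names below are hypothetical and do not correspond to processor properties, and Python's `re` syntax differs in places from the Java regular expressions NiFi uses.

```python
import re


def regex_replace(content, pattern, replacement):
    """Replace every regex match; back-references such as \\1 are allowed."""
    return re.sub(pattern, replacement, content)


def literal_replace(content, search, replacement):
    """Replace every occurrence of a literal string (no regex interpretation)."""
    return content.replace(search, replacement)


def append_to_each_line(content, suffix):
    """Append text to the end of every line."""
    return "\n".join(line + suffix for line in content.split("\n"))


def prepend_to_each_line(content, prefix):
    """Prepend text to the beginning of every line."""
    return "\n".join(prefix + line for line in content.split("\n"))


renamed_keys = regex_replace("id=12 id=345", r"id=(\d+)", r"key=\1")
dashed = literal_replace("a.b.c", ".", "-")
terminated = append_to_each_line("x\ny", ";")
quoted = prepend_to_each_line("x\ny", "> ")
```

The literal variant matters when the search text contains characters such as `.` that a regular expression would treat as wildcards.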
## Relationships
## Use cases | Append text to the end of every line in a FlowFile | | ----------------------------------------------------------------------------------- | | Prepend text to the beginning of every line in a FlowFile | | Replace every occurrence of a literal string in the FlowFile with a different value | | Transform every occurrence of a literal string in a FlowFile | | Completely replace the contents of a FlowFile to a specific text | --- title: ReplaceTextWithMapping 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/replacetextwithmapping.md section: Loading & Unloading Data --- # ReplaceTextWithMapping 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file. ## Tags Change, Mapping, Modify, Regex, Regular Expression, Replace, Text, Update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
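As a rough illustration of the behavior described above, the following Python sketch replaces each regex match with the value found for it in a mapping. It is not the processor's implementation. The tab-separated `key<TAB>value` file layout, the function names, and the choice to leave unmapped matches unchanged are all assumptions made for the example.

```python
import re


def load_mapping(text: str) -> dict:
    """Parse mapping text, assuming one 'key<TAB>value' pair per line (assumed format)."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        mapping[key] = value
    return mapping


def replace_with_mapping(content: str, pattern: str, mapping: dict) -> str:
    """Replace each match with its mapped value; unmapped matches stay as they are (assumption)."""
    return re.sub(pattern, lambda m: mapping.get(m.group(0), m.group(0)), content)


if __name__ == "__main__":
    colors = load_mapping("red\tcrimson\npear\tfig\n")
    print(replace_with_mapping("red apple, green pear", r"\b\w+\b", colors))
```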
--- title: RestLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/restlookupservice.md section: Loading & Unloading Data --- # RestLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Use a REST service to look up values. ## Tags http, json, lookup, rest, xml ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: RetryFlowFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/retryflowfile.md section: Loading & Unloading Data --- # RetryFlowFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description FlowFiles passed to this Processor have a 'Retry Attribute' value checked against a configured 'Maximum Retries' value. If the current attribute value is below the configured maximum, the FlowFile is passed to a retry relationship. The FlowFile may or may not be penalized in that condition. If the FlowFile's attribute value exceeds the configured maximum, the FlowFile will be passed to a 'retries_exceeded' relationship. WARNING: If the incoming FlowFile has a non-numeric value in the configured 'Retry Attribute' attribute, it will be reset to '1'. You may choose to fail the FlowFile instead of performing the reset. Additional dynamic properties can be defined for any attributes you wish to add to the FlowFiles transferred to 'retries_exceeded'. These attributes support attribute expression language. ## Tags FlowFile, Retry ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
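The retry check in the description can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the processor's code: the attribute name `retry.count`, the `failure` relationship name, the handling of a missing attribute, and the treatment of a counter equal to the maximum as exhausted are all choices made for the example.

```python
def route_retry(attributes, retry_attribute="retry.count",
                maximum_retries=3, fail_on_non_numeric=False):
    """Return (relationship, updated_attributes) for one FlowFile's attributes."""
    updated = dict(attributes)
    raw = updated.get(retry_attribute)
    try:
        current = int(raw)
    except (TypeError, ValueError):
        # Missing or non-numeric counter: optionally fail, otherwise reset to '1'.
        if raw is not None and fail_on_non_numeric:
            return "failure", updated  # hypothetical relationship name
        updated[retry_attribute] = "1"
        return "retry", updated
    if current < maximum_retries:
        # Below the configured maximum: count this attempt and retry.
        updated[retry_attribute] = str(current + 1)
        return "retry", updated
    # Assumption: a counter at or above the maximum is treated as exhausted.
    return "retries_exceeded", updated


if __name__ == "__main__":
    print(route_retry({"retry.count": "2"}))
    print(route_retry({"retry.count": "3"}))
    print(route_retry({"retry.count": "abc"}))
```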
--- title: RouteOnAttribute 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/routeonattribute.md section: Loading & Unloading Data --- # RouteOnAttribute 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Routes FlowFiles based on their Attributes using the Attribute Expression Language ## Tags Attribute Expression Language, Expression Language, Regular Expression, attributes, detect, filter, find, regex, regexp, routing, search, string, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## Use cases

| Route data to one or more relationships based on its attributes using the NiFi Expression Language. |
| --- |
| Keep data only if its attributes meet some criteria, such as its filename ends with .txt. |
| Discard or drop a file based on attributes, such as filename. |

## Use Cases Involving Other Components

| Route record-oriented data based on whether or not the record's values meet some criteria |
| --- |

---
title: RouteOnContent 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/routeoncontent.md
section: Loading & Unloading Data
---

# RouteOnContent 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Applies Regular Expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose Regular Expression matches. Regular Expressions are added as User-Defined Properties where the name of the property is the name of the relationship and the value is a Regular Expression to match against the FlowFile content. User-Defined properties do support the Attribute Expression Language, but the results are interpreted as literal values, not Regular Expressions ## Tags content, detect, filter, find, regex, regexp, regular expression, route, search, string, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
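The routing rule in the description (one relationship per user-defined property, and a copy of the FlowFile sent to every relationship whose pattern matches) can be sketched in Python. This is an illustration only. The function name is hypothetical, and the sketch assumes a pattern matches when it is found anywhere in the content.

```python
import re


def route_on_content(content: str, rules: dict) -> list:
    """Return the names of all relationships whose regular expression matches.

    `rules` maps a relationship name to a regular expression, mirroring the
    user-defined properties in the description. A match anywhere in the
    content counts (assumption).
    """
    return [name for name, pattern in rules.items() if re.search(pattern, content)]


if __name__ == "__main__":
    rules = {"has_error": r"ERROR", "has_ip": r"\b\d{1,3}(\.\d{1,3}){3}\b"}
    # A copy would be routed to each relationship in the returned list.
    print(route_on_content("ERROR from 10.0.0.1", rules))
```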
--- title: RouteText 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/routetext.md section: Loading & Unloading Data --- # RouteText 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Routes textual data based on a set of user-defined rules. Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the 'Matching Strategy'. The data is then routed according to these rules, routing each line of the text individually. ## Tags Expression Language, Regular Expression, attributes, csv, delimited, detect, filter, find, logs, regex, regexp, routing, search, string, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
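Line-by-line routing, as described above, might look like the following Python sketch. It assumes a simple "contains" comparison as the matching strategy, routes a line to every rule it satisfies, and collects the remaining lines under a hypothetical `unmatched` name. None of these choices are taken from the processor itself.

```python
def route_lines(content: str, rules: dict) -> dict:
    """Group the lines of `content` by the rules they satisfy.

    `rules` maps a relationship name to a substring that a line must contain
    (assumed matching strategy). Lines that satisfy no rule are collected
    under 'unmatched' (hypothetical name).
    """
    routed = {}
    for line in content.splitlines():
        matches = [name for name, needle in rules.items() if needle in line]
        for name in matches or ["unmatched"]:
            routed.setdefault(name, []).append(line)
    return routed


if __name__ == "__main__":
    log = "ERROR disk full\nINFO started\nWARN retrying after ERROR"
    print(route_lines(log, {"errors": "ERROR", "warnings": "WARN"}))
```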
## Use cases

| Drop blank or empty lines from the FlowFile's content. |
| --- |
| Remove specific lines of text from a file, such as those containing a specific word or having a line length over some threshold. |

---
title: RunDatabricksJob 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/rundatabricksjob.md
section: Loading & Unloading Data
---

# RunDatabricksJob 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-databricks-processors-nar ## Description Triggers a pre-defined Databricks job to run with custom parameters. Job parameters can be set using dynamic properties ## Tags databricks, jobs, openflow ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: RunMongoAggregation 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/runmongoaggregation.md section: Loading & Unloading Data --- # RunMongoAggregation 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-mongodb-nar ## Description A processor that runs an aggregation query whenever a flowfile is received. ## Tags aggregate, aggregation, mongo ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: S3FileResourceService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/s3fileresourceservice.md section: Loading & Unloading Data --- # S3FileResourceService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides an Amazon Web Services (AWS) S3 file resource for other components. ## Tags AWS, Amazon, S3, file, resource ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SalesforceDataCloudOAuthTokenProvider source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/salesforcedatacloudoauthtokenprovider.md section: Loading & Unloading Data --- # SalesforceDataCloudOAuthTokenProvider This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Retrieves an OAuth2 access token from Salesforce using the configured OAuth2 Access Token Provider and exchanges the token for a Data Cloud API token. The token is then used to authenticate with Salesforce Data Cloud APIs. ## Tags preview, salesforce ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SampleRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/samplerecord.md section: Loading & Unloading Data --- # SampleRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Samples the records of a FlowFile based on a specified sampling strategy (such as Reservoir Sampling). The resulting FlowFile may be of a fixed number of records (in the case of reservoir-based algorithms) or some subset of the total number of records (in the case of probabilistic sampling), or a deterministic number of records (in the case of interval sampling). ## Tags interval, range, record, reservoir, sample ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
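The three families of sampling strategy named in the description can be sketched in Python. The reservoir function uses the classic "Algorithm R", which is one well-known reservoir technique and not necessarily the one the processor uses. The function names and parameters are hypothetical.

```python
import random


def reservoir_sample(records, k, rng):
    """Keep a fixed-size uniform sample of k records in one pass (Algorithm R)."""
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


def probabilistic_sample(records, probability, rng):
    """Keep each record independently with the given probability."""
    return [r for r in records if rng.random() < probability]


def interval_sample(records, interval):
    """Keep every `interval`-th record, which is deterministic for a given input."""
    return [r for i, r in enumerate(records, start=1) if i % interval == 0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(len(reservoir_sample(range(1000), 5, rng)))
    print(interval_sample(list(range(1, 11)), 3))
```

The reservoir approach always yields at most `k` records regardless of input size, while the probabilistic approach yields a number of records that varies from run to run.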
---
title: SAP® BDC Connect for Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc.md
section: Loading & Unloading Data
---

# SAP® BDC Connect for Snowflake

- [](/user-guide/data-integration/zero-copy/about-sap-snowflake)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-tasks)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake)

This topic describes the steps to set up an SAP® Business Data Cloud connection for use with an existing Snowflake account. The Snowflake account must be Standard, Enterprise, or Business Critical edition and must be on AWS commercial in a supported region as described in [Supported Cloud Regions](/user-guide/intro-regions). For more information, see [Provisioning SAP Business Data Cloud Connect](https://help.sap.com/docs/business-data-cloud/administering-sap-business-data-cloud/provision-sap-business-data-cloud-connector-for-supported-external-systems).

As an SAP® administrator, perform the following steps:

1. Obtain your Snowflake account URL and ensure that it follows the format https://orgName-accountName.snowflakecomputing.com. The URL should be all lowercase, with each _ (underscore) replaced by - (dash) for RFC compliance.
2. Provision SAP Business Data Cloud Connect as documented in [Provisioning SAP Business Data Cloud Connect](https://help.sap.com/docs/business-data-cloud/administering-sap-business-data-cloud/provision-sap-business-data-cloud-connector-for-supported-external-systems).
3. Follow steps 1-5 in the wizard.
4. In wizard step 6, **Configure Parameters**, set the following:
   - **External System Instance Identifier**: Enter your Snowflake account URL: https://orgName-accountName.snowflakecomputing.com
   - **Region**: Select the same region that you used for enabling SAP Business Data Cloud Core.
5. Complete wizard steps 7 and 8.
6. In step 9, hover over the **View Tenant Notifications** button.

   A pop-up window opens with an **Invitation Link** that can be used to complete the configuration in Snowflake.
7. Copy the Invitation Link.
8. Log in to your Snowflake account to complete the remainder of the configuration to create a Zerocopy Connector as described in [](/user-guide/data-integration/zero-copy/sap-sql/setup).

## Next steps

In your [SAP for Me](https://me.sap.com/) environment, choose the Customer Landscape tab and, under the Formations tab, choose Include Systems to add the SAP BDC Connect instance to an existing formation.

Customers can create additional Zerocopy Connectors in the same Snowflake account and enroll them with the same or different SAP® Business Data Cloud tenant. Each Zerocopy Connector requires a new **Invitation Link** that can be obtained from [SAP for Me](https://me.sap.com/). Each **Invitation Link** can be enrolled only once with SAP® Business Data Cloud.

To create a new formation, see [Creating SAP Business Data Cloud Formations](https://help.sap.com/docs/business-data-cloud/administering-sap-business-data-cloud/integrate-sap-business-data-cloud-provisioned-systems?locale=en-US&state=PRODUCTION&version=SHIP).

---
title: SAP® BDC Connect for Snowflake Zerocopy Connector — Security and Privileges
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/security.md
section: Loading & Unloading Data
---

# SAP® BDC Connect for Snowflake Zerocopy Connector — Security and Privileges

- [](/user-guide/data-integration/zero-copy/sap-sql/setup)
- [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products)

This topic describes the privileges required to create and manage a Zerocopy Connector and the catalog-linked databases created from it.

## Access Control Requirements

A [role](#label-access-control-overview-roles) used to execute this operation must have the following [privileges](#label-access-control-overview-privileges) at a minimum:
For instructions on creating a custom role with a specified set of privileges, see [](#label-security-custom-role). For general information about roles and privilege grants for performing SQL actions on [securable objects](#label-access-control-securable-objects), see [Overview of Access Control](/user-guide/security-access-control-overview). ## Connector States A Zerocopy Connector transitions through the following states. Understanding the state is important because some operations are only permitted in specific states.
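To make the link between connector state and permitted operations concrete, the rules listed under State Transition Rules in this topic can be restated as a lookup table. The Python sketch below is only a restatement of those rules for illustration. The function and parameter names are hypothetical and it does not call any Snowflake API.

```python
# States in which each operation is permitted, copied from the
# State Transition Rules in this topic.
ALLOWED_STATES = {
    "CONNECT": {"NEW", "CONNECT_ERROR", "DISCONNECTED"},
    "DISCONNECT": {"CONNECTED", "DISCONNECT_ERROR"},
    "DROP": {"NEW", "CONNECT_ERROR", "DISCONNECT_ERROR", "DISCONNECTED"},
}


def is_permitted(operation, state, share_back_enabled=False, linked_database_count=0):
    """Return True if `operation` is permitted for a connector in `state`.

    `share_back_enabled` and `linked_database_count` are hypothetical inputs
    that model the two extra preconditions for disconnecting.
    """
    if state not in ALLOWED_STATES[operation]:
        return False
    if operation == "DISCONNECT":
        # Share-back must be disabled and all catalog-linked databases dropped.
        return not share_back_enabled and linked_database_count == 0
    return True


if __name__ == "__main__":
    print(is_permitted("CONNECT", "NEW"))
    print(is_permitted("DROP", "CONNECTED"))
    print(is_permitted("DISCONNECT", "CONNECTED", linked_database_count=2))
```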
### State Transition Rules

- `ALTER ... CONNECT` is permitted when the connector is in `NEW`, `CONNECT_ERROR`, or `DISCONNECTED` state.
- `ALTER ... DISCONNECT` is permitted when the connector is in `CONNECTED` or `DISCONNECT_ERROR` state.
- Share-back must be disabled before disconnecting.
- All catalog-linked databases created from the connector must be dropped before disconnecting.
- `DROP ZEROCOPY CONNECTOR` is permitted when the connector is in `NEW`, `CONNECT_ERROR`, `DISCONNECT_ERROR`, or `DISCONNECTED` state.
- Catalog-linked databases do not support `UNDROP`.

---
title: SAP® Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake.md
section: Loading & Unloading Data
---

# SAP® Snowflake

- [](/user-guide/data-integration/zero-copy/about-sap-snowflake)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-tasks)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc)

This topic describes the steps to configure an SAP® Snowflake instance for SAP customers without an existing Snowflake account. The provisioned SAP® Snowflake account is Business Critical edition.

As an SAP® administrator, perform the following steps:

1. Sign in to [SAP for Me](https://me.sap.com/) with an S-user ID or login name.
2. From the sidebar menu, choose **Portfolio & products**.
3. In the **My Product Packages** tab, select the **SAP Business Data Cloud** product.
4. Select the **Applications** tab and in the **SAP Snowflake** card, click **Start Provisioning**.

   The **Provision SAP® Snowflake** wizard dialog displays and guides you through the provisioning process.
5. In the Provision SAP® Snowflake dialog, configure the following parameters and click **Next**:
   - **Entitlement System**: Displays the ID of the SAP® Business Data Cloud Entitlement set. Cannot be changed.
   - **Name**: Enter an appropriate name for the SAP solution.
   - **Path**: Select or create a resource group under which to group the solution components provisioned for SAP® Business Data Cloud. Create it in the same location selected for the SAP® Business Data Cloud cockpit system.
   - **Business Type**: Preset to Production.
6. In the **Select Application** step, SAP Snowflake is pre-selected.

   The **Configure Parameters** step displays.
7. In the **Configure Parameters** step, configure the following parameters and click **Next**:
   - **Region**: Choose an available region in the [SAP for Me](https://me.sap.com/) portal. Snowflake recommends choosing the same region as the SAP® Business Data Cloud core for optimal performance.
   - **Admin email**: Provide the email address of the user to be defined as the administrator of your SAP Snowflake system. This user is responsible for adding additional users and for further configuration.
   - **Admin First Name**: The first name of the administrator of your SAP Snowflake system.
   - **Admin Last Name**: The last name of the administrator of your SAP Snowflake system.

   Provisioning begins and SAP® notifies you that a provisioning request was sent to the specified owner's email address.
8. Click **View in Resources** to view the tenant within the indicated resource group. The **Resources** tab shows the current solution status, which should be `Processing`.
9. Select the tenant below the new solution and click **Details** to view the details of the tenant.
10. At the top of the **Details** view of the tenant, choose the **View Details** link. A pop-up window opens that provides an activation link to the SAP Snowflake account. If you are the SAP Snowflake system owner, select this link and complete the activation flow in SAP Snowflake (see [Activating the SAP Snowflake Account](https://help.sap.com/docs/business-data-cloud/introducing-sap-snowflake/introducing-sap-snowflake)). If not, share the activation link with the SAP Snowflake owner and ask them to complete the activation flow.
11. After the account has been activated in SAP for Me, the status for your SAP Snowflake solution and tenant changes to `Ready`. In the details view of the SAP Snowflake tenant, in the Path field, select the URL to open SAP Snowflake and log in.
## Next steps

The SAP® BDC admin may provision as many SAP® Snowflake accounts as they need, each with a unique account name to help distinguish them. Every SAP® Snowflake account will need to be activated as described in the note below. After activation, the SAP® Snowflake account is ready for you to share Data Products from SAP® BDC to SAP® Snowflake.

As part of the provisioning process, a Zerocopy Connector called `DEFAULT_SAP_BDC_CONNECTOR` is automatically created under the `CONNECTORS.ZEROCOPY` schema and enrolled with SAP® Business Data Cloud in the SAP® Snowflake account. You are ready to share data products from SAP® BDC and consume them in SAP® Snowflake. For more information, see [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products).

Customers can create additional Zerocopy Connectors in the same SAP® Snowflake account and enroll them with the same or different SAP® Business Data Cloud tenant. Each Zerocopy Connector requires a new Invitation Link that can be obtained from [SAP for Me](https://me.sap.com/). Each Invitation Link can be enrolled only once with SAP® Business Data Cloud.

Customers can view the status of provisioning in the **Details** view. After provisioning is complete, the customer can click the Snowflake activation link available in the Details view to activate their SAP® Snowflake account, log in, change their username and reset their password, set up MFA, and perform other operations.

---
title: ScanAttribute 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scanattribute.md
section: Loading & Unloading Data
---

# ScanAttribute 2025.10.9.21

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms ## Tags attributes, find, lookup, scan, search, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: ScanContent 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scancontent.md section: Loading & Unloading Data --- # ScanContent 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Scans the content of FlowFiles for terms that are found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term will be added to the FlowFile using the 'matching.term' attribute ## Tags aho-corasick, byte sequence, content, dictionary, find, scan, search ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
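The dictionary scan described above can be illustrated with a short Python sketch. The tags mention Aho-Corasick, but this example swaps in a naive substring search for brevity. The relationship names and the choice to report the first dictionary term found are assumptions, while the `matching.term` attribute name comes from the description.

```python
def scan_content(content: bytes, dictionary: list):
    """Return (relationship, attributes) after scanning content for dictionary terms.

    Uses a naive substring check per term rather than Aho-Corasick.
    Reports the first term, in dictionary order, that occurs (assumption).
    """
    for term in dictionary:
        if term.encode("utf-8") in content:
            return "matched", {"matching.term": term}
    return "unmatched", {}


if __name__ == "__main__":
    body = "the quick brown fox".encode("utf-8")
    print(scan_content(body, ["cat", "fox"]))
    print(scan_content(body, ["dog"]))
```

For large dictionaries, a multi-pattern algorithm such as Aho-Corasick scans the content once instead of once per term.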
--- title: ScriptedFilterRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scriptedfilterrecord.md section: Loading & Unloading Data --- # ScriptedFilterRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-scripting-nar ## Description This processor provides the ability to filter records out from FlowFiles using the user-provided script. Every record will be evaluated by the script which must return with a boolean value. Records with "true" result will be routed to the "matching" relationship in a batch. Other records will be filtered out. ## Tags filter, groovy, record, script ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Restrictions
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.script.ScriptedPartitionRecord](/user-guide/data-integration/openflow/processors/scriptedpartitionrecord) - [org.apache.nifi.processors.script.ScriptedTransformRecord](/user-guide/data-integration/openflow/processors/scriptedtransformrecord) - [org.apache.nifi.processors.script.ScriptedValidateRecord](/user-guide/data-integration/openflow/processors/scriptedvalidaterecord) --- title: ScriptedLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/scriptedlookupservice.md section: Loading & Unloading Data --- # ScriptedLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Allows the user to provide a scripted LookupService instance in order to enrich records from an incoming flow file. ## Tags groovy, invoke, lookup, record, script ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: ScriptedPartitionRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scriptedpartitionrecord.md section: Loading & Unloading Data --- # ScriptedPartitionRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-scripting-nar ## Description Receives Record-oriented data (i.e., data that can be read by the configured Record Reader) and evaluates the user-provided script against each record in the incoming flow file. Each record is then grouped with other records sharing the same partition, and a FlowFile is created for each group of records. Two records share the same partition if evaluating the script yields the same return value for both. ## Tags groovy, group, organize, partition, record, script, segment, split ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Restrictions
## Relationships
## Writes attributes
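The partitioning rule in the description (records whose script result is the same end up in the same group) can be sketched in Python. Here a plain Python callable stands in for the user-provided script, and each resulting group corresponds to one outgoing FlowFile. The function name and the sample records are hypothetical.

```python
def partition_records(records, script):
    """Group records by the value the script returns for each one.

    `script` is any callable that takes a record and returns a hashable
    partition value. It stands in for the user-provided script.
    """
    partitions = {}
    for record in records:
        partitions.setdefault(script(record), []).append(record)
    return partitions


if __name__ == "__main__":
    records = [
        {"city": "Oslo", "temp": 3},
        {"city": "Rome", "temp": 18},
        {"city": "Oslo", "temp": 5},
    ]
    # One group per distinct city, and one FlowFile would be created per group.
    print(partition_records(records, lambda r: r["city"]))
```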
## See also - [org.apache.nifi.processors.script.ScriptedFilterRecord](/user-guide/data-integration/openflow/processors/scriptedfilterrecord) - [org.apache.nifi.processors.script.ScriptedTransformRecord](/user-guide/data-integration/openflow/processors/scriptedtransformrecord) - [org.apache.nifi.processors.script.ScriptedValidateRecord](/user-guide/data-integration/openflow/processors/scriptedvalidaterecord) --- title: ScriptedReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/scriptedreader.md section: Loading & Unloading Data --- # ScriptedReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Allows the user to provide a scripted RecordReaderFactory instance in order to read/parse/generate records from an incoming flow file. ## Tags groovy, invoke, record, recordFactory, script ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: ScriptedRecordSetWriter source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/scriptedrecordsetwriter.md section: Loading & Unloading Data --- # ScriptedRecordSetWriter This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Allows the user to provide a scripted RecordSetWriterFactory instance in order to write records to an outgoing flow file. ## Tags groovy, invoke, record, script, writer ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: ScriptedRecordSink source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/scriptedrecordsink.md section: Loading & Unloading Data --- # ScriptedRecordSink This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Allows the user to provide a scripted RecordSinkService instance in order to transmit records to the desired target. The script must set a variable 'recordSink' to an implementation of RecordSinkService. ## Tags groovy, invoke, record, record sink, script ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: ScriptedTransformRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scriptedtransformrecord.md section: Loading & Unloading Data --- # ScriptedTransformRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-scripting-nar ## Description Provides the ability to evaluate a simple script against each record in an incoming FlowFile. The script may transform the record in some way, filter the record, or fork additional records. See Processor's Additional Details for more information. ## Tags filter, groovy, modify, record, script, transform, update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Restrictions
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.jolt.JoltTransformRecord](/user-guide/data-integration/openflow/processors/jolttransformrecord) - [org.apache.nifi.processors.script.ExecuteScript](/user-guide/data-integration/openflow/processors/executescript) - [org.apache.nifi.processors.standard.LookupRecord](/user-guide/data-integration/openflow/processors/lookuprecord) - [org.apache.nifi.processors.standard.QueryRecord](/user-guide/data-integration/openflow/processors/queryrecord) - [org.apache.nifi.processors.standard.UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord) --- title: ScriptedValidateRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scriptedvalidaterecord.md section: Loading & Unloading Data --- # ScriptedValidateRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-scripting-nar ## Description This processor validates records in FlowFiles using a user-provided script. The script is expected to take a record as its argument and return a boolean value. Based on this result, the processor categorizes each record as "valid" or "invalid" and routes the records to the respective relationship in batch. Additionally, the original FlowFile is routed to the "original" relationship or, if processing fails, to the "failed" relationship. ## Tags groovy, record, script, validate ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Restrictions
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.script.ScriptedFilterRecord](/user-guide/data-integration/openflow/processors/scriptedfilterrecord) - [org.apache.nifi.processors.script.ScriptedPartitionRecord](/user-guide/data-integration/openflow/processors/scriptedpartitionrecord) - [org.apache.nifi.processors.script.ScriptedTransformRecord](/user-guide/data-integration/openflow/processors/scriptedtransformrecord) --- title: SearchElasticsearch 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/searchelasticsearch.md section: Loading & Unloading Data --- # SearchElasticsearch 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-elasticsearch-restapi-nar ## Description A processor that allows the user to repeatedly run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. Search After/Point in Time queries must include a valid "sort" field. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the query will restart with the first page of results being retrieved. ## Tags elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, json, page, query, scroll, search ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.elasticsearch.ConsumeElasticsearch](/user-guide/data-integration/openflow/processors/consumeelasticsearch) - [org.apache.nifi.processors.elasticsearch.PaginatedJsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/paginatedjsonqueryelasticsearch) --- title: SegmentContent 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/segmentcontent.md section: Loading & Unloading Data --- # SegmentContent 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Segments a FlowFile into multiple smaller segments on byte boundaries. Each segment is given the following attributes: fragment.identifier, fragment.index, fragment.count, segment.original.filename. The MergeContent processor can then use these attributes to reconstitute the original FlowFile. ## Tags segment, split ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.MergeContent](/user-guide/data-integration/openflow/processors/mergecontent) --- title: Set up and access Openflow source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-roles-login.md section: Loading & Unloading Data --- # Set up and access Openflow This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [](/user-guide/data-integration/openflow/setup-openflow-spcs) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/processors/index) - [](/user-guide/data-integration/openflow/controllers/index) To use Openflow, you must configure roles and permissions in your Snowflake account, and set up a database. This topic describes how to set up the necessary roles and permissions. ## Set up the Openflow admin roles The **Openflow Admin role** is used by a deployment engineer to set up Openflow workflows. A Snowflake administrator adds this role by performing the following steps: 1. Sign in to %sf-web-interface-link%. 2. Open a SQL worksheet. 3. Create a role for the Openflow admin, allowing it the required permissions to manage integrations and compute pools required for deployments. In the SQL below, OPENFLOW_ADMIN is the default name for the Openflow admin, but you can choose any name. ```sql USE ROLE ACCOUNTADMIN; CREATE ROLE IF NOT EXISTS OPENFLOW_ADMIN; GRANT CREATE ROLE ON ACCOUNT TO ROLE OPENFLOW_ADMIN; GRANT CREATE OPENFLOW DATA PLANE INTEGRATION ON ACCOUNT TO ROLE OPENFLOW_ADMIN; GRANT CREATE OPENFLOW RUNTIME INTEGRATION ON ACCOUNT TO ROLE OPENFLOW_ADMIN; ``` 4. 
Grant the admin role and secondary roles to a user.

To prevent login issues, Snowflake recommends that you also assign and set default secondary roles when you create an Openflow user. This is helpful because Openflow doesn't allow users with the following roles to log in: ACCOUNTADMIN, ORGADMIN, GLOBALORGADMIN, or SECURITYADMIN. While a user is logged in, Openflow actions can be authorized by any of the authenticated user's roles, not just the default role.

Substitute <OPENFLOW_USER> with the appropriate username:

```sql
USE ROLE ACCOUNTADMIN;
GRANT ROLE OPENFLOW_ADMIN TO USER <OPENFLOW_USER>;
ALTER USER <OPENFLOW_USER> SET DEFAULT_ROLE = OPENFLOW_ADMIN;
ALTER USER <OPENFLOW_USER> SET DEFAULT_SECONDARY_ROLES = ('ALL');
```

## Accept the Openflow terms of service

This step is only required once for your organization.

1. Sign in to Snowflake as a user with the ORGADMIN role.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Review the agreement and select **Accept**.

## Start Openflow

Log in to Openflow by performing the following steps:

1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Select **Launch Openflow**.

### Troubleshooting login issues

- If you can log in to Snowflake but can't log in to Openflow, try the following:
  - Change your role to something other than ACCOUNTADMIN, ORGADMIN, GLOBALORGADMIN, or SECURITYADMIN.
  - Add default secondary roles to the user, substituting <OPENFLOW_USER> with the appropriate username:

```sql
USE ROLE ACCOUNTADMIN;
ALTER USER <OPENFLOW_USER> SET DEFAULT_SECONDARY_ROLES = ('ALL');
```

---
title: Set up Openflow - BYOC
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-byoc.md
section: Loading & Unloading Data
---

# Set up Openflow - BYOC

This feature is not available in the People's Republic of China. Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-byoc) - [](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress) - [](/user-guide/data-integration/openflow/setup-openflow-byoc-encrypted-volumes) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up Openflow. Setting up Openflow involves the following steps: - [Create a deployment in your cloud](#create-a-deployment-in-your-cloud) - [Create a Runtime environment in your cloud](#create-a-runtime-environment-in-your-cloud) ## Prerequisites The prerequisites to be completed on your Snowflake and AWS accounts are as follows: ### Snowflake account You'll need to first define privileges at the Snowflake account level. 1. Run the following SQL commands to grant the required privileges to the Openflow admin role: ```sql USE ROLE ACCOUNTADMIN; GRANT CREATE OPENFLOW DATA PLANE INTEGRATION ON ACCOUNT TO ROLE $openflow_admin_role; GRANT CREATE OPENFLOW RUNTIME INTEGRATION ON ACCOUNT TO ROLE $openflow_admin_role; ``` The new privileges are assigned to the ACCOUNTADMIN role as part of the default set of privileges, and that role can grant the privileges to a role of their choosing for the Openflow admin role, denoted as $openflow_admin_role in the code. 2. Next, set `default_secondary_roles` to `ALL` for all Openflow users: 1. Sign in to Snowflake with a role that your ACCOUNTADMIN assigned for using Openflow. This may not be any of the following roles: ACCOUNTADMIN, ORGADMIN, GLOBALORGADMIN, or SECURITYADMIN. If you see a blank screen or the error "message: Invalid consent request" when logging into Openflow, change your role to a role that is not one of these listed roles. For more information, see [Prerequisites](#prerequisites). 2. 
Run the following code once for each Openflow user, replacing $openflow_user with that user's name:

```sql
USE ROLE ACCOUNTADMIN;
ALTER USER $openflow_user SET DEFAULT_SECONDARY_ROLES = ('ALL');
```

This setting is required because Openflow actions are authorized by using any of the authenticated user's roles, and not just the default role.

#### Deployment integration privileges

The deployment integration object represents a set of resources provisioned to deploy one or more Snowflake Openflow runtimes. For organizations bringing their own cloud resources, the deployment integration object represents a managed Kubernetes cluster along with its associated nodes.

Users with the CREATE OPENFLOW DATA PLANE INTEGRATION privilege on the Snowflake account can create and delete deployment integration objects.

Additional privileges can be defined on deployment integration objects directly to support differentiation of access. You can grant the following privileges on a deployment integration object:

- OWNERSHIP: Enables full control over deployment objects, including deletion of the deployment.
- USAGE: Enables creation of runtime child objects.

#### Runtime privileges

The runtime object represents a cluster of one or more Snowflake Openflow runtime servers, provisioned to run flow definitions. For Kubernetes deployments, the runtime object represents a stateful set of Snowflake Openflow runtime containers deployed in a namespace, along with supporting components.

Users with the OWNERSHIP privilege on the parent deployment integration object and the CREATE OPENFLOW RUNTIME INTEGRATION account-level privilege can create runtime integration objects.

Additional privileges can be defined on runtime integration objects directly to support differentiation of access. You can grant the following privileges on a runtime integration object:

- OWNERSHIP: Enables full control over runtime actions, including deletion of the associated runtime and modification of runtime flow definitions.
- USAGE: Enables read access to the deployed runtime for observing health and status, without making any changes.

#### Snowflake role

In this context, a Snowflake role is a role that is associated with a specific Openflow runtime and used for the following tasks:

- Grant access to Snowflake resources.
- Grant access to connector-specific resources.

Snowflake roles are linked to the Openflow Snowflake Managed Token, so customers don't need to create separate service users and key pairs to authenticate to Snowflake.

In the following statements, <RUNTIMENAME> denotes the name of the associated runtime, and <WAREHOUSE_NAME> stands for the name of your warehouse.

To create a Snowflake role:

1. Create the required Snowflake role.

```sql
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS OPENFLOW_RUNTIME_ROLE_<RUNTIMENAME>;
```

2. Grant the Snowflake role access to a warehouse. Snowflake recommends using a dedicated warehouse for data ingestion. Use this warehouse when you configure connectors for runtimes that use this Snowflake role.

```sql
GRANT USAGE, OPERATE ON WAREHOUSE <WAREHOUSE_NAME> TO ROLE OPENFLOW_RUNTIME_ROLE_<RUNTIMENAME>;
```

3. Allow the Snowflake role to use, create, or otherwise access Snowflake objects.
The required underlying objects vary depending on the Openflow connector being created. The example below is for illustration purposes only.
```sql
GRANT USAGE ON DATABASE <DATABASE_NAME> TO ROLE OPENFLOW_RUNTIME_ROLE_<RUNTIMENAME>;
GRANT USAGE ON SCHEMA <DATABASE_NAME>.<SCHEMA_NAME> TO ROLE OPENFLOW_RUNTIME_ROLE_<RUNTIMENAME>;
```

4. Allow the user to use the Snowflake role.

```sql
GRANT ROLE OPENFLOW_RUNTIME_ROLE_<RUNTIMENAME> TO USER <OPENFLOW_USER>;
```

#### Example for role setup

Consider a scenario where the following roles should be set up:

- **accountadmin:** Out-of-the-box role from Snowflake, which has these two CREATE privileges:
  - CREATE OPENFLOW DATA PLANE INTEGRATION
  - CREATE OPENFLOW RUNTIME INTEGRATION
- **deployment_manager:** Can create, manage, and delete deployments.
- **deployment1_runtime_manager_1:** Can create a runtime only within deployment 1. It can modify and delete a runtime that it created within deployment 1, but not a runtime created by deployment1_runtime_manager_2.
- **deployment1_runtime_manager_2:** Can create a runtime only within deployment 1. It can modify and delete a runtime that it created within deployment 1, but not a runtime created by deployment1_runtime_manager_1.
- **deployment1_runtime_viewer_1:** Can view a runtime canvas within deployment 1 that was created by deployment1_runtime_manager_1.
- **deployment1_runtime_viewer_2:** Can view a runtime canvas within deployment 1 that was created by deployment1_runtime_manager_2.
- **deployment2_runtime_manager:** Can create a runtime only within deployment 2.
- **deployment2_runtime_viewer:** Can view a runtime canvas within deployment 2.

To set up Openflow with these roles, follow these steps:

1. Create new roles and assign the relevant privileges:

```sql
use role ACCOUNTADMIN;
create role if not exists deployment_manager;
create role if not exists deployment1_runtime_manager_1;
create role if not exists deployment1_runtime_manager_2;
create role if not exists deployment1_runtime_viewer_1;
create role if not exists deployment1_runtime_viewer_2;
create role if not exists deployment2_runtime_manager;
create role if not exists deployment2_runtime_viewer;

-- Assign create deployment privilege to roles.
-- (This privilege cannot be granted in the Openflow UI.)
grant create openflow data plane integration on account to role deployment_manager;

-- Assign create runtime privilege to roles.
-- (This privilege cannot be granted in the Control Plane UI.)
grant create openflow runtime integration on account to role deployment1_runtime_manager_1;
grant create openflow runtime integration on account to role deployment1_runtime_manager_2;
grant create openflow runtime integration on account to role deployment2_runtime_manager;

-- Grant roles to users. (Repeat this step for each user.)
grant role <ROLE_NAME> to user <OPENFLOW_USER>;
```

2. To create a deployment, follow these steps:
   1. Sign in to Snowsight as deployment_manager.
   2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
   3. To create deployment 1, select **Create a deployment**, and grant the USAGE privilege to deployment1_runtime_manager_1 and deployment1_runtime_manager_2.
   4. To create deployment 2, select **Create a deployment**, and grant the USAGE privilege to deployment2_runtime_manager.
3. To create a runtime in deployment 1, follow these steps:
   1. Log in as deployment1_runtime_manager_1.
   2. Create a runtime as described in the following sections. deployment1_runtime_manager_1 should be able to create runtimes and manage any runtimes it created within this deployment.
   3. In the Openflow UI, select deployment1_runtime_viewer_1 and grant it the USAGE privilege.

### AWS account

Ensure the following on your AWS account:

- You have an AWS account with the permissions required to create a CloudFormation stack.
- An AWS administrator in your organization can run the CloudFormation script to set up Amazon Elastic Kubernetes Service (EKS) inside a new VPC (created by CloudFormation) or an existing VPC. See [Prerequisites for BYO-VPC (existing VPC)](#prerequisites-for-byo-vpc-existing-vpc).
To learn about how the Openflow installation happens in your AWS account and the permissions that are configured by the CloudFormation template, see [Installation process](#installation-process). #### Prerequisites for BYO-VPC (existing VPC) If you want to use an existing VPC and your own subnets, ensure that you have the following: - For Snowflake managed ingress, two public subnets with:
- Different availability zones - At least /27 CIDR ranges with 32 available IPs. - Routes for destination 0.0.0.0/0 and target internet gateway or some other egress routing to the internet. - A tag that allows Openflow to create a load balancer: - Key: `kubernetes.io/role/elb` - Value: `1` - If your public subnets are used by other EKS clusters, a tag that allows Openflow to create a load balancer alongside other load balancers: - Key: `kubernetes.io/cluster/{deployment-key}` - Value: `1` Managing your own ingress eliminates the need for public subnets, but requires additional configuration in your AWS account. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress).
- Two private subnets with: - Different availability zones - At least /24 CIDR ranges with 255 available IPs. This limits the number and scale of runtimes you can create, so it may be more appropriate to use a larger range for the deployment. - Connectivity to Snowflake and AWS services from Private Subnet 1 where the Openflow deployment runs. - Among many options, you can connect using route tables with a NAT Gateway, a Transit Gateway, or PrivateLink VPC Endpoints. - Without this connectivity, the Openflow deployment will not initialize or set up properly and no infrastructure will be provisioned. - For Snowflake managed ingress, egress connectivity to [LetsEncrypt.org](https://letsencrypt.org), which will provision a TLS certificate. ## Accept the Openflow terms of service This step is only required once for your organization. 1. Sign in to Snowflake as a user with the ORGADMIN role. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. Accept Openflow terms of services. ## Create a deployment in your cloud ### Configure the deployment in your Snowflake account Sign in to Snowflake with a role that your ACCOUNTADMIN assigned for using Openflow. This may not be any of the following roles: ACCOUNTADMIN, ORGADMIN, GLOBALORGADMIN, or SECURITYADMIN. If you see a blank screen, or the error: "message: Invalid consent request", when logging into Openflow, change your role to a role that is not one of these listed roles. For more information, see [Prerequisites](#prerequisites). 1. In the navigation menu, select **Ingestion** %raa% **Openflow**. 2. Select **Launch Openflow**. 3. In the Openflow UI, select **Create a deployment**. 4. On the **Deployments** tab, select **Create a deployment**. The **Creating a deployment** wizard opens. 5. In the **Prerequisites** step, ensure that you meet all the requirements, and then select **Next**. 6. 
In the **Deployment location** step, select **Amazon Web Services** as the deployment location, enter a name for your deployment, and then select **Next**. 7. In the **Configuration** step, select one of the following options: - **Managed VPC**: Choose this option if you want your VPC to be managed by Snowflake. - **Bring your own VPC**: Choose this option if you want to use an existing VPC. 1. In the **PrivateLink** step, you can select if you want to establish communication with Snowflake over the private link. Enabling this option requires additional setup in your AWS account. For more information, see [](/user-guide/admin-security-privatelink). - If the **PrivateLink** option is enabled, the **End user authentication over PrivateLink** step displays. - If enabled, browser-based authentication redirects use PrivateLink endpoints. - If disabled, end-user authentication uses public Snowflake URLs. Regardless of this setting, Deployment communications to Snowflake will use PrivateLink. If you access %sf-web-interface% through a PrivateLink URL, ensure it is enabled. If you access %sf-web-interface% through a non-PrivateLink URL, leave it disabled. 2. In the **Custom Ingress** step, you can choose to manage your own ingress configuration for the Openflow deployment, such as specifying custom security groups, load balancer settings, or other network controls. Enabling this option requires additional setup in your AWS account. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress). 3. Select **Create Deployment**. 4. Once your deployment is configured, a dialog box appears that lets you download the CloudFormation template to complete the setup process in your AWS account. Download this template. Note that Openflow doesn't support modifying the CloudFormation template. Don't modify any values after downloading the template, other than choosing drop-down options. 5. 
(Optional) To encrypt EBS volumes for your Openflow BYOC deployment, see [](/user-guide/data-integration/openflow/setup-openflow-byoc-encrypted-volumes). ### Apply the CloudFormation template in your AWS account 1. In your AWS account, create a new CloudFormation Stack using the template. After the Openflow deployment agent's Amazon Elastic Compute Cloud (EC2) instance is created, it completes the rest of the [Installation process](#installation-process) using infrastructure as code scripts. You can track the installation progress as described in [Track the installation progress](#track-the-installation-progress). If you're using an existing VPC, upon uploading the CloudFormation template, select the respective values in the drop-down lists for the two private subnets and your VPC. ### Create a network rule for Openflow in your Snowflake account This step is required only if you're using network policies to control access to Snowflake. A network policy is a set of rules that control which IP addresses can access your Snowflake account. 1. Navigate to your Snowflake account. 2. Identify the NAT gateway public IP address that was created as part of the CloudFormation stack. You can find this either by searching for NAT Gateway on AWS console or checking the output of the CloudFormation stack. The NAT gateway is responsible for Openflow egress for both the Data Plane Agent (DPA) and EKS. Both DPA and EKS run in the Private Subnet 1 of the installation. 3. Create a network rule for Openflow and add it to your existing network policy. Replace \{$NAT_GATEWAY_PUBLIC_IP\} in the following code snippet with the NAT gateway public IP address that was created as part of the CloudFormation stack. ```sql USE ROLE ACCOUNTADMIN; USE DATABASE {REPLACE_WITH_YOUR_DB_NAME}; CREATE NETWORK RULE allow_openflow_deployment MODE = INGRESS TYPE = IPV4 VALUE_LIST = ('{$NAT_GATEWAY_PUBLIC_IP}/32'); ``` 4. Find your currently active network policy. 
```sql
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN ACCOUNT;
```

5. Copy the value column from the output, and use it as the policy name when you add the network rule to your active network policy:

```sql
ALTER NETWORK POLICY {ENTER_YOUR_ACTIVE_NETWORK_POLICY_NAME} ADD ALLOWED_NETWORK_RULE_LIST = (allow_openflow_deployment);
```

### Set up an event table to log Openflow events (required)

Use one of the following options to set up an event table:

- Create a new Openflow-specific event table (recommended):

```sql
USE ROLE ACCOUNTADMIN;
CREATE DATABASE IF NOT EXISTS openflow;
USE openflow;
CREATE SCHEMA IF NOT EXISTS openflow;
USE SCHEMA openflow;
GRANT CREATE EVENT TABLE ON SCHEMA openflow.openflow TO ROLE $role_of_deployment_owner;
USE ROLE $role_of_deployment_owner;
CREATE EVENT TABLE IF NOT EXISTS openflow.openflow.openflow_events;

-- Find the Data Plane Integrations
SHOW OPENFLOW DATA PLANE INTEGRATIONS;
ALTER OPENFLOW DATA PLANE INTEGRATION $openflow_dataplane_name SET EVENT_TABLE = 'openflow.openflow.openflow_events';
```

- Create an account-specific event table:

```sql
USE DATABASE openflow;
CREATE SCHEMA IF NOT EXISTS openflow.telemetry;
CREATE EVENT TABLE IF NOT EXISTS openflow.telemetry.events;
ALTER ACCOUNT SET EVENT_TABLE = openflow.telemetry.events;
```

- Use an existing account-specific event table:

```sql
USE ROLE ACCOUNTADMIN;
ALTER ACCOUNT SET EVENT_TABLE = 'existing_database.existing_schema.existing_event_table';
```

### Verify the deployment

1. In the navigation menu, select **Ingestion** %raa% **Openflow**.

Creating a deployment takes about 45 minutes on AWS. Once it's created, you can view your deployment in the **Deployments** tab of the Openflow UI with its state marked as **Active**.

## Create a runtime environment in your cloud

1. In **Openflow Control Plane**, select **Create a runtime**. The **Create Runtime** dialog box appears.
2. From the **Deployment** drop-down list, choose the deployment in which you want to create a runtime.
3. Enter a name for your runtime.
4. 
Choose a node type from the **Node type** drop-down list. This specifies the size of your nodes. 5. In the **Min/Max node** range selector, select a range. The minimum value specifies the number of nodes that the runtime starts with when idle and the maximum value specifies the number of nodes that the runtime can scale up to, in the event of high data volume or CPU load. 6. Select **Create**. The runtime takes a couple of minutes to get created. Once created, you can view your runtime by navigating to the **Runtimes** tab of the Openflow control plane. Click the runtime to open the Openflow canvas. ## Next step Deploy a connector in a runtime. For a list of connectors available in Openflow, see [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors). ## Networking considerations: Openflow EKS to source systems For BYOC deployments, take note of the following considerations: - Openflow CloudFormation stack creates one VPC with two public subnets and two private subnets. - Public subnets host the AWS Network Load Balancer, which is created later. Private subnets host the EKS Cluster and all of the EC2 instances backing the node groups. Openflow runtimes run within Private subnet 1. - NAT Gateway is currently the egress for both DPA and EKS. Both DPA and EKS run in the Private subnet 1 of the installation. For BYO-VPC deployments, take note of the following considerations: - Openflow requires you to enter the two private subnets that will run Openflow and two public subnets for the AWS Load Balancer. - You have to provide your own egress routing to the Internet from those private subnets, which can be the central NAT Gateway. - No Internet Gateway is created by Openflow. You have to provide appropriate public internet egress routing. 
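Before you enter subnets for a BYO-VPC deployment, it can help to sanity-check their sizes against the prerequisites listed earlier (at least a /27 for each public subnet and at least a /24 for each private subnet). The following is a minimal sketch using Python's standard `ipaddress` module. The function names and the example CIDR blocks are hypothetical, and the check covers only subnet count and size, not availability zones, tags, or routing.

```python
import ipaddress

# Largest allowed prefix length per subnet kind, taken from the BYO-VPC
# prerequisites: public subnets need at least a /27, private at least a /24.
MAX_PREFIX = {"public": 27, "private": 24}


def check_subnet(cidr: str, kind: str) -> bool:
    """Return True if the CIDR block is at least as large as the minimum for its kind."""
    network = ipaddress.ip_network(cidr, strict=True)
    return network.prefixlen <= MAX_PREFIX[kind]


def check_subnets(subnets: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for kind, cidrs in subnets.items():
        if len(cidrs) < 2:
            problems.append(f"need two {kind} subnets, got {len(cidrs)}")
        for cidr in cidrs:
            if not check_subnet(cidr, kind):
                problems.append(
                    f"{kind} subnet {cidr} is smaller than /{MAX_PREFIX[kind]}"
                )
    return problems


# Hypothetical CIDR blocks, for illustration only.
example = {
    "public": ["10.0.0.0/27", "10.0.0.32/27"],
    "private": ["10.0.1.0/24", "10.0.2.0/25"],
}
print(check_subnets(example))
```

In this example, the second private subnet is a /25, so the sketch reports it as too small. As noted in the prerequisites, a private range larger than /24 may be more appropriate if you expect to run many runtimes.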
Network connectivity generally works as follows: **An Openflow EC2 Instance** (Agent or EKS) runs in a **private subnet** that requires **Route Table entries** to send egress traffic to a **Transit Gateway**, a **PrivateLink VPC Endpoint**, or a **NAT Gateway** connected to an **Internet Gateway**. ### Example: BYOC deployment with a new VPC to communicate with RDS in a different VPC of the same account To enable communication between the Openflow EKS cluster and the RDS instance, you need to create a new security group, with the EKS cluster security group as the source for the inbound rule for RDS connectivity, and attach the group to the RDS instance. 1. To find the EKS cluster security group, navigate to EKS and find your deployment key. You can also find the deployment key in the Openflow UI by performing the following steps: 1. Sign in to Openflow. 2. Go to the **Deployments** tab. 3. Select the More options icon next to your deployment. 4. Select **View details**. The value in the field **Key** is your deployment key. 2. After finding the deployment key, use it to filter your AWS resources by the key value. 3. Create a new security group that allows access from the Openflow EKS cluster using the relevant database port. For PostgreSQL, the default port is 5432. 4. Attach the new security group to the RDS instance. If you need to troubleshoot, the [Reachability Analyzer](https://docs.aws.amazon.com/vpc/latest/reachability/getting-started.html) can be useful. It uses tracing capabilities within the AWS platform to show in detail what may be blocking connectivity. 
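Step 3 above can be sketched in Python. This is a minimal sketch, not the Openflow tooling: the helper only builds the request parameters for an inbound rule whose source is the EKS cluster security group, on the assumption that you would pass them to boto3's `authorize_security_group_ingress` call on an EC2 client. The function name `build_rds_ingress_rule` and the security group IDs are hypothetical placeholders.

```python
from typing import Any


def build_rds_ingress_rule(
    rds_group_id: str,
    eks_cluster_group_id: str,
    port: int = 5432,  # PostgreSQL default; change for other databases
) -> dict[str, Any]:
    """Build keyword arguments for an inbound TCP rule on the RDS security group.

    The rule's source is the EKS cluster security group, not a CIDR range.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return {
        "GroupId": rds_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "UserIdGroupPairs": [
                    {
                        "GroupId": eks_cluster_group_id,
                        "Description": "Openflow EKS cluster to RDS",
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical security group IDs, for illustration only.
    params = build_rds_ingress_rule("sg-0rds0000000000000", "sg-0eks0000000000000")
    print(params)
    # Assuming boto3 is installed and AWS credentials are configured:
    #   boto3.client("ec2").authorize_security_group_ingress(**params)
```

Keeping the parameter construction separate from the AWS call lets you review the rule before applying it.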
See the following AWS documentation for accessing DB instances using VPC peering and the associated security group configuration: - [Scenarios for accessing a DB instance in a VPC - Amazon Relational Database Service](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario3) - [Update your security groups to reference peer security groups - Amazon Virtual Private Cloud](https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-security-groups.html) ## Configuring PrivateLink in AWS This section explains how to access and configure Openflow using private connectivity. ### Access Openflow over PrivateLink Before starting with the PrivateLink configuration, enable PrivateLink for your account as described in [AWS PrivateLink and Snowflake](/user-guide/admin-security-privatelink). 1. Using the `ACCOUNTADMIN` role, call the `SYSTEM$GET_PRIVATELINK_CONFIG` function in your Snowflake account and identify the value for `openflow-privatelink-url`. This is the URL for accessing Openflow over PrivateLink. 2. Create a `CNAME` record in your DNS to resolve the URL value to your VPC endpoint. 3. Confirm that your DNS settings can resolve the value. 4. Confirm that you can connect to the Openflow UI using this URL from your browser. ### Configure a new deployment using PrivateLink Snowflake recommends that you use the **Bring your own VPC** version of the Openflow deployment and create a VPC endpoint in your VPC before applying the CloudFormation template. Before starting with the PrivateLink configuration, make sure that PrivateLink is enabled for your account as described in [AWS PrivateLink and Snowflake](/user-guide/admin-security-privatelink). Perform the following steps: 1. Retrieve Snowflake's VPC endpoint service ID and Openflow PrivateLink URLs: 1. Run the following SQL command using the `ACCOUNTADMIN` role: ```sql SELECT SYSTEM$GET_PRIVATELINK_CONFIG(); ``` 2. 
From the output, identify and save the values for the following keys: - `privatelink-vpce-id` - `openflow-privatelink-url` - `external-telemetry-privatelink-url` 2. Create a VPC endpoint with parameters: - Type: **PrivateLink Ready partner services** - Service: `privatelink-vpce-id` value obtained in the previous step. - VPC: The VPC where your Openflow deployment will be running. - Subnets: Select two availability zones and private subnets where your Openflow deployment will be running. 3. Set up Route 53 private hosted zone with the following parameters: 1. Domain: `privatelink.snowflakecomputing.com` 2. Type: **Private hosted zone** 3. Select the region and VPC where your Openflow deployment will be running. 4. Add two `CNAME` records for the URLs identified in the first step: 1. For `openflow-privatelink-url` - Record name: `openflow-privatelink-url` value obtained in the first step - Record type: `CNAME` - Value: DNS name of your VPC endpoint 2. For `external-telemetry-privatelink-url` - Record name: `external-telemetry-privatelink-url` value obtained in the first step - Record type: `CNAME` - Value: DNS name of your VPC endpoint 5. Create a dedicated security group for the deployment and enable traffic from the security group to the VPC endpoint: 1. Open the security group associated with your VPC endpoint. 2. Add an inbound rule to the security group that allows **All traffic** from the security group created for your deployment. 6. Create a new deployment and apply the CloudFormation Stack following the instructions in the [Create a deployment in your cloud](#create-a-deployment-in-your-cloud) section and ensure that: - The **PrivateLink** option is enabled. The **End user authentication over PrivateLink** option can be either enabled or disabled. - The security group created for the deployment is used when creating the CloudFormation stack. 7. Wait until the EKS cluster for your deployment is created. 
To confirm successful creation, navigate to AWS Console under **Elastic Kubernetes Service**. Verify that a cluster identified as `` displays status **ACTIVE**. 8. Allow for traffic from your EKS to the VPC endpoint: 1. Open the security group associated with your VPC endpoint. 2. Add an inbound rule to the security group that allows **All traffic** from the security group assigned to your EKS cluster. The EKS cluster's security group starts with `eks-cluster-sg--`. ### Configuring VPC Gateway Endpoints for S3 in AWS Configuring an AWS VPC Gateway Endpoint for S3 is the primary method to allow an Agent EC2 instance in a private subnet to access the Amazon Linux 2023 repository privately, without requiring an Internet Gateway, a NAT Gateway, or a public IP address on the instance. The Agent EC2 instance uses this repository to install its dependencies, for instance Docker. To configure a VPC Gateway Endpoint for S3: 1. Open a browser to the AWS VPC dashboard. 2. In the navigation pane, select **Endpoints**. 3. Click **Create endpoint** and create a new VPC endpoint with parameters: - Type: **AWS services** - Service: `com.amazonaws..s3` of type `Gateway` - VPC: Select the VPC of your deployment - Route tables: Select the route table(s) that are associated with your private subnet(s) - Policy: Choose **Full access** ## Configuring private deployments Private deployments are a feature that allows you to deploy Openflow in a VPC without the need for public internet ingress or egress. To configure private deployments, you need to choose the following options when creating a new deployment: 1. In the **Deployment location** step, select **Amazon Web Services** as the deployment location. 2. In the **VPC Configuration** step, select **Bring your own VPC** to use an existing VPC. 3. In the **PrivateLink** step, enable the PrivateLink feature. 
Enabling this option requires additional setup in your AWS account. See [Configuring PrivateLink in AWS](#configuring-privatelink-in-aws). The **End user authentication over PrivateLink** option can be either enabled or disabled. 4. In the **Custom ingress** step, enable the custom ingress feature. Enabling this option requires additional setup in your AWS account. For more information, see [Openflow BYOC custom ingress setup](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress). Private deployments require that your existing VPC is able to access the following domains: - `*.amazonaws.com`. The services accessed include: - `com.amazonaws.iam` - `com.amazonaws..s3` - `com.amazonaws..ec2` - `com.amazonaws..ecr.api` - `com.amazonaws..ecr.dkr` - `com.amazonaws..secretsmanager` - `com.amazonaws..sts` - `com.amazonaws..eks` - `com.amazonaws..autoscaling` - `*.privatelink.snowflakecomputing.com` - `oidc-eks..api.aws` - `shield.us-east-1.amazonaws.com` ## Installation process Between the CloudFormation stack and the Openflow Agent, there are several coordinated steps that the BYOC deployment installation process manages. The goal is to separate responsibilities between a cold start that gives organizations an easy way to provide inputs to their BYOC deployment (solved via CloudFormation), and the configuration of the deployment and its core software components that will need to change over time (solved by the Openflow Agent). The deployment agent facilitates the creation of the Openflow deployment infrastructure and installation of the deployment software components, including the deployment service. The deployment agent authenticates with the Snowflake System Image Registry to obtain Openflow container images. When using BYO-VPC, you choose a VPC ID and two private subnet IDs from the template, and the CloudFormation stack uses the selected ones rather than creating the resources mentioned in steps 1a, 1b, and 1c. The steps are as follows: 1. 
The CloudFormation template creates the following resources and configures them with the AWS permissions listed in [Configured AWS permissions](#configured-aws-permissions): 1. One VPC with two public subnets and two private subnets. Public subnets host the AWS Network Load Balancer (created later). Private subnets host the EKS cluster and all of the EC2 instances backing the node groups. Openflow runtimes run within a private subnet. 2. Internet Gateway for egress from the VPC 3. NAT Gateway for egress from the private subnets 4. AWS Secrets Manager entry for the OIDC configuration input by the user 5. IAM role and instance profile for the Openflow Agent to use from its EC2 instance 6. An EC2 instance for the Openflow deployment agent, complete with a UserData script to automatically run the initialization process. This script sets environment variables for the Openflow deployment agent to use, derived from the input CloudFormation parameters. 7. EC2 Instance Connect endpoint for the Openflow deployment agent to upgrade the deployment when needed. - When using BYO-VPC, by default the CloudFormation stack will create an EC2 Instance Connect endpoint. However, this default behavior can be modified. When using the managed VPC option, the CloudFormation stack will always create an EC2 Instance Connect endpoint. - The Instance Connect endpoint can be shared across many VPCs. - Deleting a deployment and its CloudFormation stack also removes the endpoint. If the endpoint is shared, this blocks access to other BYO-VPC agents. - To add an EC2 Instance Connect endpoint, perform the following steps in your AWS account: 1. In the left navigation, navigate to **VPC** » **Endpoints**. 2. Select **Create Endpoint**. 3. Choose the endpoint type as EC2 Instance Connect Endpoint. 4. Select a VPC. Leave all the security groups clear (not selected) to use the default VPC security group. 5. 
When selecting a subnet, use the same value as Private Subnet 1 in the CloudFormation parameters. 6. Select **Create**. It takes approximately 5 minutes for the endpoint to be created. 8. S3 bucket that stores the Terraform state, logs, and outputs for the Openflow Agent 2. The Openflow deployment agent creates the following: 1. An EKS cluster containing: - Node groups - Autoscaling groups - AWS VPC Container Network Interface (CNI) add-on - Amazon Elastic Block Store (EBS) CSI add-on 2. AWS Secrets Manager records for PostgreSQL, OAuth credentials, and so on. 3. IAM policies and roles for various K8s service accounts to retrieve their secrets from AWS Secrets Manager. 4. K8s components - Namespaces - Cluster autoscaler - EBS CSI expandable storage - AWS Load Balancer Controller, which creates the publicly accessible Network Load Balancer - Let's Encrypt certificate issuer - Nginx Ingress, configured for Let's Encrypt - Metrics Server - Certificate manager from [Jetstack](http://jetstack.io/) - [External secrets operator](http://external-secrets.io/) - Service accounts for Temporal, deployment service, and OIDC - Secrets stores for Temporal, deployment service, and OIDC - External secrets for Temporal and deployment service. The external secret for OIDC is created and managed by the runtime operator. - PostgreSQL - Temporal - Self-signed certificate issuer and ingress configuration for communications between runtime nodes - Openflow runtime operator - Openflow deployment service By default, all AWS accounts have a quota of five Elastic IP addresses per region, because public (IPv4) internet addresses are a scarce public resource. Snowflake strongly recommends that you use Elastic IP addresses primarily for their ability to remap the address to another instance in the case of instance failure, and that you use DNS hostnames for all other inter-node communication. 
### Track the installation progress After the CloudFormation stack moves into the CREATE_COMPLETE state, the Openflow agent automatically creates the rest of the infrastructure. There are a few steps that can take 10-15 minutes each, such as: 1. Creating the EKS cluster 2. Installing the EBS CSI add-on to the EKS cluster 3. Creating the RDS PostgreSQL database Status reporting for the Openflow agent is not available yet. In the meantime, you can view logs on the Openflow agent to verify whether the BYOC deployment is ready for runtimes. To do this, perform the following steps: 1. In the EC2 instances list, locate the following two instances: - openflow-agent-\{data-plane-key\}: This is the Openflow agent that you will use to manage runtimes - \{data-plane-key\}-mgmt-group: This is a node in the BYOC deployment's EKS cluster that runs an operator and other core software 2. Right-click on the openflow-agent-\{data-plane-key\} instance and select **Connect**. 3. Switch from **EC2 Instance Connect** to **Connect using EC2 Instance Connect Endpoint**. Leave the default EC2 Instance Connect Endpoint in place. 4. Click **Connect**. A new browser tab or window will appear with a command-line interface. 5. Run the following command to tail the installation logs of the docker image that is configuring your deployment: ```bash journalctl -xe -f -n 100 -u docker ``` 6. Once the installation is complete, you'll see the following output: ```text {timestamp} - app stack applied successfully {timestamp} - All resources applied successfully ``` ### Configured AWS permissions This section lists the AWS permissions configured by Openflow BYOC stack based on the roles. \{key\} represents the deployment key that uniquely identifies cloud resources created and managed by Openflow for a particular deployment. **Administrative user** `cloudformation` and all of the following permissions. 
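When reading the policies in this section, it can help to substitute concrete values for the `{key}`, `{Region}`, and `{Account_ID}` placeholders so that a resource pattern can be compared with what exists in your AWS account. The following is a minimal Python sketch of that substitution. The function name `render_arn` and the example values are hypothetical and are not part of Openflow.

```python
def render_arn(pattern: str, *, key: str, region: str, account_id: str) -> str:
    """Fill the documentation placeholders in an ARN pattern with concrete values."""
    replacements = {
        "{key}": key,
        "{Region}": region,
        "{Account_ID}": account_id,
    }
    # Plain string replacement is used instead of str.format so that any other
    # braces in the pattern (for example, in a full JSON policy) are left intact.
    for placeholder, value in replacements.items():
        pattern = pattern.replace(placeholder, value)
    return pattern


if __name__ == "__main__":
    # Hypothetical deployment key, region, and account ID, for illustration only.
    print(
        render_arn(
            "arn:aws:eks:{Region}:{Account_ID}:cluster/{key}",
            key="abc123",
            region="us-west-2",
            account_id="123456789012",
        )
    )
```

Wildcards such as `*` in a pattern are left untouched, so the rendered string is still a pattern rather than a single resource ARN.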
**IAM Role: openflow-agent-role-\{key\}** This role is assumed by the Openflow deployment agent EC2 instance through the instance profile `OpenflowAgentEC2InstanceProfile-{key}`. The following Openflow-managed policies are attached to the role. Openflow-managed policy: `openflow-agent-ec2-policy-{key}` ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ec2:CreateSecurityGroup", "ec2:AuthorizeSecurityGroupIngress", "ec2:AuthorizeSecurityGroupEgress", "ec2:RevokeSecurityGroupIngress", "ec2:RevokeSecurityGroupEgress", "ec2:DeleteSecurityGroup" ], "Condition": { "StringLike": { "aws:ResourceTag/Name": [ "{key}-*", "k8s-traffic-{key}-*", "eks-cluster-sg-{key}-*" ] } }, "Resource": "arn:aws:ec2:{Region}:{Account_ID}:security-group/*" }, { "Effect": "Allow", "Action": [ "ec2:CreateLaunchTemplateVersion", "ec2:ModifyLaunchTemplate" ], "Condition": { "StringLike": { "aws:ResourceTag/Name": "{key}-*-group" } }, "Resource": [ "arn:aws:ec2:{Region}:{Account_ID}:launch-template/*" ] } ] } ``` Openflow-managed policy: `openflow-agent-eks-policy-{key}` ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "eks:ListTagsForResource", "eks:TagResource", "eks:UntagResource", "eks:UpdateClusterVersion", "eks:UpdateNodegroupVersion" ], "Resource": [ "arn:aws:eks:{Region}:{Account_ID}:cluster/{key}", "arn:aws:eks:{Region}:{Account_ID}:nodegroup/{key}/*", "arn:aws:eks:{Region}:{Account_ID}:addon/{key}/*" ] }, { "Effect": "Allow", "Action": [ "eks:DescribeAddonVersions" ], "Resource": "*" } ] } ``` Openflow-managed policy: `openflow-agent-iam-policy-{key}` ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iam:TagRole", "iam:UntagRole" ], "Resource": [ "arn:aws:iam::{Account_ID}:role/{key}-*" ] }, { "Effect": "Allow", "Action": [ "iam:ListOpenIDConnectProviderTags", "iam:TagOpenIDConnectProvider", "iam:UntagOpenIDConnectProvider" ], "Resource": 
"arn:aws:iam::{Account_ID}:oidc-provider/oidc.eks.{Region}.amazonaws.com/id/*" }, { "Effect": "Allow", "Action": [ "iam:CreatePolicy", "iam:DeletePolicy", "iam:DeletePolicyVersion", "iam:GetPolicy", "iam:GetPolicyVersion", "iam:ListPolicyVersions", "iam:CreatePolicyVersion", "iam:TagPolicy", "iam:UntagPolicy" ], "Resource": [ "arn:aws:iam::{Account_ID}:policy/*-role-policy-{key}" ] }, { "Effect": "Allow", "Action": [ "iam:AttachRolePolicy", "iam:CreateRole", "iam:UpdateRole", "iam:DeleteRole", "iam:DeleteRolePolicy", "iam:DetachRolePolicy", "iam:GetRolePolicy", "iam:ListAttachedRolePolicies", "iam:ListInstanceProfilesForRole", "iam:ListRolePolicies", "iam:PutRolePolicy", "iam:TagRole", "iam:UntagRole", "iam:UpdateAssumeRolePolicy" ], "Resource": [ "arn:aws:iam::{Account_ID}:role/*-role-{key}", "arn:aws:iam::{Account_ID}:role/{key}-*" ] } ] } ``` Openflow-managed policy: `openflow-agent-misc-policy-{key}` ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "secretsmanager:TagResource", "secretsmanager:UntagResource" ], "Resource": "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:*-{key}*" }, { "Effect": "Allow", "Action": [ "ssm:GetParameter" ], "Resource": [ "arn:aws:ssm:{Region}::parameter/aws/service/eks/optimized-ami/*" ] }, { "Effect": "Allow", "Action": [ "elasticloadbalancing:DeleteTargetGroup" ], "Condition": { "StringEquals": { "aws:ResourceTag/elbv2.k8s.aws/cluster": "{key}" } }, "Resource": "arn:aws:elasticloadbalancing:{Region}:{Account_ID}:targetgroup/*/*" }, { "Effect": "Allow", "Action": [ "elasticloadbalancing:DescribeListeners", "elasticloadbalancing:DescribeLoadBalancers", "elasticloadbalancing:DescribeTags", "elasticloadbalancing:DescribeTargetGroups" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "elasticloadbalancing:DeleteLoadBalancer", "elasticloadbalancing:SetSecurityGroups" ], "Resource": "arn:aws:elasticloadbalancing:{Region}:{Account_ID}:loadbalancer/net/runtime-ingress-{key}*" } ] } ``` The 
following inline policies are also attached to the role. Inline policy: `managed-policy-creation-permission` ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iam:CreatePolicy", "iam:DeletePolicy", "iam:DeletePolicyVersion", "iam:GetPolicy", "iam:GetPolicyVersion", "iam:ListPolicyVersions", "iam:CreatePolicyVersion", "iam:TagPolicy", "iam:UntagPolicy" ], "Resource": [ "arn:aws:iam::{Account_ID}:policy/openflow-agent-ec2-policy-{key}", "arn:aws:iam::{Account_ID}:policy/openflow-agent-iam-policy-{key}", "arn:aws:iam::{Account_ID}:policy/openflow-agent-eks-policy-{key}", "arn:aws:iam::{Account_ID}:policy/openflow-agent-misc-policy-{key}" ] } ] } ``` Inline policy: `OpenflowAgentPolicy` ```json { "Version": "2012-10-17", "Statement": [ { "Action": [ "autoscaling:DescribeTags", "ec2:DescribeImages", "ec2:DescribeInstances", "ec2:DescribeLaunchTemplates", "ec2:DescribeLaunchTemplateVersions", "ec2:DescribeNetworkInterfaces", "ec2:DescribeSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeTags", "ec2:DescribeVolumes", "ec2:DescribeVpcs", "ec2:DescribeVpcAttribute", "iam:GetRole", "iam:GetOpenIDConnectProvider", "ec2:RunInstances", "ec2:CreateLaunchTemplate", "ec2:CreateSecurityGroup", "ec2:CreateTags", "ec2:DeleteTags" ], "Resource": "*", "Effect": "Allow" }, { "Condition": { "StringLike": { "aws:ResourceTag/Name": [ "{key}-oidc-provider" ] } }, "Action": [ "iam:CreateOpenIDConnectProvider", "iam:DeleteOpenIDConnectProvider", "iam:TagOpenIDConnectProvider", "iam:UpdateOpenIDConnectProviderThumbprint" ], "Resource": "arn:aws:iam::{Account_ID}:oidc-provider/oidc.eks.{Region}.amazonaws.com/id/*", "Effect": "Allow" }, { "Action": [ "iam:CreatePolicy", "iam:DeletePolicy", "iam:DeletePolicyVersion", "iam:GetPolicy", "iam:GetPolicyVersion", "iam:ListPolicyVersions", "iam:CreatePolicyVersion", "iam:TagPolicy", "iam:UntagPolicy" ], "Resource": [ "arn:aws:iam::{Account_ID}:policy/dp-service-role-policy-{key}", 
"arn:aws:iam::{Account_ID}:policy/oauth2-role-policy-{key}", "arn:aws:iam::{Account_ID}:policy/temporal-service-role-policy-{key}", "arn:aws:iam::{Account_ID}:policy/oidc-service-role-policy-{key}", "arn:aws:iam::{Account_ID}:policy/dps-temporal-role-policy-{key}", "arn:aws:iam::{Account_ID}:policy/dps-postgres-role-policy-{key}", "arn:aws:iam::{Account_ID}:policy/token-refresh-role-policy-{key}" ], "Effect": "Allow" }, { "Action": [ "iam:AttachRolePolicy", "iam:CreateRole", "iam:UpdateRole", "iam:DeleteRole", "iam:DeleteRolePolicy", "iam:DetachRolePolicy", "iam:GetRolePolicy", "iam:ListAttachedRolePolicies", "iam:ListInstanceProfilesForRole", "iam:ListRolePolicies", "iam:PutRolePolicy", "iam:TagRole", "iam:UntagRole", "iam:UpdateAssumeRolePolicy" ], "Resource": [ "arn:aws:iam::{Account_ID}:role/openflow-agent-role-{key}", "arn:aws:iam::{Account_ID}:role/{key}-*", "arn:aws:iam::{Account_ID}:role/dps-temporal-role-{key}", "arn:aws:iam::{Account_ID}:role/dps-postgres-role-{key}", "arn:aws:iam::{Account_ID}:role/dp-service-role-{key}", "arn:aws:iam::{Account_ID}:role/oauth2-role-{key}", "arn:aws:iam::{Account_ID}:role/oidc-service-role-{key}", "arn:aws:iam::{Account_ID}:role/token-refresh-role-{key}" ], "Effect": "Allow" }, { "Action": [ "autoscaling:CreateOrUpdateTags", "autoscaling:DeleteTags" ], "Resource": "arn:aws:autoscaling:{Region}:{Account_ID}:autoScalingGroup:*:autoScalingGroupName/eks-{key}-*", "Effect": "Allow" }, { "Condition": { "StringLike": { "aws:ResourceTag/Name": [ "{key}-EC2SecurityGroup-*", "k8s-traffic-{key}-*", "eks-cluster-sg-{key}-*", "{key}-cluster-sg", "{key}-custom-ingress-default-sg" ] } }, "Action": [ "ec2:AuthorizeSecurityGroupEgress", "ec2:AuthorizeSecurityGroupIngress", "ec2:RevokeSecurityGroupIngress", "ec2:RevokeSecurityGroupEgress", "ec2:DeleteSecurityGroup", "ec2:CreateTags", "ec2:DeleteTags", "ec2:CreateNetworkInterface", "ec2:DeleteNetworkInterface" ], "Resource": "arn:aws:ec2:{Region}:{Account_ID}:security-group/*", "Effect": 
"Allow" }, { "Condition": { "StringLike": { "aws:ResourceTag/elbv2.k8s.aws/cluster": "{key}" } }, "Action": [ "ec2:AuthorizeSecurityGroupEgress", "ec2:AuthorizeSecurityGroupIngress", "ec2:RevokeSecurityGroupEgress", "ec2:DeleteSecurityGroup", "ec2:CreateTags", "ec2:DeleteTags", "ec2:CreateNetworkInterface", "ec2:DeleteNetworkInterface" ], "Resource": "arn:aws:ec2:{Region}:{Account_ID}:security-group/*", "Effect": "Allow" }, { "Action": [ "ec2:CreateSecurityGroup" ], "Resource": "arn:aws:ec2:{Region}:{Account_ID}:vpc/{VPC_ID}", "Effect": "Allow" }, { "Condition": { "StringLike": { "ec2:ResourceTag/Name": "openflow-agent-{key}" } }, "Action": [ "ec2:AttachNetworkInterface" ], "Resource": "arn:aws:ec2:{Region}:{Account_ID}:instance/*", "Effect": "Allow" }, { "Condition": { "StringLike": { "aws:ResourceTag/Name": "{key}-*-group" } }, "Action": [ "ec2:DeleteLaunchTemplate" ], "Resource": "arn:aws:ec2:{Region}:{Account_ID}:launch-template/*", "Effect": "Allow" }, { "Action": [ "eks:CreateCluster", "eks:CreateAccessEntry", "eks:CreateAddon", "eks:CreateNodegroup", "eks:DeleteCluster", "eks:DescribeCluster", "eks:ListClusters", "eks:ListNodeGroups", "eks:DescribeUpdate", "eks:UpdateClusterConfig", "eks:TagResource" ], "Resource": "arn:aws:eks:{Region}:{Account_ID}:cluster/{key}", "Effect": "Allow" }, { "Action": [ "eks:DescribeAddon", "eks:DescribeAddonVersions", "eks:UpdateAddon", "eks:DeleteAddon", "eks:DescribeUpdate" ], "Resource": "arn:aws:eks:{Region}:{Account_ID}:addon/{key}/*", "Effect": "Allow" }, { "Action": [ "eks:DeleteNodegroup", "eks:DescribeNodegroup", "eks:ListNodegroups", "eks:UpdateNodegroupConfig", "eks:TagResource", "eks:DescribeUpdate" ], "Resource": "arn:aws:eks:{Region}:{Account_ID}:nodegroup/{key}/*", "Effect": "Allow" }, { "Action": [ "s3:CreateBucket", "s3:ListBucket" ], "Resource": "arn:aws:s3:::byoc-tf-state-{key}-{Region}", "Effect": "Allow" }, { "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:PutObject" ], "Resource": 
"arn:aws:s3:::byoc-tf-state-{key}-{Region}/*", "Effect": "Allow" }, { "Action": [ "secretsmanager:CreateSecret", "secretsmanager:DeleteSecret", "secretsmanager:DescribeSecret", "secretsmanager:GetResourcePolicy", "secretsmanager:GetSecretValue", "secretsmanager:PutSecretValue", "secretsmanager:UpdateSecretVersionStage", "secretsmanager:TagResource", "secretsmanager:UntagResource" ], "Resource": "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:*-{key}*", "Effect": "Allow" }, { "Condition": { "StringLike": { "iam:AWSServiceName": "eks.amazonaws.com" } }, "Action": [ "iam:CreateServiceLinkedRole" ], "Resource": "arn:aws:iam::*:role/aws-service-role/eks.amazonaws.com/AWSServiceRoleForAmazonEKS", "Effect": "Allow" }, { "Condition": { "StringLike": { "iam:AWSServiceName": "eks-nodegroup.amazonaws.com" } }, "Action": [ "iam:CreateServiceLinkedRole" ], "Resource": "arn:aws:iam::*:role/aws-service-role/eks-nodegroup.amazonaws.com/AWSServiceRoleForAmazonEKSNodegroup", "Effect": "Allow" }, { "Action": [ "eks:AssociateAccessPolicy", "eks:ListAssociatedAccessPolicies", "eks:DisassociateAccessPolicy" ], "Resource": "arn:aws:eks:{Region}:{Account_ID}:access-entry/{key}/*", "Effect": "Allow" }, { "Action": "iam:PassRole", "Resource": "*", "Effect": "Allow" }, { "Action": [ "iam:TagRole", "iam:UntagRole" ], "Resource": "arn:aws:iam::{Account_ID}:role/{key}-*", "Effect": "Allow" }, { "Action": [ "iam:UntagOpenIDConnectProvider" ], "Resource": "arn:aws:iam::{Account_ID}:oidc-provider/oidc.eks.{Region}.amazonaws.com/id/*", "Effect": "Allow" }, { "Action": [ "eks:TagResource", "eks:UntagResource", "eks:UpdateNodegroupVersion" ], "Resource": [ "arn:aws:eks:{Region}:{Account_ID}:cluster/{key}", "arn:aws:eks:{Region}:{Account_ID}:nodegroup/{key}/*", "arn:aws:eks:{Region}:{Account_ID}:addon/{key}/*" ], "Effect": "Allow" }, { "Condition": { "StringLike": { "aws:ResourceTag/Name": "{key}-*-group" } }, "Action": [ "ec2:CreateLaunchTemplateVersion", "ec2:ModifyLaunchTemplate" ], 
"Resource": "arn:aws:ec2:{Region}:{Account_ID}:launch-template/*", "Effect": "Allow" }, { "Action": [ "ssm:GetParameter" ], "Resource": "arn:aws:ssm:{Region}::parameter/aws/service/eks/optimized-ami/*", "Effect": "Allow" } ] } ``` **IAM Role: \{key\}-cluster-ServiceRole** AWS-managed policies: - AmazonEKSClusterPolicy - AmazonEKSVPCResourceController ```json { "Version": "2012-10-17", "Statement": [ { "Action": [ "cloudwatch:PutMetricData" ], "Effect": "Allow", "Resource": "*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "ec2:DescribeAccountAttributes", "ec2:DescribeAddresses", "ec2:DescribeInternetGateways" ], "Effect": "Allow", "Resource": "*" } ] } ``` **IAM Role: \{key\}-addon-vpc-cni-Role** AWS-managed policies: - AmazonEKS_CNI_Policy **IAM Role: \{key\}-eks-role** AWS-managed policies: - AmazonEBSCSIDriverPolicy - AmazonEC2ContainerRegistryReadOnly - AmazonEKS_CNI_Policy - AmazonEKSWorkerNodePolicy - AmazonSSMManagedInstanceCore - AutoScalingFullAccess - ElasticLoadBalancingFullAccess ```json { "Version": "2012-10-17", "Statement": [ { "Action": [ "ec2:CreateSecurityGroup", "ec2:CreateTags" ], "Effect": "Allow", "Resource": [ "arn:aws:ec2:{Region}:{Account_ID}:security-group/*", "arn:aws:ec2:{Region}:{Account_ID}:vpc/{VPC_ID}" ], "Sid": "CreateOpenflowEKSSecurityGroupAndTags" }, { "Action": [ "ec2:AuthorizeSecurityGroupIngress", "ec2:DeleteSecurityGroup" ], "Condition": { "StringLike": { "aws:ResourceTag/Name": "eks-cluster-sg-{key}-*" } }, "Effect": "Allow", "Resource": [ "arn:aws:ec2:{Region}:{Account_ID}:security-group/*" ], "Sid": "OpenflowManageEKSSecurityGroup" } ] } ``` \{VPC_ID\} represents the identifier of the VPC that was either created by BYOC or used by BYO-VPC. The following roles are used by Kubernetes service accounts to read their secrets from AWS Secrets Manager. 
Each role has a single Openflow-managed policy attached whose name matches the role name with a `-policy` suffix (for example, the `oidc-service-role-{key}` role uses the `oidc-service-role-policy-{key}` policy). **IAM Role: oidc-service-role-\{key\}** Openflow-managed policy: `oidc-service-role-policy-{key}` ```json { "Statement": [ { "Action": [ "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:GetResourcePolicy", "secretsmanager:ListSecretVersionIds" ], "Effect": "Allow", "Resource": [ "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:oidc-{key}*" ] } ], "Version": "2012-10-17" } ``` **IAM Role: dps-postgres-role-\{key\}** Openflow-managed policy: `dps-postgres-role-policy-{key}` ```json { "Statement": [ { "Action": [ "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:GetResourcePolicy", "secretsmanager:ListSecretVersionIds" ], "Effect": "Allow", "Resource": [ "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:postgres_creds-{key}*" ] } ], "Version": "2012-10-17" } ``` **IAM Role: dps-temporal-role-\{key\}** Openflow-managed policy: `dps-temporal-role-policy-{key}` ```json { "Statement": [ { "Action": [ "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:GetResourcePolicy", "secretsmanager:ListSecretVersionIds" ], "Effect": "Allow", "Resource": [ "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:temporal_creds-{key}*" ] } ], "Version": "2012-10-17" } ``` **IAM Role: dp-service-role-\{key\}** Openflow-managed policy: `dp-service-role-policy-{key}` ```json { "Statement": [ { "Action": [ "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:GetResourcePolicy", "secretsmanager:ListSecretVersionIds" ], "Effect": "Allow", "Resource": [ "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:dps_creds-{key}*", "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:snowflake-oauth2-{key}*" ] } ], "Version": "2012-10-17" } ``` **IAM Role: 
oauth2-role-\{key\}** Openflow-managed policy: `oauth2-role-policy-{key}` ```json { "Statement": [ { "Action": [ "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:GetResourcePolicy", "secretsmanager:ListSecretVersionIds" ], "Effect": "Allow", "Resource": [ "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:snowflake-oauth2-{key}*" ] } ], "Version": "2012-10-17" } ``` **IAM Role: token-refresh-role-\{key\}** Openflow-managed policy: `token-refresh-role-policy-{key}` ```json { "Statement": [ { "Action": [ "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:GetResourcePolicy", "secretsmanager:ListSecretVersionIds" ], "Effect": "Allow", "Resource": [ "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:snowflake-oauth2-{key}*" ] } ], "Version": "2012-10-17" } ``` **IAM Role: \{key\}-nodegroup-NodeInstanceRole** AWS-managed policies: - AmazonEBSCSIDriverPolicy - AmazonEC2ContainerRegistryReadOnly - AmazonEKS_CNI_Policy - AmazonEKSWorkerNodePolicy - AmazonSSMManagedInstanceCore - AutoScalingFullAccess - ElasticLoadBalancingFullAccess ```json { "Version": "2012-10-17", "Statement": [ { "Action": [ "servicediscovery:CreateService", "servicediscovery:DeleteService", "servicediscovery:GetService", "servicediscovery:GetInstance", "servicediscovery:RegisterInstance", "servicediscovery:DeregisterInstance", "servicediscovery:ListInstances", "servicediscovery:ListNamespaces", "servicediscovery:ListServices", "servicediscovery:GetInstancesHealthStatus", "servicediscovery:UpdateInstanceCustomHealthStatus", "servicediscovery:GetOperation", "route53:GetHealthCheck", "route53:CreateHealthCheck", "route53:UpdateHealthCheck", "route53:ChangeResourceRecordSets", "route53:DeleteHealthCheck", "appmesh:*" ], "Effect": "Allow", "Resource": "*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeAutoScalingInstances", 
"autoscaling:DescribeLaunchConfigurations", "autoscaling:DescribeScalingActivities", "autoscaling:DescribeTags", "ec2:DescribeInstanceTypes", "ec2:DescribeLaunchTemplateVersions" ], "Effect": "Allow", "Resource": "*" }, { "Action": [ "autoscaling:SetDesiredCapacity", "autoscaling:TerminateInstanceInAutoScalingGroup", "ec2:DescribeImages", "ec2:GetInstanceTypesFromInstanceRequirements", "eks:DescribeNodegroup" ], "Effect": "Allow", "Resource": "*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "iam:CreateServiceLinkedRole" ], "Condition": { "StringEquals": { "iam:AWSServiceName": "elasticloadbalancing.amazonaws.com" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:DescribeAccountAttributes", "ec2:DescribeAddresses", "ec2:DescribeAvailabilityZones", "ec2:DescribeInternetGateways", "ec2:DescribeVpcs", "ec2:DescribeVpcPeeringConnections", "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups", "ec2:DescribeInstances", "ec2:DescribeNetworkInterfaces", "ec2:DescribeTags", "ec2:GetCoipPoolUsage", "ec2:DescribeCoipPools", "elasticloadbalancing:DescribeLoadBalancers", "elasticloadbalancing:DescribeLoadBalancerAttributes", "elasticloadbalancing:DescribeListeners", "elasticloadbalancing:DescribeListenerCertificates", "elasticloadbalancing:DescribeSSLPolicies", "elasticloadbalancing:DescribeRules", "elasticloadbalancing:DescribeTargetGroups", "elasticloadbalancing:DescribeTargetGroupAttributes", "elasticloadbalancing:DescribeTargetHealth", "elasticloadbalancing:DescribeTags" ], "Effect": "Allow", "Resource": "*" }, { "Action": [ "cognito-idp:DescribeUserPoolClient", "acm:ListCertificates", "acm:DescribeCertificate", "iam:ListServerCertificates", "iam:GetServerCertificate", "waf-regional:GetWebACL", "waf-regional:GetWebACLForResource", "waf-regional:AssociateWebACL", "waf-regional:DisassociateWebACL", "wafv2:GetWebACL", "wafv2:GetWebACLForResource", "wafv2:AssociateWebACL", "wafv2:DisassociateWebACL", "shield:GetSubscriptionState", 
"shield:DescribeProtection", "shield:CreateProtection", "shield:DeleteProtection" ], "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:AuthorizeSecurityGroupIngress", "ec2:RevokeSecurityGroupIngress" ], "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:CreateSecurityGroup" ], "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:CreateTags" ], "Condition": { "Null": { "aws:RequestTag/elbv2.k8s.aws/cluster": "false" }, "StringEquals": { "ec2:CreateAction": "CreateSecurityGroup" } }, "Effect": "Allow", "Resource": "arn:aws:ec2:*:*:security-group/*" }, { "Action": [ "ec2:CreateTags", "ec2:DeleteTags" ], "Condition": { "Null": { "aws:RequestTag/elbv2.k8s.aws/cluster": "true", "aws:ResourceTag/elbv2.k8s.aws/cluster": "false" } }, "Effect": "Allow", "Resource": "arn:aws:ec2:*:*:security-group/*" }, { "Action": [ "ec2:AuthorizeSecurityGroupIngress", "ec2:RevokeSecurityGroupIngress", "ec2:DeleteSecurityGroup" ], "Condition": { "Null": { "aws:ResourceTag/elbv2.k8s.aws/cluster": "false" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "elasticloadbalancing:CreateLoadBalancer", "elasticloadbalancing:CreateTargetGroup" ], "Condition": { "Null": { "aws:RequestTag/elbv2.k8s.aws/cluster": "false" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "elasticloadbalancing:CreateListener", "elasticloadbalancing:DeleteListener", "elasticloadbalancing:CreateRule", "elasticloadbalancing:DeleteRule" ], "Effect": "Allow", "Resource": "*" }, { "Action": [ "elasticloadbalancing:AddTags", "elasticloadbalancing:RemoveTags" ], "Condition": { "Null": { "aws:RequestTag/elbv2.k8s.aws/cluster": "true", "aws:ResourceTag/elbv2.k8s.aws/cluster": "false" } }, "Effect": "Allow", "Resource": [ "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*", "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*", "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*" ] }, { "Action": [ "elasticloadbalancing:AddTags", "elasticloadbalancing:RemoveTags" ], "Effect": "Allow", "Resource": [ 
"arn:aws:elasticloadbalancing:*:*:listener/net/*/*/*", "arn:aws:elasticloadbalancing:*:*:listener/app/*/*/*", "arn:aws:elasticloadbalancing:*:*:listener-rule/net/*/*/*", "arn:aws:elasticloadbalancing:*:*:listener-rule/app/*/*/*" ] }, { "Action": [ "elasticloadbalancing:ModifyLoadBalancerAttributes", "elasticloadbalancing:SetIpAddressType", "elasticloadbalancing:SetSecurityGroups", "elasticloadbalancing:SetSubnets", "elasticloadbalancing:DeleteLoadBalancer", "elasticloadbalancing:ModifyTargetGroup", "elasticloadbalancing:ModifyTargetGroupAttributes", "elasticloadbalancing:DeleteTargetGroup" ], "Condition": { "Null": { "aws:ResourceTag/elbv2.k8s.aws/cluster": "false" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "elasticloadbalancing:AddTags" ], "Condition": { "Null": { "aws:RequestTag/elbv2.k8s.aws/cluster": "false" }, "StringEquals": { "elasticloadbalancing:CreateAction": [ "CreateTargetGroup", "CreateLoadBalancer" ] } }, "Effect": "Allow", "Resource": [ "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*", "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*", "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*" ] }, { "Action": [ "elasticloadbalancing:RegisterTargets", "elasticloadbalancing:DeregisterTargets" ], "Effect": "Allow", "Resource": "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*" }, { "Action": [ "elasticloadbalancing:SetWebAcl", "elasticloadbalancing:ModifyListener", "elasticloadbalancing:AddListenerCertificates", "elasticloadbalancing:RemoveListenerCertificates", "elasticloadbalancing:ModifyRule" ], "Effect": "Allow", "Resource": "*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "route53:ChangeResourceRecordSets" ], "Effect": "Allow", "Resource": "arn:aws:route53:::hostedzone/*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "route53:GetChange" ], "Effect": "Allow", "Resource": "arn:aws:route53:::change/*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "route53:ListResourceRecordSets", 
"route53:ListHostedZonesByName" ], "Effect": "Allow", "Resource": "*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "ec2:CreateSnapshot", "ec2:AttachVolume", "ec2:DetachVolume", "ec2:ModifyVolume", "ec2:DescribeAvailabilityZones", "ec2:DescribeInstances", "ec2:DescribeSnapshots", "ec2:DescribeTags", "ec2:DescribeVolumes", "ec2:DescribeVolumesModifications" ], "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:CreateTags" ], "Condition": { "StringEquals": { "ec2:CreateAction": [ "CreateVolume", "CreateSnapshot" ] } }, "Effect": "Allow", "Resource": [ "arn:aws:ec2:*:*:volume/*", "arn:aws:ec2:*:*:snapshot/*" ] }, { "Action": [ "ec2:DeleteTags" ], "Effect": "Allow", "Resource": [ "arn:aws:ec2:*:*:volume/*", "arn:aws:ec2:*:*:snapshot/*" ] }, { "Action": [ "ec2:CreateVolume" ], "Condition": { "StringLike": { "aws:RequestTag/ebs.csi.aws.com/cluster": "true" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:CreateVolume" ], "Condition": { "StringLike": { "aws:RequestTag/CSIVolumeName": "*" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:DeleteVolume" ], "Condition": { "StringLike": { "ec2:ResourceTag/ebs.csi.aws.com/cluster": "true" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:DeleteVolume" ], "Condition": { "StringLike": { "ec2:ResourceTag/CSIVolumeName": "*" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:DeleteVolume" ], "Condition": { "StringLike": { "ec2:ResourceTag/kubernetes.io/created-for/pvc/name": "*" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:DeleteSnapshot" ], "Condition": { "StringLike": { "ec2:ResourceTag/CSIVolumeSnapshotName": "*" } }, "Effect": "Allow", "Resource": "*" }, { "Action": [ "ec2:DeleteSnapshot" ], "Condition": { "StringLike": { "ec2:ResourceTag/ebs.csi.aws.com/cluster": "true" } }, "Effect": "Allow", "Resource": "*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "route53:ChangeResourceRecordSets" ], "Effect": "Allow", "Resource": 
"arn:aws:route53:::hostedzone/*" } ] } { "Version": "2012-10-17", "Statement": [ { "Action": [ "route53:ListHostedZones", "route53:ListResourceRecordSets", "route53:ListTagsForResource" ], "Effect": "Allow", "Resource": "*" } ] } ``` --- title: Set up Openflow - Snowflake Deployment - Task overview source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs.md section: Loading & Unloading Data --- # Set up Openflow - Snowflake Deployment - Task overview This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/about-spcs) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) To setup an %ofsfspcs%, perform the following tasks:
Note that steps 3, 4 and 5 are typically repeated for each connector you want to configure in a given deployment. ## Next steps [](/user-guide/data-integration/openflow/setup-openflow-spcs-sf) --- title: Set up Openflow - Snowflake Deployment: Configure allowed domains for Openflow connectors source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list.md section: Loading & Unloading Data --- # Set up %ofsfspcs%: Configure allowed domains for Openflow connectors This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/about-spcs) - [](/user-guide/data-integration/openflow/setup-openflow-spcs) %ofsfspcs-plural% access external domain resources. Snowflake controls access to external domains using [network rules](/user-guide/network-rules) and [external access integrations](/developer-guide/external-network-access/creating-using-external-network-access) to either grant or deny access to specific domains. This topic describes the process of [creating a network rule](/sql-reference/sql/create-network-rule) and [creating an external access integration](/sql-reference/sql/create-external-access-integration) to grant access to a specific domain. In addition, the known domains used by Openflow connectors are provided. Two possible workflows exist for managing access to external domains: - [Create a new network rule and external access integration](#label-openflow-create-new-network-rule-grant-domain-access): Create a new network rule that defines a list of allowed domain/port combinations and create a new external access integration using the newly created network rule. - [Alter an existing network rule](#label-openflow-alter-existing-network-rule-grant-domain-access): Alter an existing network rule to add a list of allowed domain/port combinations. 
## Create a network rule granting access to one or more domains To create a new network rule that grants access to one or more domain/port combinations, execute an SQL statement similar to: ```sql USE ROLE SECURITYADMIN; CREATE NETWORK RULE MY_OPENFLOW_NETWORK_RULE TYPE = HOST_PORT MODE = EGRESS VALUE_LIST = ('', ''); ``` For example, to allow Snowflake to access `googleads.googleapis.com`, execute the following: ```sql USE ROLE SECURITYADMIN; CREATE NETWORK RULE GOOGLEADS_OPENFLOW_NETWORK_RULE TYPE = HOST_PORT MODE = EGRESS VALUE_LIST = ('googleads.googleapis.com'); ``` For more information, see [](/sql-reference/sql/create-network-rule). After the network rule is created, an external access integration must be created. To create a new integration, execute an SQL statement similar to: ```sql USE ROLE SECURITYADMIN; CREATE EXTERNAL ACCESS INTEGRATION MY_OPENFLOW_EAI ALLOWED_NETWORK_RULES = (MY_OPENFLOW_NETWORK_RULE) ENABLED = TRUE COMMENT = 'External Access Integration for Openflow connectivity'; ``` ## Alter an existing network rule granting access to one or more domains To alter an existing network rule to grant access to one or more domain/port combinations, execute an SQL statement similar to: ```sql USE ROLE SECURITYADMIN; ALTER NETWORK RULE GOOGLEADS_OPENFLOW_NETWORK_RULE SET VALUE_LIST = ('', '', 'googleads.googleapis.com'); ``` For more information, see [](/sql-reference/sql/alter-network-rule). Use [](/sql-reference/sql/show-network-rules) to list the existing network rules.
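The `ALTER NETWORK RULE ... SET VALUE_LIST` example above lists the existing entries together with the new one, which suggests that the statement supplies the complete list rather than appending to it. The following is a minimal Python sketch, under that assumption, of how the merged list might be assembled before running the statement. The function names and the `oauth2.googleapis.com` "existing" entry are hypothetical and only serve the illustration; the rule name is taken from the example above.

```python
def merged_value_list(existing, new):
    """Return existing entries followed by new ones, dropping duplicates.

    Comparison is case-insensitive; the first spelling seen is kept.
    """
    seen = set()
    merged = []
    for entry in list(existing) + list(new):
        cleaned = entry.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


def alter_network_rule_sql(rule_name, existing, new):
    """Render an ALTER NETWORK RULE statement that carries over existing entries."""
    values = ", ".join(
        "'" + value.replace("'", "''") + "'"
        for value in merged_value_list(existing, new)
    )
    return f"ALTER NETWORK RULE {rule_name} SET VALUE_LIST = ({values});"


# Hypothetical existing entry; in practice, read the current list from the rule first.
print(alter_network_rule_sql(
    "GOOGLEADS_OPENFLOW_NETWORK_RULE",
    existing=["oauth2.googleapis.com"],
    new=["googleads.googleapis.com"],
))
```

The helper only builds statement text. Review the generated SQL before executing it with a role that is allowed to alter the rule.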
Use [](/sql-reference/sql/desc-network-rule) to describe the properties of a specific network rule. If the altered network rule is already associated with an external access integration, it will be updated automatically. If you do not have an external access integration for the altered network rule, refer to the section above for instructions on creating a new integration. ## Next steps 1. Associate an external access integration with your runtime: 1. Navigate to the Openflow canvas. 2. Select the **Runtimes** tab. 3. For the runtime which requires the new external access integration, click the %vertical-more-icon% menu. 4. Select **External access integrations**. 5. Select all required external access integrations from the dropdown list.
Note that you can select multiple external access integrations. 6. Click **Save**. Restarting the runtime is not required; the changes are applied immediately. 2. Deploy a connector in a runtime. For a list of connectors available in Openflow, see [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors). ## Domains used by Openflow connectors The following domains are used by Openflow connectors and must be granted access through network rules. ### Amazon Ads The following domains are used by the Amazon Ads connector. - `advertising-api.amazon.com` - `advertising-api-eu.amazon.com` - `advertising-api-fe.amazon.com` - `api.amazon.com` - `api.amazon.co.uk` - `api.amazon.co.jp` - Report location. For example, `offline-report-storage-eu-west-1-prod.s3.eu-west-1.amazonaws.com` is used to download reports. The exact report URL location is not always known before creating a report. Snowflake recommends allowlisting all S3 regions:
- `*.s3.eu-west-[1-3].amazonaws.com` - `*.s3.eu-central-[1-2].amazonaws.com` - `*.s3.eu-north-1.amazonaws.com` - `*.s3.eu-south-[1-2].amazonaws.com` - `*.s3.il-central-1.amazonaws.com`
- For advertising-api-fe.amazon.com (Far East / APAC): - `*.s3.ap-northeast-[1-3].amazonaws.com` - `*.s3.ap-south-[1-2].amazonaws.com` - `*.s3.ap-southeast-[1-7].amazonaws.com` - `*.s3.ap-east-[1-2].amazonaws.com` - `*.s3.me-south-1.amazonaws.com` - `*.s3.me-central-1.amazonaws.com` - `*.s3.af-south-1.amazonaws.com` The last domain is obtained from the report URL that is returned after the report is ready to fetch. This is an Amazon S3 bucket where the report is stored. Customers need to specify their own AWS region (for example, `us-east-1` or `eu-west-1`) and a specific bucket. Because it may not be possible to know the exact region and bucket in advance, Snowflake suggests using wildcards and listing all possible regions for a given location. ### AWS Secret Manager The following domains are used by the AWS Secret Manager connector. - `secretsmanager.us-west-2.amazonaws.com` - `sts.us-west-2.amazonaws.com` - `aws.amazon.com` - `amazonaws.com` ### Box The following domains are used by the Box connector.
- `api.box.com` - `box.com`
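As an illustration of how a per-connector domain list such as the one above maps onto the `CREATE NETWORK RULE` and `CREATE EXTERNAL ACCESS INTEGRATION` statements shown earlier in this topic, here is a small Python sketch that renders both statements for a connector. The function name and the `<CONNECTOR>_OPENFLOW_NETWORK_RULE` / `<CONNECTOR>_OPENFLOW_EAI` naming pattern are assumptions made for this example (modeled on the `GOOGLEADS_OPENFLOW_NETWORK_RULE` name used above), not a required convention.

```python
def quote_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def network_rule_and_eai_sql(connector, domains):
    """Render the two statements needed to allow a connector's domains.

    The object names are hypothetical and derived from the connector name.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    prefix = connector.strip().upper().replace(" ", "_")
    rule = f"{prefix}_OPENFLOW_NETWORK_RULE"
    eai = f"{prefix}_OPENFLOW_EAI"
    values = ", ".join(quote_literal(d) for d in domains)
    return [
        f"CREATE NETWORK RULE {rule} TYPE = HOST_PORT MODE = EGRESS "
        f"VALUE_LIST = ({values});",
        f"CREATE EXTERNAL ACCESS INTEGRATION {eai} "
        f"ALLOWED_NETWORK_RULES = ({rule}) ENABLED = TRUE;",
    ]


# Domains taken from the Box list above.
for statement in network_rule_and_eai_sql("box", ["api.box.com", "box.com"]):
    print(statement)
```

The generated statements still need to be run with a suitable role (the examples in this topic use `SECURITYADMIN`), and the resulting integration must be associated with the runtime as described under Next steps.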
### Confluence The following domains are used by the Confluence connector.
- Customer-specific domain name, such as `https://company-name.atlassian.net/`. - For OAuth, [https://atlassian.company-name.com/](https://atlassian.company-name.com/)
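The entries above are written as URLs, whereas the network rule examples in this topic use bare host names or `host:port` values (for example, `graph.microsoft.com:443`). The following Python sketch shows one way to reduce a URL or host string to a `host:port` value before adding it to a `VALUE_LIST`. The function name is hypothetical, and defaulting to port 443 when no scheme or port is given is an assumption of this example.

```python
from urllib.parse import urlsplit

# Ports implied by a URL scheme when the URL carries no explicit port.
DEFAULT_PORTS = {"https": 443, "http": 80}


def to_host_port(value):
    """Reduce a URL or host[:port] string to 'host:port'.

    Assumption: values without a scheme or port default to 443.
    """
    text = value.strip()
    # urlsplit only recognizes the host part when a '//' separator is present.
    parts = urlsplit(text if "://" in text else "//" + text)
    host = parts.hostname
    if not host:
        raise ValueError(f"no host found in {value!r}")
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 443)
    return f"{host}:{port}"


for entry in ["https://company-name.atlassian.net/", "api.atlassian.com"]:
    print(to_host_port(entry))
```

Paths, query strings, and the scheme are discarded, so only the host and port reach the network rule.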
### Microsoft Dataverse The following domains are used by the Dataverse connector. - Customer-specific domain name, such as `org12345467.crm.dynamics.com` - For OAuth, `login.microsoftonline.com` ### Google Ads The following domains are used by the Google Ads connector. - `googleads.googleapis.com` ### Google Drive The following domains are used by the Google Drive connector. - `drive.google.com` - `www.googleapis.com` - `oauth2.googleapis.com` ### Google Sheets The following domains are used by the Google Sheets connector. - `sheets.googleapis.com` ### HubSpot The following domains are used by the HubSpot connector. - `api.hubapi.com` ### Jira Cloud The following domains are used by the Jira Cloud connector. - Customer-specific domain name, for example `company-name.atlassian.net` - `api.atlassian.com` ### Kafka The following domains are used by the Kafka connector. - Customer Kafka bootstrap servers and all Kafka brokers ### Kinesis The following domains are used by the Kinesis connector. - The domains depend on the AWS region. For example, for `us-west-2`: - `kinesis.us-west-2.amazonaws.com` - `kinesis-fips.us-west-2.api.aws` - `kinesis-fips.us-west-2.amazonaws.com` - `kinesis.us-west-2.api.aws` - `*.control-kinesis.us-west-2.amazonaws.com` - `*.control-kinesis.us-west-2.api.aws` - `*.data-kinesis.us-west-2.amazonaws.com` - `*.data-kinesis.us-west-2.api.aws` - `dynamodb.us-west-2.amazonaws.com` ### LinkedIn Ads The following domains are used by the LinkedIn Ads connector. - `www.linkedin.com` - `api.linkedin.com` ### Meta Ads The following domains are used by the Meta Ads connector. - `graph.facebook.com` ### MySQL The following domains are used by the MySQL connector. - Customer-specific domain and port combination. ### PostgreSQL The following domains are used by the PostgreSQL connector. - Customer-specific domain and port combination. ### SharePoint The following domains are used by the SharePoint connector. 
- Customer-specific domain—for example, `company-domain.sharepoint.com` or an alias that redirects to `company-domain.sharepoint.com` - `graph.microsoft.com:80` - `graph.microsoft.com:443` - `login.microsoftonline.com` ### Slack The following domains are used by the Slack connector. - `slack.com` - `api.slack.com` - `hooks.slack.com` - `files.slack.com` - `wss-primary.slack.com` - `wss-backup.slack.com` ### SQL Server The following domains are used by the SQL Server connector. - Customer-specific domain and port combination. ### Workday The following domains are used by the Workday connector. - Customer-specific domain and port combination. For example, `company-domain.tenant.myworkday.com`. To obtain the domain, you can use the report URL (base URL is always the same). --- title: Set up Openflow - Snowflake Deployment: Core Snowflake source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-sf.md section: Loading & Unloading Data --- # Set up %ofsfspcs%: Core Snowflake This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/about-spcs) - [](/user-guide/data-integration/openflow/setup-openflow-spcs) %ofsfspcs% requires the creation of the following Snowflake specific resources:
1. [Create the OPENFLOW_ADMIN role](#create-the-openflow-admin-role) 2. [Configure required privileges](#configure-required-privileges)
To complete these tasks, sign in to %sf-web-interface-link% and open a SQL worksheet. ## Create the OPENFLOW_ADMIN role Create the required Openflow administration role. `` denotes the user that will be used to access Openflow. ```sql USE ROLE ACCOUNTADMIN; CREATE ROLE IF NOT EXISTS OPENFLOW_ADMIN; GRANT ROLE OPENFLOW_ADMIN TO USER ; ``` Users with a default role of ACCOUNTADMIN can't log in to %ofsfspcs% runtimes and get an error message when attempting to do so. Snowflake recommends assigning a different default role to any user that will log in to a runtime. In addition, Snowflake recommends setting default secondary roles to `ALL` for all Openflow users. To change the default role and enable all secondary roles, execute the following: ```sql USE ROLE ACCOUNTADMIN; ALTER USER SET DEFAULT_ROLE = ; ALTER USER SET DEFAULT_SECONDARY_ROLES = ('ALL'); ``` ## Configure required privileges Openflow requires specific Snowflake account-level privileges. The ACCOUNTADMIN role holds these privileges as part of its default set and can grant them to the role you choose as the Openflow admin role, shown as `OPENFLOW_ADMIN` in the following example: ```sql USE ROLE ACCOUNTADMIN; GRANT CREATE OPENFLOW DATA PLANE INTEGRATION ON ACCOUNT TO ROLE OPENFLOW_ADMIN; GRANT CREATE OPENFLOW RUNTIME INTEGRATION ON ACCOUNT TO ROLE OPENFLOW_ADMIN; GRANT CREATE COMPUTE POOL ON ACCOUNT TO ROLE OPENFLOW_ADMIN; ``` ## Next steps Optionally, [Set up PrivateLink UI access](/user-guide/data-integration/openflow/setup-openflow-spcs-configure-pr-ui) to access the Snowflake Openflow Runtime UI using private connectivity. 
[Create deployment](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) --- title: Set up Openflow - Snowflake Deployment: Create deployment source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-deployment.md section: Loading & Unloading Data --- # Set up %ofsfspcs%: Create deployment This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). After configuring core Snowflake, create an Openflow deployment. A deployment is the control plane component that manages your runtimes and connectors. Each deployment can host multiple runtimes, and each runtime can run multiple connectors, giving you flexibility to isolate workloads by project, team, or environment. There is no separate charge for the deployment itself; only active runtimes consume Snowflake credits. 1. [Create a deployment](#label-openflow-spcs-create-deployment) - create the deployment itself. 2. [[Optional] Configure an Openflow-specific event table](#label-openflow-spcs-event-table) - configure an Openflow-specific event table to store Openflow logs and metrics. ## Create a deployment To access the Openflow Runtime UI using PrivateLink as described in [Setup PrivateLink UI access](/user-guide/data-integration/openflow/setup-openflow-spcs-configure-pr-ui), ensure the **PrivateLink** option is enabled when creating a new %ofsfspcs%. 1. Sign in to %sf-web-interface-link% with a role defined in [Configure core Snowflake requirements](/user-guide/data-integration/openflow/setup-openflow-spcs-sf). 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. Select **Launch Openflow**. 4. In the Openflow UI, select **Create a deployment**. The **Deployments** tab opens. 5. Select **Create a deployment**. The Creating a deployment wizard opens. 6. In the **Prerequisites** step, ensure that you meet all the requirements. 
Select **Next**. 7. In the **Deployment location** step, select **Snowflake** as the deployment location. Enter a name for your deployment. Select **Next**. 8. Select **Create Deployment**. Your deployment will then be created. ## [Optional] Configure an Openflow-specific event table Openflow generates logs and metrics and sends them to the Snowflake Event Table. For helpful queries to analyze this telemetry data, see [Monitor Openflow](/user-guide/data-integration/openflow/monitor). By default, Openflow uses the [account event table](#label-logging-event-table-default) (SNOWFLAKE.TELEMETRY.EVENTS), but you can configure an Openflow-specific event table per deployment. A dedicated event table is recommended to optimize query performance, enable granular access control, and simplify Openflow monitoring and maintenance. 1. To store the event table outside the Openflow database, grant the OPENFLOW_ADMIN role access to the `` and `` where you want to store it: ```sql USE ROLE ACCOUNTADMIN; GRANT USAGE ON DATABASE TO ROLE OPENFLOW_ADMIN; GRANT USAGE ON SCHEMA . TO ROLE OPENFLOW_ADMIN; ``` 2. Create the event table: ```sql USE ROLE OPENFLOW_ADMIN; CREATE EVENT TABLE IF NOT EXISTS ..EVENTS; ``` 3. Get your dataplane name, which you use in the next step, from the `name` column: ```sql SHOW OPENFLOW DATA PLANE INTEGRATIONS; ``` 4. Set the event table for this deployment, replacing `` with the value from the previous step: ```sql ALTER OPENFLOW DATA PLANE INTEGRATION SET EVENT_TABLE = '..EVENTS'; ``` ## [Optional] Create a monitoring role A monitoring role lets data engineers or operations teams monitor Openflow without having the OPENFLOW_ADMIN role. 
- To create a monitoring role, run the following code: ```sql USE ROLE OPENFLOW_ADMIN; -- Create a role for monitoring Openflow deployments and runtimes if it doesn't yet exist CREATE ROLE IF NOT EXISTS ; GRANT MONITOR ON INTEGRATION TO ROLE ; -- Add to role hierarchy so administrators can manage objects owned by this role GRANT ROLE TO ROLE ; -- Grant the role to the appropriate Snowflake users GRANT ROLE TO USER ; ``` ### Next steps [Create Snowflake role](/user-guide/data-integration/openflow/setup-openflow-spcs-create-rr) --- title: Set up Openflow - Snowflake Deployment: Create runtime source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-create-runtime.md section: Loading & Unloading Data --- # Set up %ofsfspcs%: Create runtime This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/about-spcs) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) A runtime is a containerized Apache NiFi instance that executes your data integration flows, that is, connectors and custom flow definitions. Each runtime is isolated for security and resource control, and can scale from one node up to fifty to handle varying data volumes. To create a runtime in your Snowflake deployment: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **Ingestion** %raa% **Openflow**. 3. Select **Launch Openflow**. A new tab opens for the Openflow canvas. 4. In **Openflow Control Plane**, select **Create a runtime**. The **Create Runtime** dialog box appears. 5. In the **Create Runtime** dialog box, populate the following fields:
6. Select **Create**. The runtime takes a couple of minutes to be created. Once created, view your runtime by navigating to the **Runtimes** tab of the Openflow control plane. Select the runtime to open the Openflow canvas. ## [Optional] Grant MONITOR privileges on the runtime If you created a [monitoring role](#label-openflow-spcs-monitoring-role) when setting up your deployment, you can add the runtime to that role. This allows data engineers or operations teams to monitor the runtime without having the OPENFLOW_ADMIN role. - To add the runtime to the monitoring role, run the following code, replacing `` with the name of the Openflow runtime integration: ```sql USE ROLE OPENFLOW_ADMIN; GRANT MONITOR ON INTEGRATION TO ROLE ; ``` ## Next step Configure allowed domains for Openflow connectors. See [](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list). --- title: Set up Openflow - Snowflake Deployment: Create Snowflake role source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-create-rr.md section: Loading & Unloading Data --- # Set up %ofsfspcs%: Create Snowflake role This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/about-spcs) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) %ofsfspcs% requires the creation of a number of resources which are specific not to a deployment but to a specific runtime. Typically such resources include: - Creation of Runtime specific Snowflake role - Creation of Runtime specific network rules and External Access Integrations (EAI) This topic describes the creation of these resources. 1. 
Create a Snowflake role and the associated privileges to write data to Snowflake. See the Snowflake Role for Runtimes on Snowflake Deployment section. 2. Associate the Snowflake role. See Snowflake Role for Runtimes in the Snowflake Deployment section. 3. Create External Access Integrations and associate them with Runtimes. See [Creating External Access Integrations](#label-create-network-rules-and-external-access-integrations). 4. Set up outbound PrivateLink connectivity if it is required to connect to a private system using SPCS egress. ## Create a Snowflake role When creating and editing Openflow Runtimes, Runtime owners can associate a role with the Runtime. This role is used for flows that execute within the Runtime. For more information about Snowflake Roles, see [](#label-openflow-spcs-what-is-runtime-role). Creating a Snowflake role is a prerequisite for creating a Runtime and involves the following steps: 1. Create the role itself. 2. Grant the role access to the warehouse used by the Runtime. 3. Grant the role access to the Snowflake objects used by the Runtime. 4. Grant the role access to the External Access Integrations used by the Runtime. To create a Snowflake role: 1. Create the required Snowflake role. `` denotes the name of the associated runtime. ```sql USE ROLE ACCOUNTADMIN; CREATE ROLE IF NOT EXISTS OPENFLOW_RUNTIME_ROLE_; GRANT ROLE OPENFLOW_RUNTIME_ROLE_ TO USER ; ``` 2. Allow the Snowflake role to use an existing warehouse that you plan to use for data ingestion. Use this warehouse later when configuring connectors for runtimes that use this Snowflake role. ```sql GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE OPENFLOW_RUNTIME_ROLE_; ``` 3. Allow the Snowflake role to use, create, or otherwise access Snowflake objects. The required underlying objects vary depending on the Openflow connector being created. The example below is for illustration purposes only. 
```sql GRANT USAGE ON DATABASE TO ROLE OPENFLOW_RUNTIME_ROLE_; GRANT USAGE ON SCHEMA TO ROLE OPENFLOW_RUNTIME_ROLE_; ``` ### Creating Network Rules and External Access Integrations Snowflake's security model provides secure access to specific endpoints and systems external to Snowflake using [network policies](/user-guide/network-policies). Two key aspects of network policies are [](/user-guide/network-rules) and [External Access Integrations (EAI)](/developer-guide/external-network-access/external-network-access-overview), each of which is used to provide secure access to external resources required by the runtime. Creating network rules and external access integrations takes three steps: 1. Create the network rule, grouping the network identifiers into logical areas. 2. Create the external access integration (EAI), specifying the list of network rules and ensuring that the Snowflake role has USAGE on the EAI. 3. Associate the EAI with the Runtime in the Openflow UI when creating Runtimes. To create the required network rule and EAI, perform the following steps. These examples use RUNTIME_NAME as a placeholder for the name of the Runtime being created. 1. Create an appropriate network rule. See [](/sql-reference/sql/create-network-rule) for more information. `` denotes the name of the database that will contain the network rule. Snowflake suggests creating a specific database for network rules and external access integrations related to Openflow. ```sql USE DATABASE ; CREATE NETWORK RULE IF NOT EXISTS OPENFLOW__NETWORK_RULE MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ('comma separated list of host:port pairs'); ``` 2. Create an external access integration, or add the network rule to an existing one. See [](/sql-reference/sql/create-external-access-integration) for more information. 
To create a new EAI: ```sql USE ROLE ACCOUNTADMIN; CREATE EXTERNAL ACCESS INTEGRATION IF NOT EXISTS OPENFLOW__EAI ALLOWED_NETWORK_RULES = (OPENFLOW__NETWORK_RULE) ENABLED = TRUE; ``` To add the network rule to an existing EAI, first check which rules are already associated with it, then update the EAI to include both the existing and new rules: ```sql USE ROLE ACCOUNTADMIN; -- Check the current rules on the EAI DESCRIBE EXTERNAL ACCESS INTEGRATION OPENFLOW__EAI; ``` In the output, find the `ALLOWED_NETWORK_RULES` property and note the existing rules. Then update the EAI, listing all existing rules along with the new one: ```sql ALTER EXTERNAL ACCESS INTEGRATION OPENFLOW__EAI SET ALLOWED_NETWORK_RULES = ( , , OPENFLOW__NETWORK_RULE ); ``` 3. Grant access to the EAI to the previously created Snowflake role. ```sql GRANT USAGE ON INTEGRATION OPENFLOW__EAI TO ROLE OPENFLOW_RUNTIME_ROLE_; ``` ## Next steps [Create runtime](/user-guide/data-integration/openflow/setup-openflow-spcs-create-runtime) --- title: Set up Openflow Connector for Amazon Kinesis Data Streams source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kinesis/setup.md section: Loading & Unloading Data --- # Set up %kinesis% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). 
- [](/user-guide/data-integration/openflow/connectors/kinesis/about) - [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance) - [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot) - [](/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning) This topic describes how to set up %kinesis%. %kinesis% is designed for JSON message ingestion from Kinesis streams to Snowflake tables, with schema evolution capabilities. ## Set up the Openflow Connector for Kinesis ### Prerequisites 1. Review [](/user-guide/data-integration/openflow/connectors/kinesis/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If you are using Openflow - Snowflake Deployments, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the Kinesis connector. ### Set up IAM roles and policies in AWS As an AWS administrator, perform the following actions in your AWS account: 1. Create an AWS IAM user or role that Openflow will use to access the Kinesis data stream. For more information, see [Creating IAM users](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html) in the AWS documentation. 2. Ensure that the AWS user has configured [Access Key credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html). 3. Grant the AWS user the following IAM permissions:
Example IAM policy: ```json { "Version": "2012-10-17", "Statement": [ { "Sid": "KinesisStreamAccess", "Effect": "Allow", "Action": [ "kinesis:DescribeStream", "kinesis:DescribeStreamConsumer", "kinesis:GetRecords", "kinesis:GetShardIterator", "kinesis:ListShards", "kinesis:RegisterStreamConsumer" ], "Resource": "arn:aws:kinesis:${REGION}:${ACCOUNT_ID}:stream/${STREAM_NAME}" }, { "Sid": "KinesisConsumerAccess", "Effect": "Allow", "Action": [ "kinesis:DeregisterStreamConsumer", "kinesis:DescribeStreamConsumer", "kinesis:SubscribeToShard" ], "Resource": "arn:aws:kinesis:${REGION}:${ACCOUNT_ID}:stream/${STREAM_NAME}/consumer/*" }, { "Sid": "DynamoDBTableAccess", "Effect": "Allow", "Action": [ "dynamodb:CreateTable", "dynamodb:DeleteTable", "dynamodb:DescribeTable", "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query", "dynamodb:Scan", "dynamodb:UpdateItem" ], "Resource": [ "arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/${APPLICATION_NAME}", "arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/${APPLICATION_NAME}_migration" ] } ] } ``` Before using the example policy, replace the following placeholders:
- The `${APPLICATION_NAME}_migration` table is a temporary DynamoDB table created only during a one-time migration from legacy checkpoint tables to the new schema. It's deleted automatically when migration completes. If your deployment has never used the legacy KCL-based connector, you can omit the migration table ARN from the policy. - The `dynamodb:DeleteTable` action is used during the migration process and can be removed from the policy after migration is confirmed complete. - The `kinesis:DeregisterStreamConsumer` action is invoked when the processor is removed from the canvas. If the IAM principal doesn't have this permission, the consumer must be deregistered manually through the AWS console or CLI. ### Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 2. Create a new role or use an existing role and grant the [database privileges](/sql-reference/sql/grant-privilege). The connector requires the user to create the destination table. Make sure the user has the required privileges for managing Snowflake objects:
Snowflake recommends creating a separate user and role for each Kinesis stream for better access control. You can use the following script to create and configure a custom role (requires SECURITYADMIN or equivalent): ```sql USE ROLE securityadmin; CREATE ROLE openflow_kinesis_connector_role_1; GRANT USAGE ON DATABASE kinesis_db TO ROLE openflow_kinesis_connector_role_1; GRANT USAGE ON SCHEMA kinesis_schema TO ROLE openflow_kinesis_connector_role_1; ``` Privileges must be granted directly to the connector role and can't be inherited. 3. Configure the destination table We highly recommend using server-side schema evolution for schema changes and [an error table for DML error logging](#label-kinesis-dml-error-logging). The example below shows how to create a table and add OWNERSHIP permissions. ```sql USE ROLE openflow_kinesis_connector_role_1; CREATE TABLE kinesis_db.kinesis_schema. ( kinesisMetadata object ) ENABLE_SCHEMA_EVOLUTION = TRUE ERROR_LOGGING = TRUE; USE ROLE securityadmin; GRANT OWNERSHIP ON TABLE TO ROLE openflow_kinesis_connector_role_1; ``` These connectors provide support for automatic schema detection and evolution. The structure of tables in Snowflake is defined and evolved automatically to support the structure of new data loaded by the connector. It will automatically map the record content's first-level keys to table columns matching by name (case-insensitive). With Schema evolution enabled, Snowflake can automatically expand the destination table by adding new columns that are detected in the incoming stream and dropping NOT NULL constraints to accommodate new data patterns. For more information, see [Table schema evolution](/user-guide/data-load-schema-evolution). If ENABLE_SCHEMA_EVOLUTION is not enabled, then you have to create the schema manually by extending the table definition. The connector tries to match the record content's first-level keys to the table columns by name. 
If keys from the JSON do not match the table columns, the connector ignores the keys. 4. (Optional) Configure a secrets manager Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and Hashicorp, and store the public and private keys in the secret store. 1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow as this way no other secrets have to be persisted. 2. In the Openflow canvas, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values. 3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow. 5. Grant access to users Any other Snowflake users who require access to the raw ingested data by the connector (for example, for custom processing in Snowflake), should be granted the role created in step 2. ### (Optional) Configure outbound AWS PrivateLink If you're running the connector in Openflow - Snowflake Deployments and want to route the connector's Kinesis traffic over [outbound private connectivity](/user-guide/private-connectivity-outbound) (AWS PrivateLink) instead of the public internet, follow the steps in this section. The connector makes outbound calls to the following AWS services:
Amazon DynamoDB doesn't support Private DNS for its PrivateLink endpoint. See [Considerations when using AWS PrivateLink for Amazon DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/privatelink-interface-endpoints.html#privatelink-considerations) in the AWS documentation. Because Snowflake's `PRIVATE_HOST_PORT` network rule type relies on Private DNS, the connector can't route DynamoDB traffic through a PrivateLink endpoint. Configure DynamoDB using `HOST_PORT` (public endpoint) as shown in the example. Only checkpoint metadata flows through the public endpoint to DynamoDB. Stream records flow through the private Kinesis endpoint. To configure outbound AWS PrivateLink, complete the following steps: 1. As ACCOUNTADMIN, provision an outbound PrivateLink endpoint for Amazon Kinesis Data Streams in the region where your stream is located. Replace `` with your AWS region (for example, `us-east-1`): ```sql USE ROLE ACCOUNTADMIN; SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT( 'com.amazonaws..kinesis-streams', 'kinesis..amazonaws.com' ); ``` For more information, see [SYSTEM$PROVISION_PRIVATELINK_ENDPOINT](/sql-reference/functions/system_provision_privatelink_endpoint) and [Managing outbound private connectivity endpoints on AWS](/user-guide/private-manage-endpoints-aws). 2. Create network rules that reach `kinesis..amazonaws.com` through the private endpoint and DynamoDB through the public endpoint. Replace `` with the schema you use to host network rules: ```sql USE ROLE ACCOUNTADMIN; USE SCHEMA ; CREATE OR REPLACE NETWORK RULE openflow_kinesis_private_network_rule MODE = EGRESS TYPE = PRIVATE_HOST_PORT VALUE_LIST = ('kinesis..amazonaws.com'); CREATE OR REPLACE NETWORK RULE openflow_kinesis_public_network_rule MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ('dynamodb..amazonaws.com:443'); ``` 3. 
Attach both network rules to an external access integration, then grant the runtime role permission to use the integration: ```sql USE ROLE ACCOUNTADMIN; CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION openflow_kinesis_eai ALLOWED_NETWORK_RULES = ( openflow_kinesis_private_network_rule, openflow_kinesis_public_network_rule ) ENABLED = TRUE COMMENT = 'External access integration for the Openflow Connector for Kinesis'; GRANT USAGE ON INTEGRATION openflow_kinesis_eai TO ROLE ; ``` For the steps to associate the integration with a runtime, see [](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list). ### Set up the connector As a data engineer, perform the following tasks to install and configure the connector: #### Install the connector 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the **Openflow connector for Amazon Kinesis Data Streams** and select **Add to runtime**. 3. In the Select runtime dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database, schema, and a table in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. If needed, customize the connector configuration before configuring the built-in parameters. 2. Populate the process group parameters 1. Right-click on the imported process group and select **Parameters**. 2. Fill out the required parameter values. ##### Common parameters
#### Start the connector

1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the plane and select **Start**. The connector starts data ingestion.

## Understanding the KINESISMETADATA column

The connector populates the KINESISMETADATA structure with metadata about the Kinesis record. The structure contains the following information:
## Measuring ingestion latency

For change tracking, incremental processing, and Time Travel queries based on row modification time, you can use the ROW_TIMESTAMP feature. To enable it, run the following command on your destination table:

```sql
ALTER TABLE SET ROW_TIMESTAMP = TRUE;
```

After row timestamps are enabled, the table exposes the `METADATA$ROW_LAST_COMMIT_TIME` column, which returns the timestamp when each row was last modified. For more information, see [Row timestamps](/user-guide/data-engineering/row-timestamps).

Row timestamps aren't available for interactive tables. For more information, see [](#label-limitations-of-interactive-tables).

## Using the connector with Apache Iceberg™ tables

The connector can ingest data into Snowflake-managed Apache Iceberg™ tables, provided that the following requirements are met:

- You must have been granted the USAGE privilege on the external volume associated with your Apache Iceberg™ table.
- You must create the Apache Iceberg™ table before running the connector.

### Grant usage on an external volume

For example, if your Iceberg table uses the `kinesis_external_volume` external volume and the connector uses the role `openflow_kinesis_connector_role_1`, run the following statement:

```sql
USE ROLE ACCOUNTADMIN;
GRANT USAGE ON EXTERNAL VOLUME kinesis_external_volume TO ROLE openflow_kinesis_connector_role_1;
```

### Create an Apache Iceberg™ table for ingestion

For Iceberg tables, the connector does not create the table automatically and does not support schema evolution. Before you run the connector, you must create the Iceberg table manually.

When you create an Iceberg table, you can use Iceberg data types (including VARIANT) or [compatible Snowflake types](/user-guide/tables-iceberg-data-types).
For example, consider the following message: ```json { "id": 1, "name": "Steve", "body_temperature": 36.6, "approved_coffee_types": ["Espresso", "Doppio", "Ristretto", "Lungo"], "animals_possessed": { "dogs": true, "cats": false }, "options": { "can_walk": true, "can_talk": false }, "date_added": "2024-10-15" } ``` To create an Iceberg table for the example message, use one of the following statements: ```sql CREATE OR REPLACE ICEBERG TABLE my_iceberg_table ( kinesisMetadata OBJECT( stream STRING, shardId STRING, approximateArrival STRING, partitionKey STRING, sequenceNumber STRING, subSequenceNumber INTEGER, shardedSequenceNumber STRING ), id INT, name string, body_temperature float, approved_coffee_types array(string), animals_possessed variant, date_added date, options object(can_walk boolean, can_talk boolean) ) EXTERNAL_VOLUME = 'my_volume' CATALOG = 'SNOWFLAKE' BASE_LOCATION = 'my_location/my_iceberg_table' ICEBERG_VERSION = 3; ``` ## Using the connector with Interactive Tables Interactive tables are a special type of Snowflake table optimized for low-latency, high-concurrency queries. You can find out more about interactive tables in the [interactive tables documentation](/user-guide/interactive). 1. Create an interactive table: ```sql CREATE INTERACTIVE TABLE REALTIME_METRICS ( metric_name VARCHAR, metric_value NUMBER, source_topic VARCHAR, timestamp TIMESTAMP_NTZ ) CLUSTER BY (metric_name) AS (SELECT $1:M_NAME::VARCHAR, $1:M_VALUE::NUMBER, $1:RECORD_METADATA.topic::VARCHAR, $1:RECORD_METADATA.timestamp::TIMESTAMP_NTZ from TABLE(DATA_SOURCE(TYPE => 'STREAMING'))); ``` Important considerations: - Interactive tables have specific limitations and query restrictions. Review the [interactive tables documentation](/user-guide/interactive) before using them with the connector. - For interactive tables, any required transformations must be handled in the table definition. - Interactive warehouses are required to query interactive tables efficiently. 
## Using the connector with a customer-defined schema for the destination table

The connector treats each Kinesis record as a row to be inserted into a Snowflake table. For example, suppose a Kinesis stream carries messages structured like the following JSON:

```json
{
  "order_id": 12345,
  "customer_name": "John",
  "order_total": 100.00,
  "isPaid": true
}
```

By default, you don't have to declare a column for every field in the JSON, because schema evolution adds the missing columns. If you prefer a static schema, create the table with explicit columns:

```sql
CREATE TABLE ORDERS (
  kinesisMetadata OBJECT,
  order_id NUMBER,
  customer_name VARCHAR,
  order_total FLOAT,
  ispaid BOOLEAN
);
```

## Using the connector with a customer-defined PIPE

If you choose to create your own pipe, you can define the data transformation logic in the pipe's [COPY INTO](/sql-reference/sql/copy-into-table) statement. For example:

```sql
CREATE TABLE ORDERS (
  order_id VARCHAR,
  customer_name VARCHAR,
  order_total VARCHAR,
  ispaid VARCHAR
);

CREATE PIPE ORDERS AS
COPY INTO ORDERS
FROM (
  SELECT
    $1:order_id::STRING,
    $1:customer_name,
    $1:order_total::STRING,
    $1:isPaid::STRING
  FROM TABLE(DATA_SOURCE(TYPE => 'STREAMING'))
);
```

When you define your own pipe, the destination table columns do not have to match the JSON keys. You can rename the columns and cast the data types as required.

To adjust the connector to work with a custom pipe, perform the following tasks:

1. Right-click on the PublishSnowpipeStreaming processor used in your Kinesis ingestion flow in the Openflow canvas.
2. Select **Configure** from the context menu.
3. Navigate to the **Properties** tab.
4. In the Destination type field, pick **Pipe**.
5. In the Pipe field, type the name of your pipe.
6. Select **Apply** to save the configuration.
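
The difference between the two destination types can be sketched outside Snowflake. The Python below is an illustration only, not the connector's implementation: `match_by_name` models the table destination without schema evolution (first-level keys matched to columns by name, ignoring case, with unmatched keys dropped), and `project_with_mapping` models a custom pipe, where each column is fed by an explicitly named JSON key. Both function names, the extra `coupon` key, and the renamed columns `ID`, `CUSTOMER`, and `PAID` are hypothetical.

```python
import json


def match_by_name(record: dict, columns: list[str]) -> dict:
    """Table destination: map first-level keys to columns by name, ignoring case."""
    by_lower = {column.lower(): column for column in columns}
    row = {}
    for key, value in record.items():
        column = by_lower.get(key.lower())
        if column is not None:  # keys without a matching column are ignored
            row[column] = value
    return row


def project_with_mapping(record: dict, mapping: dict[str, str]) -> dict:
    """Pipe destination: each column is fed by an explicitly named JSON key."""
    # The lookup is exact-case, like the `$1:isPaid` path in the example pipe.
    return {column: record.get(key) for column, key in mapping.items()}


# The example message plus one hypothetical key that has no column.
message = json.loads(
    '{"order_id": 12345, "customer_name": "John", '
    '"order_total": 100.00, "isPaid": true, "coupon": "SPRING"}'
)

# KINESISMETADATA is filled in by the connector, not from the message body.
static_row = match_by_name(
    message,
    ["KINESISMETADATA", "ORDER_ID", "CUSTOMER_NAME", "ORDER_TOTAL", "ISPAID"],
)
pipe_row = project_with_mapping(
    message, {"ID": "order_id", "CUSTOMER": "customer_name", "PAID": "isPaid"}
)

print(static_row)
print(pipe_row)
```

In the sketch, `isPaid` lands in the `ISPAID` column through case-insensitive matching, while the pipe-style mapping can send it to a column with any name. Type casting is left out because Snowflake performs it.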
## Customizing error handling

Error handling is split between Openflow-side failures and server-side failures within the Snowpipe Streaming service.

- **Openflow Errors (Client-Side Failures)**: Errors such as unparseable payloads or custom transformation failures occur before records reach Snowflake. By default, these records are discarded. To process them in Openflow instead, use the FlowFiles routed to the parse failure relationship of the ConsumeKinesis processor.
- **Snowpipe Streaming Errors (Server-Side Failures)**: Records that successfully reach Snowflake but are incompatible with the destination table's schema (for example, because of type mismatches) are captured by the Snowflake infrastructure. When error logging is enabled on the destination table (`error_logging = true`), these failed rows are automatically ingested into the destination table's error table.

## Next steps

- [](/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning)
- [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance)
- [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot)

--- title: Set up PrivateLink UI access in Openflow - Snowflake Deployments source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-configure-pr-ui.md section: Loading & Unloading Data ---

# Set up PrivateLink UI access in %ofsfspcs-plural%

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).

- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)

This topic explains how to configure access to the Snowflake Openflow Runtime UI using private connectivity. This is an optional task.
If you will not be accessing the Openflow Runtime UI using private connectivity, you can skip this task.

Configuring access to the Snowflake Openflow Runtime UI using private connectivity involves two tasks:

1. [](#label-openflow-spcs-configure-pr-ui-access-ui)
2. [](#label-openflow-spcs-configure-pr-ui-create-deployment)

## Prerequisites

Before configuring PrivateLink for the Openflow Runtime UI, enable PrivateLink for your account as described in [](/user-guide/admin-security-privatelink).

## Determine PrivateLink URLs

1. Using the ACCOUNTADMIN role, call the SYSTEM$GET_PRIVATELINK_CONFIG function in your Snowflake account and identify the value for `openflow-privatelink-url`. This is the URL for accessing the Openflow UI over PrivateLink, in the form:
   - `-.openflow..privatelink.snowflakecomputing.com`
2. The URL for accessing the Runtime UI in a Snowflake deployment has the form:
   - `of---.spcs..privatelink.snowflake.app`
3. Create CNAME records in your DNS to resolve these URL values to your VPC endpoint.
4. Confirm that your DNS settings can resolve the values.
5. Confirm that you can connect to the Openflow UI using its URL from your browser.
6. Confirm that you can connect to the Runtime UI using its URL from your browser.

## Configure PrivateLink for Openflow Runtime UI access

Perform the following steps:

1. Retrieve Snowflake's VPC endpoint service ID and Openflow PrivateLink URLs:
   1. As a user with the ACCOUNTADMIN role, execute:
      ```sql
      SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
      ```
   2. From the output, identify and save the values for the following keys:
      - `privatelink-vpce-id`
      - `openflow-privatelink-url`
      - `external-telemetry-privatelink-url`
   3. Construct the Runtime URL:
      - `of---.spcs..privatelink.snowflake.app`
2. Create a VPC endpoint with the following parameters:
   - Type: `PrivateLink Ready partner services`
   - Service: `privatelink-vpce-id` value obtained in the previous step.
   - VPC: The VPC where your Openflow deployment will be running.
   - Subnets: Select two availability zones and private subnets where your Openflow deployment will run.

   If the Snowflake account where you plan to create your Openflow Deployment already has PrivateLink configured for %sf-web-interface%, use the existing AWS VPC endpoint and add the additional Openflow DNS records to your Route 53.
3. Set up a Route 53 private hosted zone for Openflow UI with the following parameters:
   - Domain: `privatelink.snowflakecomputing.com`
   - Type: `Private hosted zone`
   - Select the region and VPC where your Openflow deployment will run.
4. Set up a Route 53 private hosted zone for Runtime UI with the following parameters:
   - Domain: `privatelink.snowflake.app`
   - Type: `Private hosted zone`
   - Select the region and VPC where your Openflow deployment will run.
5. Add two CNAME records for the URLs identified in the first step:
   - For `openflow-privatelink-url`
     - Record name: `openflow-privatelink-url` value obtained in the first step
     - Record type: `CNAME`
     - Value: DNS name of your VPC endpoint
   - For Runtime UI URL
     - Record name: `openflow-runtime-ui-privatelink-url` value obtained in the first step
     - Record type: `CNAME`
     - Value: DNS name of your VPC endpoint

When creating a new %ofsfspcs%, ensure the **PrivateLink** option is enabled.
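
The DNS records described above can be planned before touching Route 53. The following Python sketch assumes you have the JSON string returned by `SYSTEM$GET_PRIVATELINK_CONFIG()`, the Runtime UI URL you constructed, and the DNS name of your VPC endpoint. It only builds a list of records and makes no AWS calls. The function names and all sample values are hypothetical, and the sample hostnames don't follow the exact URL forms shown earlier.

```python
import json

# Private hosted zone domains named in the steps above.
HOSTED_ZONES = ("privatelink.snowflakecomputing.com", "privatelink.snowflake.app")


def hosted_zone_for(record_name: str) -> str:
    """Return the private hosted zone whose domain the record name falls under."""
    for zone in HOSTED_ZONES:
        if record_name == zone or record_name.endswith("." + zone):
            return zone
    raise ValueError(f"{record_name!r} does not belong to a known hosted zone")


def plan_cname_records(
    config_json: str, runtime_ui_url: str, vpce_dns_name: str
) -> list[dict]:
    """Build the two CNAME records, each pointing at the VPC endpoint DNS name."""
    config = json.loads(config_json)
    names = [config["openflow-privatelink-url"], runtime_ui_url]
    return [
        {
            "zone": hosted_zone_for(name),
            "name": name,
            "type": "CNAME",
            "value": vpce_dns_name,
        }
        for name in names
    ]


# Made-up sample values; real ones come from SYSTEM$GET_PRIVATELINK_CONFIG()
# and from your own VPC endpoint.
sample_config = json.dumps(
    {
        "privatelink-vpce-id": "com.amazonaws.vpce.example",
        "openflow-privatelink-url": "example.openflow.privatelink.snowflakecomputing.com",
    }
)
records = plan_cname_records(
    sample_config,
    runtime_ui_url="of-example.spcs.privatelink.snowflake.app",
    vpce_dns_name="vpce-example.vpce.amazonaws.com",
)
for record in records:
    print(record["zone"], record["name"], "->", record["value"])
```

Checking each name against the hosted zone domains catches a URL pasted into the wrong zone before any record is created.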
### Next steps [Create deployment](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) --- title: Set Up SAP® BDC Connect for Snowflake Zerocopy Connector source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/setup.md section: Loading & Unloading Data --- # Set Up %sapbdc% Zerocopy Connector - [](/user-guide/data-integration/zero-copy/about-sap-snowflake) - [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake) - [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc) - [](/user-guide/data-integration/zero-copy/sap-sql/security) - [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products) - [](/user-guide/data-integration/zero-copy/sap-sql/publish-data) The Zerocopy Connector is subject to the [SAP® BDC Connect for Snowflake Terms](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/sap-bdc-connect-snowflake/). The steps to share Data Products from SAP® BDC to SAP® Snowflake accounts and existing Snowflake accounts that use the SAP® BDC Connect for Snowflake are exactly the same. This topic describes how to create and manage a Zerocopy Connector for %sapbdc% on the Snowflake side. For the SAP® side setup, see [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake) or [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc). For the privileges required for each operation, see [](/user-guide/data-integration/zero-copy/sap-sql/security). ## Prerequisites Before creating a Zerocopy Connector: - An `ORGADMIN` must accept the SAP® BDC Connect for Snowflake Terms. This only needs to be done once per Snowflake organization. Terms of Service cannot be self-revoked — contact Snowflake support and legal to revoke them. To accept the SAP® BDC Connect for Snowflake Terms in Snowsight: 1. Sign in to Snowflake as a user with the `ORGADMIN` role. 2. In the navigation menu, select **Admin** %raa% **Terms**. 3. 
In the **Snowflake Marketplace** section, next to **SAP® BDC Connect for Snowflake Terms**, select **Review**. 4. Select **Acknowledge & Continue**. - Complete the SAP® side setup described in [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake) or [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc). - The role used to create the connector must have `CREATE ZEROCOPY CONNECTOR` on the target schema. By default, the owner role of a schema has this privilege. ## Create a Database and Schema A Zerocopy Connector is a schema-level object. Before creating one, ensure you have a target database and schema, or create new ones. For reference, see [](/sql-reference/sql/create-database) and [](/sql-reference/sql/create-schema). ```sql CREATE DATABASE IF NOT EXISTS my_db; CREATE SCHEMA IF NOT EXISTS my_db.my_schema; ``` ## Create a Zerocopy Connector A Zerocopy Connector is a schema-level object. You can specify a fully qualified name (`..`), a partially qualified name, or a plain name when the database and schema are set in the current session context. ```sql CREATE [ OR REPLACE ] ZEROCOPY CONNECTOR [ IF NOT EXISTS ] PARTNER = SAP_BDC; ``` ```sql CREATE ZEROCOPY CONNECTOR IF NOT EXISTS my_db.my_schema.my_sap_connector PARTNER = SAP_BDC; ``` After creation, the connector is in `NEW` state. No connection is established until you run `ALTER ... CONNECT`. ## Enroll with SAP® BDC The connector must be in `NEW`, `CONNECT_ERROR`, or `DISCONNECTED` state. See [](#connector-states) for details. ```sql ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector CONNECT WITH CONFIG = ( INVITATION_LINK = '' ); ``` The connector immediately enters `CONNECTING` state while the connection is established asynchronously. Use `DESC ZEROCOPY CONNECTOR` to check the current state. ### Verify Connector State Use `DESCRIBE` to check the current state of a connector: ```sql DESC ZEROCOPY CONNECTOR my_db.my_schema.my_sap_connector; ``` #### Output
To list all connectors visible to the current role: ```sql SHOW ZEROCOPY CONNECTORS IN SCHEMA my_db.my_schema; SHOW ZEROCOPY CONNECTORS IN DATABASE my_db; SHOW ZEROCOPY CONNECTORS IN ACCOUNT; ``` ## Set Properties You can set optional properties on a connector using `ALTER ... SET`: ```sql -- Set a comment ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector SET COMMENT = 'SAP BDC connector for sales data products'; -- Enabling share back allows publishing data from Snowflake to SAP BDC ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector SET SHARE_BACK = TRUE; ``` To unset a property and restore its default value: ```sql ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector UNSET COMMENT; ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector UNSET SHARE_BACK; ``` ## Disconnect the Connector All catalog-linked databases created from the connector must be dropped before disconnecting. Share-back must be disabled before disconnecting. The connector must be in `CONNECTED` or `DISCONNECT_ERROR` state. ```sql ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector DISCONNECT; ``` The connector immediately enters `DISCONNECTING` state while the connection is dropped asynchronously. When successful, it transitions to `DISCONNECTED`. ## Drop the Connector You can only drop a connector that is in `NEW`, `CONNECT_ERROR`, `DISCONNECT_ERROR`, or `DISCONNECTED` state. Zerocopy Connectors do not support `UNDROP`. ```sql DROP ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector; ``` ## Next Steps Once the connector is in `CONNECTED` state, you can: - List available SAP® data products and create catalog-linked databases. See [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products). - Publish Snowflake data back to SAP® BDC. See [](/user-guide/data-integration/zero-copy/sap-sql/publish-data). 
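
As a recap, the state requirements stated on this page for `CONNECT`, `DISCONNECT`, and `DROP` can be collected into a small lookup. The Python sketch below encodes only what this page says; the helper names `can_run` and `state_after` are hypothetical, and Snowflake itself enforces the real rules.

```python
# States from which each operation is accepted, as listed on this page.
ALLOWED_STATES = {
    "CONNECT": {"NEW", "CONNECT_ERROR", "DISCONNECTED"},
    "DISCONNECT": {"CONNECTED", "DISCONNECT_ERROR"},
    "DROP": {"NEW", "CONNECT_ERROR", "DISCONNECT_ERROR", "DISCONNECTED"},
}

# State the connector enters immediately after the statement is accepted.
IMMEDIATE_STATE = {"CONNECT": "CONNECTING", "DISCONNECT": "DISCONNECTING"}


def can_run(operation: str, state: str) -> bool:
    """Return True if the operation is accepted in the given connector state."""
    return state in ALLOWED_STATES[operation]


def state_after(operation: str, state: str):
    """Return the state entered right after the operation, or None after DROP."""
    if not can_run(operation, state):
        raise ValueError(f"{operation} is not allowed in state {state}")
    return IMMEDIATE_STATE.get(operation)


print(can_run("CONNECT", "NEW"))
print(can_run("DROP", "CONNECTED"))
print(state_after("DISCONNECT", "CONNECTED"))
```

A connector in `CONNECTED` state can't be dropped directly, so the sketch reports that `DROP` is not allowed until the connector has been disconnected.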
--- title: Set up tasks for the Openflow Connector for Oracle source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/setup-tasks.md section: Loading & Unloading Data --- # Set up tasks for the %oracleofc% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). The %oracleofc% is also subject to additional terms of service beyond the standard connector terms of service. For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/). - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/oracle/about) - [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms) - [](/user-guide/data-integration/openflow/connectors/oracle/data-mapping) - [](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb) This topic describes the overall tasks required to set up, configure, and run the %oracleofc%. ## Prerequisites Before you set up the %oracleofc%, verify that the following prerequisites are met: 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/oracle/about). 2. 
Ensure that you have set up an Openflow deployment: - [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [Set up Openflow - Snowflake Deployment](/user-guide/data-integration/openflow/setup-openflow-spcs) 3. Ensure that you add only one connector instance per runtime. ## Tasks Perform the following tasks to set up, configure, and run the %oracleofc%.
## Next steps - [Monitor the flow](/user-guide/data-integration/openflow/monitor). - [Maintenance](/user-guide/data-integration/openflow/connectors/oracle/maintenance) for reinstalling the connector or changing the XStream position. --- title: Set up the Atlassian Jira Cloud (Agile) flow source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile.md section: Loading & Unloading Data --- # Set up the %jiraagile% flow This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/about) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) This topic describes the steps to install and configure the %jiraagile% flow, the agile flow of the %jira%. The core flow is documented separately in [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core). The agile flow is independent of the core flow. It uses its own API token, parameter contexts, state service, and Snowflake destination configuration. Both flows can write to the same Snowflake database and schema, since they create tables with different names. 
## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/jira-cloud/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %OFSFSPCS-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-jira-cloud) connector. ## Get the credentials As a Jira Cloud administrator, perform the following tasks in your Atlassian account. You can reuse the API token from the core flow or create a separate token. The core flow and agile flow can use the same token, but they always share the underlying Jira API rate budget regardless. 1. Navigate to the [API tokens page](https://id.atlassian.com/manage-profile/security/api-tokens). 2. Select **Create API token with scopes**. 3. In the **Create an API token** dialog box, provide a descriptive name for the API token and select an expiration date for the API token. This can range from 1 to 365 days. 4. Select the API token app **Jira**. 5. Select the agile scopes listed in [Required API scopes](#label-jira-agile-api-scopes). 6. Select **Create token**. 7. In the **Copy your API token** dialog box, select **Copy** to copy your generated API token and then paste the token to the connector parameters, or save it securely. 8. Select **Close** to close the dialog box. 
### Required API scopes The agile flow always requires the following baseline Jira API scopes: - `read:board-scope:jira-software`, `read:board-scope.admin:jira-software`, `read:project:jira` (covers the always-created `BOARD` table) - `read:jira-user` (covers the connection verification that runs at startup against `GET /rest/api/3/myself`) The API token owner additionally needs the **Browse projects** Jira permission on every project whose boards you want to ingest, as well as access to each board's saved filter (used when reading board configuration). Some optional tables require additional scopes on top of the baseline:
If you reuse a single API token across both flows, combine these scopes with the core flow scopes documented in [](#label-jira-core-api-scopes). Tokens without scopes are also supported and grant access based solely on the API token owner's permissions. However, tokens with scopes are recommended for fine-grained access control. ## Set up Snowflake account If you've already completed the Snowflake account setup for the core flow, you can reuse the same role, service user, key pair, database, schema, and warehouse for the agile flow. The agile flow parameters point at this same Snowflake configuration. Otherwise, as a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role. 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role from step 1. 4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Configure a secrets manager supported by Openflow (recommended), for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store. If, for any reason, you don't want to use a secrets manager, you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. After the secrets manager is configured, determine how you will authenticate to it. On AWS, use the EC2 instance role associated with Openflow so that no other secrets have to be persisted. 2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the main menu (⋮) in the upper-right corner. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values. 3. 
At this point, all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow. 6. If any other Snowflake users require access to the tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. Create a database and schema in Snowflake for the connector to store ingested data. Grant the following [](#label-database-privileges) to the role created in the first step. ```sql CREATE DATABASE jira_destination_db; CREATE SCHEMA jira_destination_db.jira_destination_schema; GRANT USAGE ON DATABASE jira_destination_db TO ROLE ; GRANT USAGE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ; GRANT CREATE TABLE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ; ``` 8. Create a warehouse that the connector will use or use an existing one. Start with the smallest warehouse size, then experiment with size depending on the amount of data transferred. Large data volumes typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. 9. Ensure that the user with the role used by the connector has the required privileges to use the warehouse. If that's not the case then grant the required privileges to the role. ```sql CREATE WAREHOUSE jira_connector_warehouse WITH WAREHOUSE_SIZE = 'X-Small'; GRANT USAGE ON WAREHOUSE jira_connector_warehouse TO ROLE ; ``` ## Set up the connector The agile flow is shipped as the %jiraagile% process group. As a data engineer, perform the following tasks to install and configure it. ### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. 
In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. After import, the agile flow appears on the canvas as the %jiraagile% process group. ### Configure the connector 1. Right-click on the imported %jiraagile% process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#label-jira-agile-flow-parameters). ### Flow parameters The agile flow uses its own separate parameter contexts. The Jira credentials and Snowflake destination must be configured independently from the core flow. Both flows can point to the same Snowflake destination database and schema. - [Jira Cloud (Agile) Source Parameters](#label-jira-agile-source-parameters): Used to establish connection with the Jira API. - [Jira Cloud (Agile) Destination Parameters](#label-jira-agile-destination-parameters): Used to establish connection with Snowflake. - [Jira Cloud (Agile) Ingestion Parameters](#label-jira-agile-ingestion-parameters): Used to define the configuration of data ingested from Jira. #### Jira Cloud (Agile) Source Parameters
#### Jira Cloud (Agile) Destination Parameters
#### Jira Cloud (Agile) Ingestion Parameters
## Run the flow 1. Right-click on the canvas and select **Enable all Controller Services**. 2. Right-click on the %jiraagile% process group and select **Start**. The flow starts the data ingestion. On first run, the flow creates the required Snowflake tables in the destination schema. See [](#label-jira-entities) for the full list of tables created by the agile flow and the parameters that control which optional tables are populated. ## Resetting the connector state If you want to restart the ingestion from scratch, clear the agile flow's ingestion state. The agile flow uses its own centralized state service rather than per-processor state. To reset the state, perform the following steps: 1. Right-click the %jiraagile% process group and select **Stop**. 2. Navigate to the **Controller Settings** for the process group. 3. Find the **StandardJiraIngestionStateService** controller service and select **View State**. 4. Select **Clear State**. This clears the agile flow's ingestion tracking. 5. Optionally, update the connector parameters if needed. 6. Right-click the %jiraagile% process group and select **Start**. The agile flow's destination tables (`BOARD`, `SPRINT`, `BOARD_SPRINT`, `BOARD_PROJECT`, `BOARD_ISSUE`) are fully refreshed on every scheduled run, regardless of whether you clear the state. ## Next steps - [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core) if you haven't yet installed the core flow. - [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) if you're moving from a previous version of the Jira Cloud connector. --- title: Set up the Atlassian Jira Cloud (Core) flow source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core.md section: Loading & Unloading Data --- # Set up the %jiracore% flow This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. 
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/about) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile) - [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) This topic describes the steps to install and configure the %jiracore% flow, the core flow of the %jira%. The agile flow is documented separately in [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile). ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/jira-cloud/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %OFSFSPCS-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-jira-cloud) connector. ## Get the credentials As a Jira Cloud administrator, perform the following tasks in your Atlassian account: 1. Navigate to the [API tokens page](https://id.atlassian.com/manage-profile/security/api-tokens). 2. Select **Create API token with scopes**. 3. 
In the **Create an API token** dialog box, provide a descriptive name for the API token and select an expiration date for the API token. This can range from 1 to 365 days. 4. Select the API token app **Jira**. 5. Select the required scopes based on the features you plan to use. See [Required API scopes](#label-jira-core-api-scopes) for details. 6. Select **Create token**. 7. In the **Copy your API token** dialog box, select **Copy** to copy your generated API token and then paste the token to the connector parameters, or save it securely. 8. Select **Close** to close the dialog box. ### Required API scopes The core flow always requires the following baseline Jira API scopes: - `read:jira-work` (covers issues, projects, fields, comments, changelogs, worklogs, votes, watchers, remote links, permissions, project components, and project versions) - `read:jira-user` (covers users and user groups, and the connection verification and timezone lookup that run at startup against `GET /rest/api/3/myself`) The API token owner additionally needs the **Browse projects** Jira permission on every project that you want to ingest. Some optional tables require additional scopes or permissions on top of the baseline:
Comments restricted to specific roles or groups are visible only when the API token owner is a member of those roles or groups, regardless of the token scope or permission configuration. Tokens without scopes are also supported and grant access based solely on the API token owner's permissions. However, tokens with scopes are recommended for fine-grained access control. ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role. 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role from step 1. 4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Configure a secrets manager supported by Openflow (recommended), for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store. If, for any reason, you don't want to use a secrets manager, you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. After the secrets manager is configured, determine how you will authenticate to it. On AWS, use the EC2 instance role associated with Openflow so that no other secrets have to be persisted. 2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the main menu (⋮) in the upper-right corner. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values. 3. At this point, all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow. 6. If any other Snowflake users require access to the tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. 
Create a database and schema in Snowflake for the connector to store ingested data. Grant the following [](#label-database-privileges) to the role created in the first step. ```sql CREATE DATABASE jira_destination_db; CREATE SCHEMA jira_destination_db.jira_destination_schema; GRANT USAGE ON DATABASE jira_destination_db TO ROLE ; GRANT USAGE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ; GRANT CREATE TABLE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ; ``` 8. Create a warehouse that the connector will use or use an existing one. Start with the smallest warehouse size, then experiment with size depending on the amount of data transferred. Large data volumes typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. 9. Ensure that the user with the role used by the connector has the required privileges to use the warehouse. If that's not the case then grant the required privileges to the role. ```sql CREATE WAREHOUSE jira_connector_warehouse WITH WAREHOUSE_SIZE = 'X-Small'; GRANT USAGE ON WAREHOUSE jira_connector_warehouse TO ROLE ; ``` ## Set up the connector The core flow is shipped as the %jiracore% process group. As a data engineer, perform the following tasks to install and configure it. ### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. 
Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. After import, the core flow appears on the canvas as the %jiracore% process group. ### Configure the connector 1. Right-click on the imported %jiracore% process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#label-jira-core-flow-parameters). ### Flow parameters The core flow uses the following parameter contexts: - [Jira Cloud (Core) Source Parameters](#label-jira-core-source-parameters): Used to establish connection with the Jira API. - [Jira Cloud (Core) Destination Parameters](#label-jira-core-destination-parameters): Used to establish connection with Snowflake. - [Jira Cloud (Core) Ingestion Parameters](#label-jira-core-ingestion-parameters): Used to define the configuration of data ingested from Jira. #### Jira Cloud (Core) Source Parameters
#### Jira Cloud (Core) Destination Parameters
#### Jira Cloud (Core) Ingestion Parameters
## Run the flow 1. Right-click on the canvas and select **Enable all Controller Services**. 2. Right-click on the %jiracore% process group and select **Start**. The flow starts the data ingestion. On first run, the flow creates the required Snowflake tables in the destination schema. See [](#label-jira-entities) for the full list of tables created by the core flow and the parameters that control which optional tables are populated. ## Resetting the connector state If you need to change the project filter or want to restart the ingestion from scratch, you must clear the ingestion state. The core flow uses a centralized state service rather than per-processor state. To reset the state, perform the following steps: 1. Right-click the %jiracore% process group and select **Stop**. 2. Navigate to the **Controller Settings** for the process group. 3. Find the **StandardJiraIngestionStateService** controller service and select **View State**. 4. Select **Clear State**. This clears all project tracking, pagination, and timestamp state. 5. Optionally, update the connector parameters if needed. 6. Right-click the %jiracore% process group and select **Start**. Clearing the ingestion state causes the connector to re-fetch all data from the beginning. The destination tables are not truncated. Existing rows are updated in place, and rows that no longer exist in Jira are flagged with `_SNOWFLAKE_DELETED = TRUE`. ## Accessing the data Data fetched from Jira is available in the destination tables with explicit column schemas. There is no need to use JSON flattening or views to query the data. Each entity is stored in its own table. For example, to query issues and their comments: ```sql SELECT i.KEY, i.SUMMARY, c.BODY AS comment_body, c.CREATED AS comment_created FROM ISSUE i JOIN COMMENT c ON i.ID = c.ISSUE_ID ORDER BY c.CREATED DESC; ``` To exclude deleted issues from query results, filter on the connector-managed `_SNOWFLAKE_DELETED` column. 
The connector sets this flag to `TRUE` on the matching `ISSUE` row when an issue is deleted in Jira, so no anti-join against `DELETED_ISSUE` is needed: ```sql SELECT i.* FROM ISSUE i WHERE i._SNOWFLAKE_DELETED = FALSE; ``` The `DELETED_ISSUE` table is still useful when you need the deletion timestamp or the user who performed the deletion. See [](#label-jira-metadata-columns) for the full set of connector-managed metadata columns. ## Enabled tables configuration The `Enabled Tables` parameter controls which optional tables are populated. Ingestion of the `ISSUE`, `PROJECT`, `USER`, and `FIELD` tables is always enabled and can't be disabled. Enabling all tables may cause performance issues and require a larger runtime. Available values for `Enabled Tables`: - `CHANGELOG` (field change history for issues) - `COMMENT` (comments on issues) - `ISSUE_REMOTE_LINK` (remote links attached to issues) - `ISSUE_SECURITY_SCHEME` (issue-level security configurations) - `ISSUE_VOTE` (users who voted on issues) - `ISSUE_WATCHER` (users watching issues) - `PERMISSION` (global and project permission definitions) - `PROJECT_COMPONENT` (components defined in a project) - `PROJECT_VERSION` (release versions of a project) - `USER_GROUP` (group memberships per user) - `WORKLOG` (time tracking entries on issues) The per-issue tables (`CHANGELOG`, `COMMENT`, `ISSUE_REMOTE_LINK`, `ISSUE_VOTE`, `ISSUE_WATCHER`, `WORKLOG`) and per-project tables (`PROJECT_COMPONENT`, `PROJECT_VERSION`) only ingest data for issues and projects that are also covered by `Project Keys Filter`. Some tables are populated by calling the Jira API once per parent entity (for example, once per user or once per issue). On large Jira instances, enabling these tables can significantly increase the number of API calls and the load on the ingestion runtime, and can slow down population of the parent table due to back-pressure on the upstream processor. Enable these tables only when the corresponding data is required. 
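To make the table groupings above concrete, the following Python sketch checks a candidate `Enabled Tables` value before you enter it in the parameter context. It is a sketch under assumptions: the function names are hypothetical, and the comma-separated input format is assumed for illustration because this topic does not specify how multiple values are entered. The table names and groupings are taken from the lists above.

```python
# Optional tables listed under "Available values for Enabled Tables".
OPTIONAL_TABLES = frozenset({
    "CHANGELOG", "COMMENT", "ISSUE_REMOTE_LINK", "ISSUE_SECURITY_SCHEME",
    "ISSUE_VOTE", "ISSUE_WATCHER", "PERMISSION", "PROJECT_COMPONENT",
    "PROJECT_VERSION", "USER_GROUP", "WORKLOG",
})
# Tables that only ingest data covered by Project Keys Filter.
PER_ISSUE_TABLES = frozenset({
    "CHANGELOG", "COMMENT", "ISSUE_REMOTE_LINK",
    "ISSUE_VOTE", "ISSUE_WATCHER", "WORKLOG",
})
PER_PROJECT_TABLES = frozenset({"PROJECT_COMPONENT", "PROJECT_VERSION"})


def parse_enabled_tables(value):
    """Split an assumed comma-separated value and reject unknown table names."""
    names = [part.strip().upper() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in OPTIONAL_TABLES]
    if unknown:
        raise ValueError(f"Not an optional table: {', '.join(unknown)}")
    return names


def limited_by_project_filter(names):
    """Return the tables whose contents depend on Project Keys Filter."""
    scoped = PER_ISSUE_TABLES | PER_PROJECT_TABLES
    return [name for name in names if name in scoped]
```

For example, `parse_enabled_tables("comment, worklog, permission")` yields the three names in uppercase, and `limited_by_project_filter` reports that only `COMMENT` and `WORKLOG` are restricted by `Project Keys Filter`. Passing `ISSUE` raises an error, because the always-enabled tables are not valid values for this parameter.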
## Issue fields configuration The `ISSUE` table schema depends on the `Issue Fields` parameter. The parameter accepts a comma-separated list of field IDs or one of the special values below. Prefix a field with a minus (`-`) to exclude it. For example, `*all,-description` returns all fields except `description`. - `*standard` (default): Fetches standard Jira fields (Summary, Status, Priority, Assignee). - `*navigable`: Fetches all navigable fields. - `*all`: Fetches all fields, including custom fields. - Individual field IDs can be specified (for example, `summary,status,customfield_10001`). The default value `*standard` **doesn't include custom fields**. To ingest custom fields, set this parameter to `*all` or list the custom fields explicitly by ID, for example, `*standard,customfield_10001`. To find custom field IDs, follow [this guide](https://confluence.atlassian.com/jirakb/get-custom-field-ids-for-jira-and-jira-service-management-744522503.html). Column names in the `ISSUE` table are derived from Jira field display names by: 1. Uppercasing the display name. 2. Replacing spaces with underscores. 3. Removing every character that isn't a letter, digit, or underscore. For example, the display name `OF Test (Multi-User)` becomes the column `OF_TEST_MULTIUSER`. If two fields produce the same column name after this transformation, the second field's column is suffixed with `__` to keep names unique. For example, two fields with display name `Custom Field` and IDs `customfield_1` and `customfield_2` produce columns `CUSTOM_FIELD` and `CUSTOM_FIELD__CUSTOMFIELD_2`. Jira field types are mapped to Snowflake column types as follows:
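Separately from the type mapping, the column-naming rules described earlier in this section can be sketched in Python. This is an approximation for predicting column names, not the connector's implementation. The function names are hypothetical, letters and digits are assumed to mean ASCII characters, and the collision suffix (two underscores followed by the uppercased field ID) is inferred from the `CUSTOM_FIELD__CUSTOMFIELD_2` example.

```python
import re


def column_name(display_name):
    """Apply the three documented steps to a Jira field display name."""
    upper = display_name.upper()              # 1. uppercase
    underscored = upper.replace(" ", "_")     # 2. spaces become underscores
    # 3. drop everything except letters, digits, and underscores
    #    (ASCII only; this restriction is an assumption).
    return re.sub(r"[^A-Z0-9_]", "", underscored)


def assign_columns(fields):
    """Map field IDs to column names, suffixing later duplicates.

    `fields` is an ordered list of (field_id, display_name) pairs.
    """
    used = set()
    columns = {}
    for field_id, display_name in fields:
        name = column_name(display_name)
        if name in used:
            # Suffix form inferred from the documented example.
            name = f"{name}__{field_id.upper()}"
        used.add(name)
        columns[field_id] = name
    return columns
```

With these helpers, `column_name("OF Test (Multi-User)")` gives `OF_TEST_MULTIUSER`, and two fields named `Custom Field` with IDs `customfield_1` and `customfield_2` map to `CUSTOM_FIELD` and `CUSTOM_FIELD__CUSTOMFIELD_2`, matching the examples in this section.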
## Next steps - [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile) to install the agile flow. - [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) if you're moving from a previous version of the Jira Cloud connector. --- title: Set up the Openflow Connector for Amazon Ads source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/amazon-ads/setup.md section: Loading & Unloading Data --- # Set up the %amazonads% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the %amazonads%. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/amazon-ads/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %ofsfspcs-plural%, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-amazon-ads) connector. 
## Get the credentials As an Amazon Ads administrator, perform the following actions: 1. Make sure that you have access to an [Amazon Ads account](https://advertising.amazon.com/). 2. [Acquire Access to Amazon Ads API](https://advertising.amazon.com/API/docs/en-us/guides/onboarding/overview) and complete the onboarding process. 3. [Get client ID and client secret](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-access-token). 4. [Create an authorization grant](https://advertising.amazon.com/API/docs/en-us/guides/get-started/create-authorization-grant) and [retrieve a refresh token](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-access-token). 5. Review the [available regions](https://advertising.amazon.com/API/docs/en-us/reference/api-overview#api-endpoints) and get the base URL for requests based on the region in which you are advertising. 6. [Fetch profile IDs](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-profiles) for report configuration. ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role and grant the [](#label-database-privileges). 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role from step 1. 4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store. If, for any reason, you do not wish to use a secrets manager, you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. 
Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow so that no other secrets have to be persisted. 2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values. 3. At this point, all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow. 6. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. ## Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. 
Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). ### Flow parameters This section describes the flow parameters that you can configure based on the following parameter contexts: - [Amazon Ads source parameters](#amazon-ads-source-parameters): Used to establish connection with Amazon Ads API. - [Amazon Ads destination parameters](#amazon-ads-destination-parameters): Used to establish connection with Snowflake. - [Amazon Ads ingestion parameters](#amazon-ads-ingestion-parameters): Used to define the configuration of data downloaded from Amazon Ads. #### Amazon Ads source parameters
#### Amazon Ads destination parameters
#### Amazon Ads ingestion parameters
Data retention in the Amazon Ads API is a specific timeframe, ranging from 60 to 365 days depending on the report type, during which historical advertising performance data is stored and accessible for retrieval. After this period, older data may no longer be available.

## Run the flow

1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion.

---
title: Set up the Openflow Connector for Box
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/box/setup.md
section: Loading & Unloading Data
---

# Set up the Openflow Connector for Box

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)

This topic describes the steps to set up the Openflow Connector for Box.

## Prerequisites

1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/box/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. 
If using Openflow - Snowflake Deployments, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-box) connector.

## Get the credentials

As a **Box developer** or **Box administrator**, create a [Box Platform application](https://developer.box.com/guides/applications/app-types/platform-apps/) as follows:

1. Navigate to [Box Developer Console](https://app.box.com/developers/console).
2. Select **Create Platform App**.
3. Select **Custom App** as the application type.
4. Provide a name and description for the app, and select a purpose from the drop-down list.
5. Select **Server Authentication (with JWT)** as the authentication method.
6. Select **Create App**.
7. To configure the app, navigate to the **Configuration** tab.
8. In the **App Access Level** section, select **App + Enterprise Access**.
9. In the **Application Scopes** section, select the following options:
   - **Read all files and folders stored in Box**.
   - **Write all files and folders stored in Box**: To download files and folders. Note that the connector can't upload any files. Snowflake recommends granting the service account only the Viewer role. To grant the application access to files in Box, select a folder that you want to synchronize and share it with the app service account, using the service account email address that you save at the end of this procedure. The Openflow Connector for Box can discover and download files from the specified folder and all its subfolders, but it cannot modify the files.
   - **Manage users**: To read users in the enterprise.
   - **Manage groups**: To read groups and their members in the enterprise.
   - **Manage enterprise properties**: To read enterprise events.
10. In the **Add and Manage Public Keys** section, generate a public/private key pair. Box downloads a JSON configuration file with a private key.
11. Save the changes.
12. 
Navigate to the **Authorization** tab, and submit the app for authorization for access to the enterprise.
13. Request your enterprise administrator to approve the app.
14. After the approval is granted, go to the **General Settings** tab and save the app service account email address.

For more information, see [Setup with JWT](https://developer.box.com/guides/authentication/jwt/jwt-setup/).

## Set up Snowflake account

As a Snowflake account administrator, perform the following tasks manually or by using the script included below:

1. Create a new role or use an existing role and grant the [](#label-database-privileges).
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or Hashicorp, and store the public and private keys in the secret store. If, for any reason, you do not wish to use a secrets manager, you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.
   1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted.
   2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** » **Parameter Provider** and then fetch your parameter values.
   3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. 
If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred. Large table numbers typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. ### Example setup
```sql
--The following script assumes you'll need to create all required roles, users, and objects.
--However, you may want to reuse some that are already in existence.

--Create a Snowflake service user to manage the connector
USE ROLE USERADMIN;
CREATE USER TYPE=SERVICE COMMENT='Service user for Openflow automation';

--Create a pair of secure keys (public and private). For more information, see
--key-pair authentication. Store the private key for the user in a file to supply
--to the connector’s configuration. Assign the public key to the Snowflake service user:
ALTER USER SET RSA_PUBLIC_KEY = '';

--Create a role to manage the connector and the associated data and
--grant it to that user
USE ROLE SECURITYADMIN;
CREATE ROLE ;
GRANT ROLE TO USER ;

--The following block is for the use case: Ingest files and perform processing with Cortex
--Create a role for read access to the cortex search service created by this connector.
--This role should be granted to any role that will use the service
CREATE ROLE ;
GRANT ROLE TO ROLE ;

--Create the database the data will be stored in and grant usage to the roles created
USE ROLE ACCOUNTADMIN; --use whatever role you want to own your DB
CREATE DATABASE IF NOT EXISTS ;
GRANT USAGE ON DATABASE TO ROLE ;

--Create the schema the data will be stored in and grant the necessary privileges
--on that schema to the connector admin role:
USE DATABASE ;
CREATE SCHEMA IF NOT EXISTS ;
GRANT USAGE ON SCHEMA TO ROLE ;
GRANT CREATE TABLE, CREATE DYNAMIC TABLE, CREATE STAGE, CREATE SEQUENCE, CREATE CORTEX SEARCH SERVICE ON SCHEMA TO ROLE ;

--The following block is for use case: Ingest files and perform processing with Cortex
--Grant the Cortex read-only role access to the database and schema
GRANT USAGE ON DATABASE TO ROLE ;
GRANT USAGE ON SCHEMA TO ROLE ;

--Create the warehouse this connector will use if it doesn't already exist. Grant the
--appropriate privileges to the connector admin role. Adjust the size according to your needs.
CREATE WAREHOUSE WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE ;
```
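The `ALTER USER ... SET RSA_PUBLIC_KEY` statement in the script above takes the public key as a plain string, without the PEM delimiter lines. As a minimal sketch, assuming the public key was exported in PEM format, the following Python helper strips the delimiters and joins the body into a single value. The function name `pem_to_rsa_public_key_value` and the sample key material are hypothetical and are used only for illustration.

```python
def pem_to_rsa_public_key_value(pem_text: str) -> str:
    """Return the base64 body of a PEM public key as one line.

    Hypothetical helper: it only removes the delimiter lines and
    whitespace; it does not validate the key itself.
    """
    body_lines = []
    for line in pem_text.strip().splitlines():
        line = line.strip()
        # Skip blank lines and the "-----BEGIN/END ...-----" delimiters.
        if not line or line.startswith("-----"):
            continue
        body_lines.append(line)
    if not body_lines:
        raise ValueError("no key material found in PEM text")
    return "".join(body_lines)


# Placeholder text for illustration only; this is not a real key.
SAMPLE_PEM = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
MIIBCgKCAQEAexamplekeymaterial==
-----END PUBLIC KEY-----
"""

print(pem_to_rsa_public_key_value(SAMPLE_PEM))
```

The printed string is what would go between the quotes of `RSA_PUBLIC_KEY = ''` for the service user.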
## Use cases You can configure the connector for the following use cases: - [Ingest files only](#ingest-files-only) - [Ingest files and perform processing with Cortex](#ingest-files-and-perform-processing-with-cortex) - [Extract Box metadata using Box AI and ingest it into a Snowflake table](#extract-box-metadata-using-box-ai-and-ingest-it-into-a-snowflake-table) - [Synchronize Box file metadata instances with a Snowflake table](#synchronize-box-file-metadata-instances-with-a-snowflake-table) ### Ingest files only Use the connector definition to perform custom processing on ingested files. #### Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ##### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ##### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Enter the required parameter values as described in [Box ingestion parameters](#box-ingestion-parameters), [Box destination parameters](#box-destination-parameters) and [Box source parameters](#box-source-parameters). 
###### Box source parameters
###### Box destination parameters
###### Box ingestion parameters
#### Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. After starting the connector, it retrieves all files from the specified folder, and then consumes `admin_logs_streaming` events within the last 14 days. This is done to capture data that may otherwise have been missed during the initialization process. During that time, `not found` errors may occur, which are caused by files that appear in the events but are no longer present. ### Ingest files and perform processing with Cortex Use the connector definition to: - Create AI assistants for public documents within your organization's Box enterprise - Enable your AI assistants to adhere to access controls specified in your organization's Box enterprise #### Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ##### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ##### Configure the connector 1. 
Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Box Cortex Connect Ingestion Parameters](#box-cortex-connect-ingestion-parameters), [Box Cortex Connect Destination Parameters](#box-cortex-connect-destination-parameters) and [Box Cortex Connect Source Parameters](#box-cortex-connect-source-parameters). ###### Box Cortex Connect Source Parameters
###### Box Cortex Connect Destination Parameters
###### Box Cortex Connect Ingestion Parameters
#### Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. After starting the connector, it retrieves all files from the specified folder, and then consumes `admin_logs_streaming` events within the last 14 days. This is done to capture any data that may have been missed during the initialization process. During that time, `not found` errors may occur, caused by the files that appear in the events but are no longer present. #### Query the Cortex Search service You can use the [Cortex Search](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) service to build chat and search applications to chat with or query your documents in Box. After you install and configure the connector and it begins ingesting content from Box, you can query the Cortex Search service. For more information about using Cortex Search, see [Query a Cortex Search service](/user-guide/snowflake-cortex/cortex-search/query-cortex-search-service). **Filter responses** To restrict responses from the Cortex Search service to documents that a specific user has access to in Box, you can specify a filter containing the user ID or email address of the user when you query Cortex Search. For example, `filter.@contains.user_ids` or `filter.@contains.user_emails`. The name of the Cortex Search service created by the connector is `search_service` in the schema `Cortex`. Run the following SQL code in a SQL worksheet to query the Cortex Search service with files ingested from your Box site. Replace the following: - application_instance_name: Name of your database and connector application instance. - user_emailID: Email ID of the user who you want to filter the responses for. - your_question: The question that you want to get responses for. - number_of_results: Maximum number of results to return in the response. 
The maximum value is 1,000 and the default value is 10. ```sql SELECT PARSE_JSON( SNOWFLAKE.CORTEX.SEARCH_PREVIEW( '.cortex.search_service', '{ "query": "", "columns": ["chunk", "web_url"], "filter": {"@contains": {"user_emails": ""} }, "limit": }' ) )['results'] AS results ``` Here is a complete list of values that you can enter for `columns`:
**Example: Query an AI assistant for human resources (HR) information** You can use Cortex Search to query an AI assistant for employees to chat with the latest versions of HR information, such as onboarding, code of conduct, team processes, and organization policies. Using response filters, you can also allow HR team members to query employee contracts while adhering to access controls configured in Box.
Run the following in a [SQL worksheet](#label-snowsight-worksheets-create-file) to query the Cortex Search service with files ingested from Box. Select the database as your application instance name and schema as **Cortex**. Replace the following: - application_instance_name: Name of your database and connector application instance. - user_emailID: Email ID of the user who you want to filter the responses for. ```sql SELECT PARSE_JSON( SNOWFLAKE.CORTEX.SEARCH_PREVIEW( '.cortex.search_service', '{ "query": "What is my vacation carryover policy?", "columns": ["chunk", "web_url"], "filter": {"@contains": {"user_emails": ""} }, "limit": 1 }' ) )['results'] AS results ```
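The JSON argument passed to `SNOWFLAKE.CORTEX.SEARCH_PREVIEW` in the queries above can also be assembled programmatically, which avoids quoting mistakes when the question or email address comes from user input. The following is a minimal sketch that uses only the Python standard library. The helper name `build_search_payload` and the example email address are hypothetical; the maximum of 1,000 results and the default of 10 come from this topic, and the lower bound of 1 is an assumption.

```python
import json

MAX_LIMIT = 1000    # maximum number of results stated in this topic
DEFAULT_LIMIT = 10  # default number of results stated in this topic


def build_search_payload(question, user_email, limit=DEFAULT_LIMIT,
                         columns=("chunk", "web_url")):
    """Return the JSON string for a Cortex Search query filtered by user email."""
    # The lower bound of 1 is an assumption; the upper bound is documented above.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    payload = {
        "query": question,
        "columns": list(columns),
        # Restrict results to documents this user can access in Box.
        "filter": {"@contains": {"user_emails": user_email}},
        "limit": limit,
    }
    return json.dumps(payload)


print(build_search_payload("What is my vacation carryover policy?",
                           "user@example.com", limit=1))
```

The returned string has the same shape as the second argument of `SEARCH_PREVIEW` in the SQL examples.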
**Python:**

Run the following code in a [Python worksheet](#label-snowsight-worksheets-create) to query the Cortex Search service with files ingested from Box. Ensure that you add the `snowflake.core` package to your database.

Replace the following:

- application_instance_name: Name of your database and connector application instance.
- user_emailID: Email ID of the user who you want to filter the responses for.

```python
import snowflake.snowpark as snowpark
from snowflake.core import Root


def main(session: snowpark.Session):
    root = Root(session)

    # fetch service
    my_service = (
        root.databases[""]
        .schemas["cortex"]
        .cortex_search_services["search_service"]
    )

    # query service
    resp = my_service.search(
        query="What is my vacation carryover policy?",
        columns=["chunk", "web_url"],
        filter={"@contains": {"user_emails": ""}},
        limit=1,
    )
    return resp.to_json()
```

**REST API:**

Execute the following code in a command-line interface to query the Cortex Search service with files ingested from Box. Access to the Snowflake REST APIs requires authentication via both key pair authentication and OAuth. For more information, see [](#label-cortex-search-query-syntax-rest) and [](/developer-guide/snowflake-rest-api/authentication).

Replace the following:

- application_instance_name: Name of your database and connector application instance.
- account_url: Your Snowflake account URL. For instructions on finding your account URL, see [](#label-account-name-find).

```bash
curl --location "https:///api/v2/databases//schemas/cortex/cortex-search-services/search_service" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header "Authorization: Bearer " \
  --data '{ "query": "What is my vacation carryover policy?", "columns": ["chunk", "web_url"], "limit": 1 }'
```

Sample response:

```text
{
  "results" : [
    {
      "web_url" : "https://.box.com/sites//",
      "chunk" : "Answer to the question asked."
    }
  ]
}
```

### Extract Box metadata using Box AI and ingest it into a Snowflake table

Use the connector definition to:

- Extract metadata about your Box files and ingest it into a Snowflake table
- Perform operations on the metadata of your files stored in Box

#### Create a Snowflake table for storing the Box metadata

1. Ensure that Box AI is enabled for the extraction of metadata to occur. For more information, see [Configuring Box AI](https://support.box.com/hc/en-us/articles/22166647877011-Configuring-Box-AI).
2. Create a Snowflake table where the metadata will be sent. For the connector to know what kind of metadata to extract, you must create a Snowflake table in your database and schema with the column names of the fields you would like to extract. Add descriptions to each column to improve the performance of the model used to extract the metadata from the files.
3. In the table created in the previous step, ensure that there is a column to store the Box file ID and that it is of type VARCHAR. You must enter the name of this column as the Box File Identifier Column parameter in later steps.

The supported column types for the metadata table are VARCHAR, STRING, TEXT, FLOAT, DOUBLE, and DATE.

Here is an example of the table that you can create for this connector:

```sql
CREATE OR REPLACE TABLE OPENFLOW.BOX_METADATA_SCHEMA.LOAN_AGREEMENT_METADATA (
  BOX_FILE_ID VARCHAR COMMENT 'Box file identifier column',
  LOAN_ID STRING COMMENT 'Unique loan agreement identifier (e.g. L-2025-0001)',
  BORROWER_NAME STRING COMMENT 'Name of the borrower entity or individual',
  LENDER_NAME STRING COMMENT 'Name of the lending institution',
  LOAN_AMOUNT DOUBLE COMMENT 'Principal amount of the loan (in USD)',
  INTEREST_RATE FLOAT COMMENT 'Annual interest rate (%)',
  EFFECTIVE_DATE DATE COMMENT 'Date on which the loan becomes effective',
  MATURITY_DATE DATE COMMENT 'Scheduled loan maturity date',
  LOAN_TERM_MONTHS FLOAT COMMENT 'Original term length in months',
  COLLATERAL_DESCRIPTION TEXT COMMENT 'Description of collateral securing the loan',
  CREDIT_SCORE FLOAT COMMENT 'Borrower credit score',
  JURISDICTION STRING COMMENT 'Governing law jurisdiction (e.g. NY, CA)'
);
```

#### Set up the connector

As a data engineer, perform the following tasks to install and configure the connector:

##### Install the connector

To install the connector, do the following as a data engineer:

1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.

   Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

##### Configure the connector

1. Right-click on the imported process group and select **Parameters**.
2. 
Populate the required parameter values as described in [Box Ingest Metadata Source Parameters](#box-ingest-metadata-source-parameters), [Box Ingest Metadata Destination Parameters](#box-ingest-metadata-destination-parameters) and [Box Ingest Metadata Ingestion Parameters](#box-ingest-metadata-ingestion-parameters). ###### Box Ingest Metadata Source Parameters
###### Box Ingest Metadata Destination Parameters
###### Box Ingest Metadata Ingestion Parameters
#### Run the flow

1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion.

After the connector starts, it retrieves all files from the specified folder, and then consumes `admin_logs_streaming` events from the last 14 days. This is done to capture any data that may have been missed during the initialization process. During that time, `not found` errors may occur, caused by files that appear in the events but are no longer present.

### Synchronize Box file metadata instances with a Snowflake table

Use the connector definition to perform a data transformation on metadata from Box in a Snowflake table and add the changes back to a Box metadata instance.

#### Create a Snowflake stream for storing the Box metadata

1. Create a Snowflake stream for the metadata table you want to use. The stream is used to monitor any changes that occur to the table with which you want to synchronize your Box files. To learn how to create a table for storing Box metadata, see [Create a Snowflake table for storing the Box metadata](#create-a-snowflake-table-for-storing-the-box-metadata).

   If the connector is stopped beyond the data retention time and the stream becomes stale, then you must recreate a stream and replace the previous one. To learn more about managing streams, see [](/user-guide/streams-manage).

   Here is an example of a stream that you can create for this connector:

   ```sql
   CREATE OR REPLACE STREAM OPENFLOW.BOX_METADATA_SCHEMA.LOAN_AGREEMENT_METADATA_STREAM
     ON TABLE OPENFLOW.BOX_METADATA_SCHEMA.LOAN_AGREEMENT_METADATA
   ```
2. In the metadata table, ensure that there is a column to store the Box file ID and that it is of type VARCHAR. You must enter the name of this column as the Box File Identifier Column parameter in later steps.

The supported column types for the metadata table are VARCHAR, STRING, TEXT, FLOAT, DOUBLE, and DATE.
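The column requirements above can be checked before you configure the connector. The following is a minimal sketch, assuming you have already collected the column names and declared types of the metadata table into a dictionary (for example, by copying them from the table definition). The function name `check_metadata_columns` is hypothetical, and the sketch compares only the base type name, ignoring any length such as `VARCHAR(100)`.

```python
# Column types listed as supported for the metadata table in this topic.
SUPPORTED_TYPES = {"VARCHAR", "STRING", "TEXT", "FLOAT", "DOUBLE", "DATE"}


def check_metadata_columns(columns, file_id_column):
    """Return a list of problems; an empty list means the checks passed.

    columns: mapping of column name to declared type.
    file_id_column: name of the column that stores the Box file ID.
    """
    # Normalize names and reduce types such as VARCHAR(100) to VARCHAR.
    normalized = {
        name.upper(): dtype.upper().split("(")[0].strip()
        for name, dtype in columns.items()
    }
    problems = []
    file_id_type = normalized.get(file_id_column.upper())
    if file_id_type is None:
        problems.append(f"missing Box file ID column {file_id_column.upper()}")
    elif file_id_type != "VARCHAR":
        problems.append(f"Box file ID column must be VARCHAR, found {file_id_type}")
    for name, base_type in normalized.items():
        if base_type not in SUPPORTED_TYPES:
            problems.append(f"column {name} has unsupported type {base_type}")
    return problems


# A subset of the example table from this topic.
example = {"BOX_FILE_ID": "VARCHAR", "LOAN_AMOUNT": "DOUBLE", "EFFECTIVE_DATE": "DATE"}
print(check_metadata_columns(example, "BOX_FILE_ID"))
```

A table that declares, for instance, a `NUMBER` column would be reported as a problem by this check, because `NUMBER` is not in the supported list above.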
#### Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ##### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ##### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Box Publish Metadata Source Parameters](#box-publish-metadata-source-parameters), [Box Publish Metadata Destination Parameters](#box-publish-metadata-destination-parameters) and [Box Publish Metadata Ingestion Parameters](#box-publish-metadata-ingestion-parameters). ###### Box Publish Metadata Source Parameters
###### Box Publish Metadata Destination Parameters
###### Box Publish Metadata Ingestion Parameters
#### Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. After running the flow, you can query the Cortex Search service. For information on how to query the Cortex Search service, see [Query the Cortex Search service](#query-the-cortex-search-service). ### Finding files in stage Files stored in the stage may have unreadable names. To find specific files, use the metadata tables as your source of truth. These tables contain the mapping between file names and their corresponding file IDs in the stage. For Cortex-enabled setups, use the following query to find files: ```sql SELECT DISTINCT METADATA:id FROM DOCS_CHUNKS WHERE METADATA:fullName LIKE '%'; ``` For non-Cortex setups, use the following query: ```sql SELECT FILE_ID FROM DOC_METADATA WHERE FILE_NAME = ''; ``` Replace `` with the name or partial name of the file you're looking for. The files in the stage start with the ID returned from these queries. --- title: Set up the Openflow Connector for Google Ads source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-ads/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for Google Ads This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). 
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)

This topic describes the steps to set up the Openflow Connector for Google Ads.

## Prerequisites

1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/google-ads/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. If using Openflow - Snowflake Deployments, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-google-ads) connector.

## Get the credentials

As a Google Ads administrator, perform the following steps:

- Ensure that you have access to a Google Cloud project or [create a new one](https://developers.google.com/workspace/guides/create-project).
- Ensure that the [Google Ads API](https://cloud.google.com/endpoints/docs/openapi/enable-api) is enabled for your Google Cloud project. Google Ads API access is required to ingest data.
- [Configure](https://developers.google.com/google-ads/api/docs/oauth/service-accounts) service account authentication for Google Ads.
- Obtain a developer token for your organization by following these [instructions](https://developers.google.com/google-ads/api/docs/get-started/dev-token). The developer token must have an access level of either Basic or Standard. For more information about access levels, see the [documentation](https://developers.google.com/google-ads/api/docs/access-levels).

## Set up Snowflake account

As a Snowflake account administrator, perform the following tasks:

1. Create a new role or use an existing role and grant the [](#label-database-privileges).
2. 
Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or Hashicorp, and store the public and private keys in the secret store. If, for any reason, you do not wish to use a secrets manager, you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.
   1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted.
   2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** » **Parameter Provider** and then fetch your parameter values.
   3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1.
7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred. Large table numbers typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes.
## Set up the connector

As a data engineer, perform the following tasks to install and configure the connector:

### Install the connector

1. Create a database and schema in Snowflake for the connector to store ingested data. Grant the required [](#label-database-privileges) to the role created in the first step. Substitute the role placeholder with the actual value and use the following SQL commands:
```sql
-- Replace <role_name> (an illustrative placeholder) with the role created in the first step.
CREATE DATABASE GOOGLE_ADS_DESTINATION_DB;
CREATE SCHEMA GOOGLE_ADS_DESTINATION_DB.GOOGLE_ADS_DESTINATION_SCHEMA;
GRANT USAGE ON DATABASE GOOGLE_ADS_DESTINATION_DB TO ROLE <role_name>;
GRANT USAGE ON SCHEMA GOOGLE_ADS_DESTINATION_DB.GOOGLE_ADS_DESTINATION_SCHEMA TO ROLE <role_name>;
GRANT CREATE TABLE ON SCHEMA GOOGLE_ADS_DESTINATION_DB.GOOGLE_ADS_DESTINATION_SCHEMA TO ROLE <role_name>;
```
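If you script this step, substituting the role name by hand in each statement is easy to get wrong. The following Python sketch renders the same statements for a given role and rejects names that are not plain unquoted identifiers. The function name `render_setup_statements` and the sample role name are hypothetical and are not part of the connector; the database and schema names are the ones used in the commands above.

```python
import re
from typing import List

# Unquoted identifiers: start with a letter or underscore, then letters,
# digits, underscores, or dollar signs.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")

DATABASE = "GOOGLE_ADS_DESTINATION_DB"
SCHEMA = DATABASE + ".GOOGLE_ADS_DESTINATION_SCHEMA"


def render_setup_statements(role_name: str) -> List[str]:
    """Return the setup statements with the role placeholder filled in."""
    if not _UNQUOTED_IDENTIFIER.fullmatch(role_name):
        raise ValueError("role name must be a plain unquoted identifier: %r" % role_name)
    return [
        "CREATE DATABASE %s;" % DATABASE,
        "CREATE SCHEMA %s;" % SCHEMA,
        "GRANT USAGE ON DATABASE %s TO ROLE %s;" % (DATABASE, role_name),
        "GRANT USAGE ON SCHEMA %s TO ROLE %s;" % (SCHEMA, role_name),
        "GRANT CREATE TABLE ON SCHEMA %s TO ROLE %s;" % (SCHEMA, role_name),
    ]


if __name__ == "__main__":
    # OPENFLOW_GOOGLE_ADS_ROLE is an illustrative role name.
    for statement in render_setup_statements("OPENFLOW_GOOGLE_ADS_ROLE"):
        print(statement)
```

The sketch only produces text; run the statements in a worksheet or through the client of your choice.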
To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). #### Flow parameters There are three parameter contexts. *Google Ads Source Parameters* and *Google Ads Destination Parameters* are respectively responsible for the connections to the Google Ads API and Snowflake. *Google Ads Ingestion Parameters* defines the configuration of data downloaded from Google Ads. *Google Ads Parameters* aggregates all of them into one. ##### Google Ads Ingestion Parameters
The easiest way to obtain a proper combination of *Report Attributes*, *Report Metrics*, and *Report Segments* is to use the [Google Ads Query Builder](https://developers.google.com/google-ads/api/fields/v19/overview_query_builder). Select the resource that matches the value of the *Google Ads Resource Name* parameter and construct the query. Then copy and paste the attributes, metrics, and segments into the corresponding parameters. ##### Google Ads Source Parameters
##### Google Ads Destination Parameters
## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. ## How to reset the connector To fully reset the connector to its initial state, do the following: 1. Ensure that there are no more flow files in the queues. 2. Stop all the processors. 3. Clear the state of the initial processor. 1. Right-click on the processor `Get Google Ads Report` and select **View State**. 2. Select the option **Clear State**. This resets the state of the processor. 4. Drop the destination table in Snowflake. --- title: Set up the Openflow Connector for Google Drive source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-drive/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for Google Drive This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the Openflow Connector for Google Drive. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/google-drive/about). 2. 
Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using Openflow - Snowflake Deployments, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-google-drive) connector. ## Get the credentials Setting up the connector requires specific permissions and account settings for Snowflake Openflow processors to read data from Google. This access is provided in part through setting up a service account and a key for Openflow to authenticate as that service account. For more information, see: - [Configure access to the Google Cloud Search API](https://developers.google.com/cloud-search/docs/guides/project-setup#create_service_account_credentials) - [Delegating domain-wide authority to the service account](https://developers.google.com/identity/protocols/oauth2/service-account#delegatingauthority) As a Google Drive administrator, perform the following steps: ### Prerequisites Ensure that you meet the following requirements: - You have a Google user with Super Admin permissions - You have a Google Cloud Project with the following roles: - Organization Policy Administrator - Organization Administrator ### Enable service account key creation By default, Google disables service account key creation. For Openflow to use the service account JSON, this key creation policy must be turned off. 1. Log in to the [Google Cloud Console](https://console.cloud.google.com/) with a super admin account that has the Organization Policy Administrator role. 2. Ensure you are in the project associated with your organization, not the project in your organization. 3. Click **Organization Policies**. 4. Select the **Disable service account key creation** policy. 5. 
Click **Manage Policy** and turn off enforcement. 6. Click **Set Policy**. ### Create service account and key 1. Open the [Google Cloud Console](https://console.cloud.google.com/) and authenticate as a user that has been granted access to create service accounts. 2. Ensure you are in a project of your organization. 3. In the left navigation, under **IAM & Admin**, select the **Service Accounts** tab. 4. Click **Create Service Account**. 5. Enter the service account name and click **Create and Continue**. 6. Click **Done**. In the table of service accounts, find the **OAuth 2 Client ID** column. Copy the Client ID; it is required to set up domain-wide delegation in the next section. 7. In the table of service accounts, open the menu for the newly created service account and select **Manage keys**. 8. Select **Add key** and then **Create new key**. 9. Leave the default selection of JSON and click **Create**. The key is downloaded to your browser's Downloads directory as a .json file. ### Grant service account domain-wide delegation for listed scopes 1. Log in to your Google Admin account. 2. Select **Admin** from the **Google Apps selector**. 3. In the left navigation, expand **Security**, then **Access and data control**, and click **API controls**. 4. On the **API controls** screen, select **Manage Domain Wide Delegation**. 5. Click **Add new**. 6. 
Enter the OAuth 2 Client ID taken from the Create Service Account and Key section and the following scopes: - [https://www.googleapis.com/auth/drive](https://www.googleapis.com/auth/drive) - [https://www.googleapis.com/auth/drive.metadata.readonly](https://www.googleapis.com/auth/drive.metadata.readonly) - [https://www.googleapis.com/auth/admin.directory.group.member.readonly](https://www.googleapis.com/auth/admin.directory.group.member.readonly) - [https://www.googleapis.com/auth/admin.directory.group.readonly](https://www.googleapis.com/auth/admin.directory.group.readonly) - [https://www.googleapis.com/auth/drive.file](https://www.googleapis.com/auth/drive.file) - [https://www.googleapis.com/auth/drive.metadata](https://www.googleapis.com/auth/drive.metadata) 7. Click **Authorize**. ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks manually or by using the script included below: 1. Create a new role or use an existing role and grant the [](#label-database-privileges). 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role you created in the previous steps. 4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store. If, for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted. 2. 
In Openflow, configure a Parameter Provider associated with this secrets manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** » **Parameter Provider** and then fetch your parameter values. 3. At this point, all credentials can be referenced with the associated parameter paths, and no sensitive values need to be persisted within Openflow. 6. If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with the size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster) than with larger warehouse sizes. ### Example setup
```sql
-- The following script assumes you need to create all required roles, users, and objects.
-- However, you may want to reuse some that already exist.
-- Replace each <placeholder> with a value for your environment; the placeholder names are illustrative.

-- Create a Snowflake service user to manage the connector
USE ROLE USERADMIN;
CREATE USER <openflow_service_user> TYPE=SERVICE COMMENT='Service user for Openflow automation';

-- Create a pair of secure keys (public and private). For more information, see
-- key-pair authentication. Store the private key for the user in a file to supply
-- to the connector's configuration. Assign the public key to the Snowflake service user:
ALTER USER <openflow_service_user> SET RSA_PUBLIC_KEY = '<public_key>';

-- Create a role to manage the connector and the associated data and
-- grant it to that user
USE ROLE SECURITYADMIN;
CREATE ROLE <openflow_connector_admin_role>;
GRANT ROLE <openflow_connector_admin_role> TO USER <openflow_service_user>;

-- The following block is for USE CASE 2 (Cortex connect) ONLY
-- Create a role for read access to the Cortex Search service created by this connector.
-- This role should be granted to any role that will use the service
CREATE ROLE <cortex_search_read_only_role>;
GRANT ROLE <cortex_search_read_only_role> TO ROLE <role_that_uses_the_service>;

-- Create the database the data will be stored in and grant usage to the roles created
USE ROLE ACCOUNTADMIN; -- use whatever role you want to own your DB
CREATE DATABASE IF NOT EXISTS <destination_database>;
GRANT USAGE ON DATABASE <destination_database> TO ROLE <openflow_connector_admin_role>;

-- Create the schema the data will be stored in and grant the necessary privileges
-- on that schema to the connector admin role:
USE DATABASE <destination_database>;
CREATE SCHEMA IF NOT EXISTS <destination_schema>;
GRANT USAGE ON SCHEMA <destination_schema> TO ROLE <openflow_connector_admin_role>;
GRANT CREATE TABLE, CREATE DYNAMIC TABLE, CREATE STAGE, CREATE SEQUENCE, CREATE CORTEX SEARCH SERVICE
  ON SCHEMA <destination_schema> TO ROLE <openflow_connector_admin_role>;

-- The following block is for USE CASE 2 (Cortex connect) ONLY
-- Grant the Cortex read-only role access to the database and schema
GRANT USAGE ON DATABASE <destination_database> TO ROLE <cortex_search_read_only_role>;
GRANT USAGE ON SCHEMA <destination_schema> TO ROLE <cortex_search_read_only_role>;

-- Create the warehouse this connector will use if it doesn't already exist. Grant the
-- appropriate privileges to the connector admin role. Adjust the size according to your needs.
CREATE WAREHOUSE <openflow_warehouse> WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE <openflow_warehouse> TO ROLE <openflow_connector_admin_role>;
```
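The `ALTER USER ... SET RSA_PUBLIC_KEY` statement in the script takes the key body itself; Snowflake's key-pair authentication documentation says to exclude the PEM delimiter lines. The following Python sketch shows one way to prepare that value from a PEM-encoded public key. The function names `pem_to_rsa_public_key_value` and `build_alter_user_statement`, the user name `OPENFLOW_SVC`, and the truncated sample key are hypothetical and are not part of the connector or of any Snowflake library.

```python
import re

# Unquoted identifiers: start with a letter or underscore, then letters,
# digits, underscores, or dollar signs.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def pem_to_rsa_public_key_value(pem_text: str) -> str:
    """Return the base64 body of a PEM public key without delimiters or whitespace."""
    body_lines = []
    for raw_line in pem_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the BEGIN/END delimiter lines.
        if not line or line.startswith("-----"):
            continue
        body_lines.append(line)
    if not body_lines:
        raise ValueError("no key material found in PEM text")
    return "".join(body_lines)


def build_alter_user_statement(user_name: str, pem_text: str) -> str:
    """Build the ALTER USER statement that assigns the public key to a user."""
    if not _UNQUOTED_IDENTIFIER.fullmatch(user_name):
        raise ValueError("user name must be a plain unquoted identifier: %r" % user_name)
    key_value = pem_to_rsa_public_key_value(pem_text)
    return "ALTER USER %s SET RSA_PUBLIC_KEY = '%s';" % (user_name, key_value)


# Dummy, truncated key material for illustration only; not a usable key.
SAMPLE_PEM = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqh
kiG9w0BAQEFAA
-----END PUBLIC KEY-----
"""

if __name__ == "__main__":
    print(build_alter_user_statement("OPENFLOW_SVC", SAMPLE_PEM))
```

Only the public key appears in the statement. The private key stays in your secrets manager or in the key file supplied to the connector's configuration.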
## Use case 1: Use the connector definition to ingest files only Use the connector definition to: - Perform custom processing on ingested files - Ingest Google Drive files and permissions and keep them up to date ### Set up the connector As a data engineer, perform the following tasks to install and configure the connector: #### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Enter the required parameter values as described in [Google Drive Source Parameters](#google-drive-source-parameters), [Google Drive Destination Parameters](#google-drive-destination-parameters) and [Google Drive Ingestion Parameters](#google-drive-ingestion-parameters). ##### Google Drive Source Parameters
##### Google Drive Destination Parameters
##### Google Drive Ingestion Parameters
1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. ## Use case 2: Use the connector definition to ingest files and perform processing with Cortex Use the predefined flow definition to: - Create AI assistants for public documents within your organization's Google Drive. - Enable your AI assistants to adhere to access controls specified in your organization's Google Drive. ### Set up the connector As a data engineer, perform the following tasks to install and configure the connector: #### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. 
Enter the required parameter values as described in [Google Drive Cortex Connect Source Parameters](#google-drive-cortex-connect-source-parameters), [Google Drive Cortex Connect Destination Parameters](#google-drive-cortex-connect-destination-parameters) and [Google Drive Cortex Connect Ingestion Parameters](#google-drive-cortex-connect-ingestion-parameters). ##### Google Drive Cortex Connect Source Parameters
##### Google Drive Cortex Connect Destination Parameters
##### Google Drive Cortex Connect Ingestion Parameters
1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. 3. [](#label-openflow-gdrive-cortex). ## Use case 3: Customise the connector definition Customize the connector definition to perform custom processing on ingested files. ### Set up the connector As a data engineer, perform the following tasks to install and configure the connector: #### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. Customize the connector definition. 1. Remove the following process groups: - Check If Duplicate Content - Snowflake Stage and Parse PDF - Update Snowflake Cortex 2. Attach any custom processing to the output of the *Process Google Drive Metadata* process group. Each flow file represents a single Google Drive file change. Flow file attributes can be seen in the `Fetch Google Drive Metadata` documentation. 2. Populate the process group parameters. 
Follow the same process as for [Use case 1: Use the connector definition to ingest files only](#use-case-1-use-the-connector-definition-to-ingest-files-only). Note that after modifying the connector definition, not all parameters might be required. ### Run the flow 1. Run the flow. 1. Start the process group. The flow will create all required objects inside of Snowflake. 2. Right-click on the imported process group and select **Start**. 2. [](#label-openflow-gdrive-cortex). #### Query the Cortex Search service You can use the [Cortex Search](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) service to build chat and search applications to chat with or query your documents in Google Drive. After you install and configure the connector and it begins ingesting content from Google Drive, you can query the Cortex Search service. For more information about using Cortex Search, see [Query a Cortex Search service](/user-guide/snowflake-cortex/cortex-search/query-cortex-search-service). **Filter responses** To restrict responses from the Cortex Search service to documents that a specific user has access to in Google Drive, you can specify a filter containing the user ID or email address of the user when you query Cortex Search. For example, `filter.@contains.user_ids` or `filter.@contains.user_emails`. The name of the Cortex Search service created by the connector is `search_service` in the schema `Cortex`. Run the following SQL code in a SQL worksheet to query the Cortex Search service with files ingested from your Google Drive. Replace the following: - application_instance_name: Name of your database and connector application instance. - user_emailID: Email ID of the user for whom you want to filter the responses. - your_question: The question that you want to get responses for. - number_of_results: Maximum number of results to return in the response. The maximum value is 1000 and the default value is 10. 
```sql
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<application_instance_name>.cortex.search_service',
    '{
      "query": "<your_question>",
      "columns": ["chunk", "web_url"],
      "filter": {"@contains": {"user_emails": "<user_emailID>"} },
      "limit": <number_of_results>
    }'
  )
)['results'] AS results;
```
Here's a complete list of values that you can enter for `columns`:
**Example: Query an AI assistant for human resources (HR) information** You can use Cortex Search to query an AI assistant for employees to chat with the latest versions of HR information, such as onboarding, code of conduct, team processes, and organization policies. Using response filters, you can also allow HR team members to query employee contracts while adhering to access controls configured in Google Drive.
Run the following in a [SQL worksheet](#label-snowsight-worksheets-create-file) to query the Cortex Search service with files ingested from Google Drive. Select the database as your application instance name and schema as **Cortex**. Replace the following: - application_instance_name: Name of your database and connector application instance. - user_emailID: Email ID of the user for whom you want to filter the responses.
```sql
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<application_instance_name>.cortex.search_service',
    '{
      "query": "What is my vacation carry over policy?",
      "columns": ["chunk", "web_url"],
      "filter": {"@contains": {"user_emails": "<user_emailID>"} },
      "limit": 1
    }'
  )
)['results'] AS results;
```
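Hand-writing the JSON argument for `SNOWFLAKE.CORTEX.SEARCH_PREVIEW` makes quoting mistakes easy. The following Python sketch builds the same request body with `json.dumps` and checks `limit` against the documented maximum of 1000. The helper name `build_search_request` is hypothetical, and the lower bound of 1 for `limit` is an assumption; the `query`, `columns`, `filter`, and `limit` keys and the `@contains` filter on `user_emails` are taken from the queries above.

```python
import json
from typing import Optional, Sequence

MAX_LIMIT = 1000     # documented maximum number of results
DEFAULT_LIMIT = 10   # documented default number of results


def build_search_request(
    question: str,
    user_email: Optional[str] = None,
    columns: Sequence[str] = ("chunk", "web_url"),
    limit: int = DEFAULT_LIMIT,
) -> str:
    """Return the JSON request body for a Cortex Search query as a string."""
    if not question.strip():
        raise ValueError("question must not be empty")
    # Assumption: at least one result must be requested.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError("limit must be between 1 and %d" % MAX_LIMIT)
    request = {"query": question, "columns": list(columns), "limit": limit}
    if user_email:
        # Restrict results to documents this user can access in Google Drive.
        request["filter"] = {"@contains": {"user_emails": user_email}}
    return json.dumps(request)


if __name__ == "__main__":
    # The email address is an illustrative value.
    print(build_search_request(
        "What is my vacation carry over policy?",
        user_email="employee@example.com",
        limit=1,
    ))
```

Passing the resulting string to the query as a bind parameter, rather than concatenating it into the SQL text, avoids a second layer of quoting.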
**Python:** Run the following code in a [Python worksheet](#label-snowsight-worksheets-create) to query the Cortex Search service with files ingested from Google Drive. Ensure that you add the `snowflake.core` package to your database. Replace the following: - application_instance_name: Name of your database and connector application instance. - user_emailID: Email ID of the user for whom you want to filter the responses.
```python
import snowflake.snowpark as snowpark
from snowflake.core import Root

def main(session: snowpark.Session):
    root = Root(session)

    # fetch service
    my_service = (root
        .databases["<application_instance_name>"]
        .schemas["cortex"]
        .cortex_search_services["search_service"]
    )

    # query service
    resp = my_service.search(
        query="What is my vacation carry over policy?",
        columns=["chunk", "web_url"],
        filter={"@contains": {"user_emails": "<user_emailID>"}},
        limit=1
    )
    return resp.to_json()
```
**REST API:** Execute the following code in a command-line interface to query the Cortex Search service with files ingested from your Google Drive. You need to authenticate through key-pair authentication or OAuth to access the Snowflake REST APIs. For more information, see [](#label-cortex-search-query-syntax-rest) and [](/developer-guide/snowflake-rest-api/authentication). Replace the following: - application_instance_name: Name of your database and connector application instance. - account_url: Your Snowflake account URL. For instructions on finding your account URL, see [](#label-account-name-find).
```bash
curl --location "https://<account_url>/api/v2/databases/<application_instance_name>/schemas/cortex/cortex-search-services/search_service:query" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header "Authorization: Bearer <token>" \
  --data '{
    "query": "What is my vacation carry over policy?",
    "columns": ["chunk", "web_url"],
    "limit": 1
  }'
```
## Finding files in stage Files stored in the stage may have unreadable names. To find specific files, use the metadata tables as your source of truth. 
These tables contain the mapping between file names and their corresponding file IDs in the stage. For Cortex-enabled setups, use the following query to find files:
```sql
SELECT DISTINCT METADATA:id FROM DOCS_CHUNKS WHERE METADATA:fullName LIKE '%<file_name>%';
```
For non-Cortex setups, use the following query:
```sql
SELECT FILE_ID FROM DOC_METADATA WHERE FILE_NAME = '<file_name>';
```
Replace `<file_name>` with the name or partial name of the file you're looking for. The files in the stage start with the ID returned from these queries. --- title: Set up the Openflow Connector for Google Sheets source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-sheets/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for Google Sheets This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the Openflow Connector for Google Sheets. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/google-sheets/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. 
If using Openflow - Snowflake Deployments, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-google-sheets) connector. ## Get the Google Cloud credentials and set up your Google Cloud Project As a Google Cloud administrator, perform the following tasks: 1. Ensure that you have the following: - A Google user with [Super Admin permissions](https://support.google.com/a/answer/2405986?hl) - A [Google Cloud Project](https://developers.google.com/workspace/guides/create-project) with the following roles: - [Organization Policy Administrator](https://cloud.google.com/iam/docs/understanding-roles#orgpolicy.policyAdmin) - [Organization Administrator](https://cloud.google.com/iam/docs/understanding-roles#resourcemanager.organizationAdmin) 2. Enable service account key creation. Google disables service account key creation by default. This key creation policy must be turned off for Snowflake Openflow to use the service account JSON. To enable service account key creation, perform the following tasks: 1. Log in to the [Google Cloud Console](https://console.cloud.google.com/) with a super admin account that has the Organization Policy Administrator role. 2. Ensure that you are in the project associated with your organization, not the project in your organization. 3. Select **Organization Policies**. 4. Select the **Disable service account key creation** policy. 5. Select **Manage Policy** and turn off enforcement. 6. Select **Set Policy**. 3. [Create a service account and key](https://developers.google.com/workspace/guides/create-credentials#service-account). 4. Share the Google Sheets spreadsheet with the service account email address. The email address can be found in the service account JSON file under the *client_email* field. Set the sharing permissions to *Viewer*. 5. 
Enable the Google Sheets API for your Google Cloud Project. For more information, see [Enable the Google Sheets API](https://developers.google.com/sheets/api/guides/concepts#enable_the_google_sheets_api). ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role and grant the [](#label-database-privileges). 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role you created in the previous steps. 4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store. If, for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted. 2. In Openflow, configure a Parameter Provider associated with this secrets manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** » **Parameter Provider** and then fetch your parameter values. 3. At this point, all credentials can be referenced with the associated parameter paths, and no sensitive values need to be persisted within Openflow. 6. If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. Designate a warehouse for the connector to use. 
Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred. Large table numbers typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. ## Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). ### Flow parameters The configuration of the connector definition is divided into three parameter contexts: - [Google Sheets Source Parameters](#google-sheets-source-parameters): Used to establish connection with Google Sheets. - [Google Sheets Destination Parameters](#google-sheets-destination-parameters): Used to establish connection with Snowflake. 
- [Google Sheets Ingestion Parameters](#google-sheets-ingestion-parameters): Used to define the configuration of data downloaded from Google Sheets. The [Google Sheets Ingestion Parameters](#google-sheets-ingestion-parameters) parameter context contains spreadsheet-specific details, so you must create new parameter contexts for each new spreadsheet and process group. To create a new parameter context, go to the Openflow Canvas menu, select **Parameter Contexts** and add a new parameter context. It inherits parameters from both the Google Sheets Destination Parameters and Google Sheets Source Parameters parameter contexts. The following tables describe the flow parameters that you can configure based on the parameter contexts: #### Google Sheets Destination Parameters
#### Google Sheets Source Parameters
#### Google Sheets Ingestion Parameters The following table lists only those parameters that are not inherited from other parameter contexts.
The destination table identifier is a combination of the destination table prefix and the range name, and it must be unique. If you download data from multiple spreadsheets or sheets and the range names are not unique, you must specify a unique destination table prefix for each flow. If destination table names aren't unique, the connector might fail or overwrite existing destination tables. ## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. Imported *.xlsx* files must be in Google Sheets format. If you import files, ensure that each file is converted to Google Sheets format before you run the flow. Spreadsheets in any format other than Google Sheets cannot be read. For more information, see [Convert files to Google Sheets format](https://support.google.com/docs/answer/9331167?hl=en#2.5). --- title: Set up the Openflow Connector for HubSpot source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/hubspot/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for HubSpot This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). 
- [About Openflow](/user-guide/data-integration/openflow/about) - [Manage Openflow](/user-guide/data-integration/openflow/manage) - [Openflow connectors](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the Openflow Connector for HubSpot. ## Prerequisites 1. Ensure that you have reviewed [About Openflow Connector for HubSpot](/user-guide/data-integration/openflow/connectors/hubspot/about). 2. Ensure that you have [Set up Openflow - BYOC](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using Openflow - Snowflake Deployments, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [HubSpot](#label-openflow-domains-used-by-openflow-connectors-hubspot) connector. ## Get the credentials As a HubSpot administrator, create a private app in your HubSpot account and generate its access token. The token lets you authenticate your requests to the HubSpot API. 1. Log in to your HubSpot account. 2. Navigate to **Settings** by selecting the gear icon in the top navigation bar. 3. In the left navigation, go to **Integrations** > **Private Apps**. 4. Select **Create a private app**. 1. Enter a name for your app. 2. Navigate to the **Scopes** tab. 3. Select the scopes required for the API requests you intend to make. To find scopes required for the API requests, see [Scopes](https://developers.hubspot.com/docs/guides/apps/authentication/scopes). 4. Select **Create app**. 5. Set the required scopes for the API requests you intend to make for each endpoint. 5. Select **View access token** to view the access token. Paste the token in the connector parameters, or save it securely. ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role and grant the [Database privileges](#label-database-privileges) and [View privileges](#label-view-privileges). 2. 
Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role you created in the previous steps. 4. Configure [key-pair authentication](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store. If, for any reason, you do not want to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. After the secrets manager is configured, determine how you will authenticate to it. On AWS, Snowflake recommends that you use the EC2 instance role associated with Openflow so that no other secrets have to be persisted. 2. In Openflow, configure a Parameter Provider associated with this secrets manager from the hamburger menu in the upper right. Navigate to **Controller Settings** > **Parameter Provider** and then fetch your parameter values. 3. At this point, all credentials can be referenced with the associated parameter paths, and no sensitive values need to be persisted within Openflow. 6. If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. Create a database and schema in Snowflake for the connector to store ingested data. Grant the following [Database privileges](#label-database-privileges) to the role created in the first step. 
```sql
CREATE DATABASE hubspot_destination_db;
CREATE SCHEMA hubspot_destination_db.hubspot_destination_schema;
GRANT USAGE ON DATABASE hubspot_destination_db TO ROLE ;
GRANT USAGE ON SCHEMA hubspot_destination_db.hubspot_destination_schema TO ROLE ;
GRANT CREATE TABLE, CREATE VIEW ON SCHEMA hubspot_destination_db.hubspot_destination_schema TO ROLE ;
```

8. Create a warehouse that will be used by the connector or use an existing one. Start with the smallest warehouse size, then experiment with the size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster) than with larger warehouse sizes. 9. Ensure that the role used by the connector has the required privileges to use the warehouse. If it doesn't, grant the required privileges to the role.

```sql
CREATE WAREHOUSE hubspot_connector_warehouse WITH WAREHOUSE_SIZE = 'X-Small';
GRANT USAGE ON WAREHOUSE hubspot_connector_warehouse TO ROLE ;
```

## Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. 
Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). ### Flow parameters This section describes the flow parameters that you can configure based on the following parameter contexts: - [HubSpot Source Parameters](#hubspot-source-parameters): Used to establish connection with HubSpot. - [HubSpot Destination Parameters](#hubspot-destination-parameters): Used to establish connection with Snowflake. - [HubSpot Ingestion Parameters](#hubspot-ingestion-parameters): Used to define the configuration of data downloaded from HubSpot. #### HubSpot Source Parameters
#### HubSpot Destination Parameters
#### HubSpot Ingestion Parameters
## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. ### Reconfigure the connector You can modify the connector parameters after the connector has started ingesting data. If the ingestion criteria change, perform the following steps to make sure that the data in the destination table is consistent. 1. Stop the connector: Ensure that all Openflow processors are stopped. 2. Access configuration settings: Navigate to the connector's configuration settings within the Snowflake Openflow interface. 3. Modify parameters: Adjust the parameters as required. 4. Clear processor state: If you are changing ingestion criteria, then Snowflake strongly recommends that you start ingestion from the beginning to keep the data in the destination table consistent. After you clear the state in the `List Fresh HubSpot Objects` processor, the connector fetches all the objects from the beginning. You might need to truncate the destination table manually to prevent duplicate rows. ## Data structure and views The connector stores data in the following two formats within your Snowflake database: ### Raw data storage All raw HubSpot data is stored in tables with the exact names specified in the Object Types parameter. For example: - If you configure `Products,Contacts,Companies` in the Object Types parameter, the connector creates three tables: `PRODUCTS`, `CONTACTS`, and `COMPANIES`. - Each table contains the complete JSON payload from the HubSpot API responses. - Raw data preserves the original structure and all metadata from HubSpot. ### Flattened views For easier querying and analysis, the connector automatically creates flattened views for each object type: - Each raw table has a corresponding view with the suffix `_VIEW`. For example: `PRODUCTS_VIEW`, `CONTACTS_VIEW`, and `COMPANIES_VIEW`. 
- Views extract commonly used fields from the JSON payload into individual columns. - Complex nested structures are flattened for simplified SQL queries. --- title: Set up the Openflow Connector for Jira Cloud source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/jira-cloud/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for Jira Cloud This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [About Openflow](/user-guide/data-integration/openflow/about) - [Manage Openflow](/user-guide/data-integration/openflow/manage) - [Openflow connectors](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the Openflow Connector for Jira Cloud. ## Prerequisites 1. Ensure that you have reviewed [About Openflow Connector for Jira Cloud](about). 2. Ensure that you have [Set up Openflow - BYOC](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using Openflow - Snowflake Deployments, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [Jira Cloud](#label-openflow-domains-used-by-openflow-connectors-jira-cloud) connector. ## Get the credentials As a Jira Cloud administrator, perform the following tasks in your Atlassian account: 1. 
Navigate to the [API tokens page](https://id.atlassian.com/manage-profile/security/api-tokens). 2. Select **Create API token with scopes**. 3. In the **Create an API token** dialog box, provide a descriptive name for the API token and select an expiration date for the API token. The expiration period can range from 1 to 365 days. 4. Select **Jira** as the API token app. 5. Select the Jira scopes `read:jira-work` and `read:jira-user`. 6. Select **Create token**. 7. In the **Copy your API token** dialog box, select **Copy** to copy your generated API token and then paste the token to the connector parameters, or save it securely. 8. Select **Close** to close the dialog box. ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role. 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role you created in the previous steps. 4. Configure [key-pair authentication](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store. If, for any reason, you do not want to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 6. After the secrets manager is configured, determine how you will authenticate to it. On AWS, Snowflake recommends that you use the EC2 instance role associated with Openflow so that no other secrets have to be persisted. 7. In Openflow, configure a Parameter Provider associated with this secrets manager from the hamburger menu in the upper right. Navigate to **Controller Settings** > **Parameter Provider** and then fetch your parameter values. 8. 
At this point, all credentials can be referenced with the associated parameter paths, and no sensitive values need to be persisted within Openflow. 9. If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 10. Create a database and schema in Snowflake for the connector to store ingested data. Grant the following [Database privileges](#label-database-privileges) to the role created in the first step.
```sql
CREATE DATABASE jira_destination_db;
CREATE SCHEMA jira_destination_db.jira_destination_schema;
GRANT USAGE ON DATABASE jira_destination_db TO ROLE ;
GRANT USAGE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ;
GRANT CREATE TABLE, CREATE VIEW ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ;
```
11. Create a warehouse that will be used by the connector or use an existing one. Start with the smallest warehouse size, then experiment with the size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster) than with larger warehouse sizes. 12. Ensure that the role used by the connector has the required privileges to use the warehouse. If it doesn't, grant the required privileges to the role.
```sql
CREATE WAREHOUSE jira_connector_warehouse WITH WAREHOUSE_SIZE = 'X-Small';
GRANT USAGE ON WAREHOUSE jira_connector_warehouse TO ROLE ;
```
## Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). ### Flow parameters This section describes the flow parameters that you can configure based on the following parameter contexts: - [Jira Cloud Source Parameters](#jira-cloud-source-parameters): Used to establish connection with Jira API. - [Jira Cloud Destination Parameters](#jira-cloud-destination-parameters): Used to establish connection with Snowflake. - [Jira Cloud Ingestion Parameters](#jira-cloud-ingestion-parameters): Used to define the configuration of data downloaded from Jira. Modifying the parameters related to ingestion configuration (for example, Search Type, JQL Query, Project Names, and Created After) will reset the state of the `FetchJiraIssues` processor, allowing it to fetch all issues again. 
This is useful if you want to change the issue query criteria or restart the ingestion from scratch. This reset action does not truncate the destination table. #### Jira Cloud Source Parameters
#### Jira Cloud Destination Parameters
#### Jira Cloud Ingestion Parameters
## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. If you need to change the issue query criteria or want to restart the ingestion from scratch, perform the following steps to ensure that the data in the destination table is consistent: 1. Right-click on the **FetchJiraIssues** processor and stop it. 2. Right-click on the **FetchJiraIssues** processor and then select **View State**. 3. In the **State** dialog box, select **Clear State**. This action clears the state of the processor and allows it to fetch all issues again. 4. Optional: If you want to change the issue query criteria, right-click on the imported process group and select **Parameters**. Update the parameters as needed. 5. Optional: If you want to change the destination table name, right-click on the imported process group and select **Parameters**. Update the `Destination Table` parameter. 6. Right-click on the **FetchJiraIssues** processor and select **Start**. The connector starts the data ingestion. 7. After ingestion, the data is available in the Snowflake destination table and in a flattened format in the destination view. The view includes all fields available in the Jira instance. ## Accessing the data Data fetched from Jira is available in the destination table. All fields fetched for a Jira issue are available in the `ISSUE` column as an object, in the raw form returned by the API. To help with querying the data, a flattened view is created based on the destination table. The view name is a concatenation of the table name and the suffix `_VIEW`. For example, if the destination table is named `JIRA_ISSUES`, then the view is named `JIRA_ISSUES_VIEW`. In the view, all issue fields are extracted and available as separate columns. The column name is set to the field label. 
If multiple fields have the same label, a suffix with the field ID is added to the column name to ensure uniqueness. For example, if two fields with the IDs `customfield_1` and `customfield_2` both have the label `Custom Field`, then the columns in the view are named `Custom Field (customfield_1)` and `Custom Field (customfield_2)`. --- title: Set up the Openflow Connector for Kafka source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kafka/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for Kafka This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [About Openflow](/user-guide/data-integration/openflow/about) - [Manage Openflow](/user-guide/data-integration/openflow/manage) - [Openflow connectors](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) ## Prerequisites 1. Ensure that you have reviewed [About Openflow Connector for Kafka](/user-guide/data-integration/openflow/connectors/kafka/about). 2. Ensure that you have [Set up Openflow - BYOC](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using Openflow - Snowflake Deployments, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the Kafka connector. 
The connector must be able to connect to all Kafka brokers in the cluster. ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 2. Create a new role or use an existing role and grant the [Database privileges](#label-database-privileges). The connector requires a user to create the destination table. Make sure the user has the required privileges for managing Snowflake objects:
Snowflake recommends creating a separate user and role for each Kafka Cluster for better access control. You can use the following script to create and configure a custom role (requires SECURITYADMIN or equivalent): ```sql USE ROLE securityadmin; CREATE ROLE openflow_kafka_connector_role_1; GRANT USAGE ON DATABASE kafka_db TO ROLE openflow_kafka_connector_role_1; GRANT USAGE ON SCHEMA kafka_schema TO ROLE openflow_kafka_connector_role_1; ``` Privileges must be granted directly to the connector role and cannot be inherited. 3. Configure the destination table Snowflake highly recommends using server-side schema evolution for schema changes and [an error table for DML error logging](/user-guide/data-load-overview). The following example shows how to create a table and add proper OWNERSHIP permissions. ```sql USE ROLE openflow_kafka_connector_role_1; CREATE TABLE kafka_db.kafka_schema. ( kafkaMetadata variant ) ENABLE_SCHEMA_EVOLUTION = TRUE ERROR_LOGGING = TRUE; USE ROLE securityadmin; GRANT OWNERSHIP ON TABLE existing_table1 TO ROLE openflow_kafka_connector_role_1; ``` The connector supports automatic schema detection and evolution. The structure of tables in Snowflake is defined and evolved automatically to support the structure of new data loaded by the connector. It automatically maps the record content's first-level keys to table columns matching by name (case-insensitive). With Schema evolution enabled, Snowflake can automatically expand the destination table by adding new columns that are detected in the incoming stream and dropping NOT NULL constraints to accommodate new data patterns. For more information, see [Table schema evolution](/user-guide/data-load-schema-evolution). If ENABLE_SCHEMA_EVOLUTION isn't enabled, you must create the schema manually by extending the table definition. The connector tries to match the record content's first-level keys to the table columns by name. 
If keys from the JSON don't match the table columns, the connector ignores the keys. 4. (Optional) Configure a secrets manager. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store. 1. Determine how you'll authenticate to the secrets manager after it's configured. On AWS, Snowflake recommends using the EC2 instance role associated with Openflow so no other secrets need to be persisted. 2. Configure a Parameter Provider associated with this secrets manager in Openflow from the hamburger menu in the upper right. Navigate to **Controller Settings** > **Parameter Provider** and fetch your parameter values. 3. Reference all credentials with the associated parameter paths so no sensitive values need to be persisted within Openflow. 5. Grant access to users. For any other Snowflake users who require access to the raw data ingested by the connector (for example, for custom processing in Snowflake), grant those users the role created in step 2. ## Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ### Install the connector To install the connector, do the following: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and select **Add**. Before you install the connector, ensure that you have created a database, schema, and a table in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. 
Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. If needed, customize the connector configuration before configuring the built-in parameters. 2. Populate the process group parameters: 1. Right-click on the imported process group and select **Parameters**. 2. Fill out the required parameter values. #### Parameters The following table describes the parameters for the Openflow Connector for Kafka:
### Start the connector 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the plane and select **Start**. The connector starts data ingestion. ## Understanding KAFKAMETADATA column The connector populates the KAFKAMETADATA structure with metadata about the Kafka record. The structure contains the following information:
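The `kafkaMetadata` column definitions in the Iceberg examples later in this topic list the fields `topic`, `partition`, `offset`, `key`, `headers`, and `timestamp`. As a minimal sketch, assuming those are the fields the connector populates, the structure can be modeled in Python as follows. The class name `KafkaMetadata`, the helper `record_time`, the sample values, and the reading of `timestamp` as epoch milliseconds are assumptions for illustration; they are not part of the connector.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class KafkaMetadata:
    """Hypothetical model of the KAFKAMETADATA structure (field names assumed)."""

    topic: str
    partition: int
    offset: int
    key: Optional[str] = None
    headers: Dict[str, str] = field(default_factory=dict)
    timestamp: Optional[int] = None  # assumed to be epoch milliseconds


def record_time(meta: KafkaMetadata) -> Optional[datetime]:
    """Convert the assumed epoch-millisecond timestamp to a UTC datetime."""
    if meta.timestamp is None:
        return None
    return datetime.fromtimestamp(meta.timestamp / 1000, tz=timezone.utc)


# Made-up sample values, used only to show how the fields fit together.
meta = KafkaMetadata(
    topic="orders",
    partition=0,
    offset=42,
    key="order-12345",
    headers={"source": "web"},
    timestamp=1_728_950_400_000,
)
print(meta.topic, meta.partition, meta.offset, record_time(meta))
```

The topic, partition, and offset together identify a single Kafka record, which is why the sketch treats them as required and the remaining fields as optional.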
## Measuring ingestion latency For change tracking, incremental processing, and time-travel queries based on row modification time, the ROW_TIMESTAMP feature can be used. Enable it by running the following command on your destination table: ```sql ALTER TABLE SET ROW_TIMESTAMP = TRUE; ``` After row timestamps are enabled, tables expose the `METADATA$ROW_LAST_COMMIT_TIME` column, which returns the timestamp when each row was last modified. For more information, see [METADATA$ROW_LAST_COMMIT_TIME](/user-guide/data-load-overview). Row timestamp isn't available for interactive tables. For more information, see [](/user-guide/interactive). ## Using the connector with Apache Iceberg™ tables The connector can ingest data into a Snowflake-managed Apache Iceberg™ table. You must create the Iceberg table manually before running the connector; the connector doesn't create Iceberg tables automatically and doesn't support schema evolution. The Iceberg table can use either of the following storage options: - [Snowflake storage](/user-guide/tables-iceberg-internal-storage): Snowflake stores and manages the Iceberg table files for you, so you don't need to create an external volume or grant the connector access to it. - External cloud storage that you manage, accessed through an external volume. You must grant the connector role USAGE on the external volume. ### Grant usage on an external volume This step applies only when the Iceberg table uses an external volume that you manage. If the table uses [Snowflake storage](/user-guide/tables-iceberg-internal-storage), skip this step. 
For example, if your Iceberg table uses the `kafka_external_volume` external volume and the connector uses the role `openflow_kafka_connector_role`, run the following statement: ```sql USE ROLE ACCOUNTADMIN; GRANT USAGE ON EXTERNAL VOLUME kafka_external_volume TO ROLE openflow_kafka_connector_role; ``` ### Create an Apache Iceberg™ table for ingestion When you create an Iceberg table, you can use Iceberg data types (including VARIANT) or [compatible Snowflake types](/user-guide/tables-iceberg-data-types). For example, consider the following message: ```json { "id": 1, "name": "Steve", "body_temperature": 36.6, "approved_coffee_types": ["Espresso", "Doppio", "Ristretto", "Lungo"], "animals_possessed": { "dogs": true, "cats": false }, "options": { "can_walk": true, "can_talk": false }, "date_added": "2024-10-15" } ``` To create an Iceberg table for the example message, use one of the following statements. To use [Snowflake storage](/user-guide/tables-iceberg-internal-storage), set `EXTERNAL_VOLUME = 'SNOWFLAKE_MANAGED'` and omit `BASE_LOCATION`: ```sql CREATE OR REPLACE ICEBERG TABLE my_iceberg_table ( kafkaMetadata OBJECT( topic STRING, partition INTEGER, offset BIGINT, key STRING, headers MAP(STRING, STRING), timestamp BIGINT ), id INT, name string, body_temperature float, approved_coffee_types array(string), animals_possessed variant, date_added date, options object(can_walk boolean, can_talk boolean) ) EXTERNAL_VOLUME = 'SNOWFLAKE_MANAGED' CATALOG = 'SNOWFLAKE' ICEBERG_VERSION = 3; ``` To use your own external volume, set `EXTERNAL_VOLUME` to the volume name and provide a `BASE_LOCATION`: ```sql CREATE OR REPLACE ICEBERG TABLE my_iceberg_table ( kafkaMetadata OBJECT( topic STRING, partition INTEGER, offset BIGINT, key STRING, headers MAP(STRING, STRING), timestamp BIGINT ), id INT, name string, body_temperature float, approved_coffee_types array(string), animals_possessed variant, date_added date, options object(can_walk boolean, can_talk boolean) ) 
EXTERNAL_VOLUME = 'my_volume' CATALOG = 'SNOWFLAKE' BASE_LOCATION = 'my_location/my_iceberg_table' ICEBERG_VERSION = 3; ``` ## Using the connector with Interactive Tables Interactive tables are a special type of Snowflake table optimized for low-latency, high-concurrency queries. For more information, see [](/user-guide/interactive). 1. Create an interactive table: ```sql CREATE INTERACTIVE TABLE REALTIME_METRICS ( metric_name VARCHAR, metric_value NUMBER, source_topic VARCHAR, timestamp TIMESTAMP_NTZ ) AS (SELECT $1:M_NAME::VARCHAR, $1:M_VALUE::NUMBER, $1:RECORD_METADATA.topic::VARCHAR, $1:RECORD_METADATA.timestamp::TIMESTAMP_NTZ from TABLE(DATA_SOURCE(TYPE => 'STREAMING'))); ``` Important considerations: - Interactive tables have specific limitations and query restrictions. Review [](/user-guide/interactive) before using them with the connector. - For interactive tables, any required transformations must be handled in the table definition. - Interactive warehouses are required to query interactive tables efficiently. ## Using the connector with a customer-defined schema for the destination table The connector treats each Kafka record as a row to be inserted into a Snowflake table. For example, if you have a Kafka topic with the content of the message structured like the following JSON: ```json { "order_id": 12345, "customer_name": "John", "order_total": 100.00, "isPaid": true } ``` By default you don't have to specify all fields from the JSON thanks to the `ENABLE_SCHEMA_EVOLUTION = TRUE` feature. However, if you prefer a static schema, it can be created by running: ```sql CREATE TABLE ORDERS ( kafkaMetadata OBJECT, order_id NUMBER, customer_name VARCHAR, order_total FLOAT, ispaid BOOLEAN ); ``` ## Using the connector with a customer-defined PIPE If you choose to create your own pipe, you can define the data transformation logic in the pipe's [COPY INTO](/sql-reference/sql/copy-into-table) statement. 
You can rename columns as required and cast the data types as needed. For example: ```sql CREATE TABLE ORDERS ( order_id VARCHAR, customer_name VARCHAR, order_total VARCHAR, ispaid VARCHAR ); CREATE PIPE ORDERS AS COPY INTO ORDERS SELECT $1:order_id::STRING, $1:customer_name, $1:order_total::STRING, $1:isPaid::STRING FROM TABLE(DATA_SOURCE(TYPE => 'STREAMING')); ``` When you define your own pipe, your destination table columns don't need to match the JSON keys. You can rename the columns to your desired names and cast the data types if required. To adjust the connector to work with a custom pipe, perform the following tasks: 1. Right-click on the PublishSnowpipeStreaming processor used in your Kafka ingestion flow in the Openflow canvas. 2. Select Configure from the context menu. 3. Navigate to the Properties tab. 4. In the Destination type field, pick Pipe. 5. In the Pipe field, type the name of your PIPE. 6. Select Apply to save the configuration. ## Customizing error handling Error handling is split between Openflow-side failures and server-side failures within the Snowpipe Streaming service. - **Openflow Errors (Client-Side Failures)**: Errors such as unparseable payloads or custom transformation failures occur before records reach Snowflake. By default these records are discarded. It's possible to process these errors in Openflow - use FlowFiles from the parse failure relationship in the ConsumeKafka processor. - **Snowpipe Streaming Errors (Server-Side Failures)**: Errors for records that successfully reach Snowflake but are incompatible with the destination table's schema (for example, type mismatches) are captured by the Snowflake infrastructure. When error logging is enabled on the destination table (`error_logging = true`), these failed rows are automatically ingested into the destination Error table. 
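The split between the two failure classes described above can be illustrated with a small sketch. It is a simplified model, not how Openflow or Snowpipe Streaming actually validate records: the `ORDERS_SCHEMA` mapping, the `classify_record` function, and the returned labels are all hypothetical names introduced for illustration.

```python
import json

# Hypothetical column types for the ORDERS example; not read from Snowflake.
ORDERS_SCHEMA = {
    "order_id": int,
    "customer_name": str,
    "order_total": float,
    "isPaid": bool,
}


def _matches(value, expected) -> bool:
    """Check one JSON value against the expected Python type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so handle it first.
        return expected is bool
    if expected is float:
        # JSON integers are acceptable for floating-point columns.
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def classify_record(payload: bytes, schema: dict) -> str:
    """Return the stage at which a record would fail in this simplified model."""
    try:
        record = json.loads(payload)
    except ValueError:
        # Unparseable payloads fail on the client side, before reaching Snowflake.
        return "client_parse_failure"
    if not isinstance(record, dict):
        return "client_parse_failure"
    for column, expected in schema.items():
        value = record.get(column)
        if value is None:
            continue  # Missing keys are treated as NULL in this sketch.
        if not _matches(value, expected):
            # Parsed correctly, but incompatible with the destination schema.
            return "server_schema_mismatch"
    return "ok"
```

In this model, a record labeled `client_parse_failure` corresponds to the parse failure relationship in Openflow, and `server_schema_mismatch` corresponds to a row that would land in the error table when error logging is enabled.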
## Performance tuning [](/user-guide/data-integration/openflow/connectors/kafka/performance-tuning) --- title: Set up the Openflow Connector for LinkedIn Ads source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/linkedin-ads/setup.md section: Loading & Unloading Data --- # Set up the %linkedinads% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the %linkedinads%. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/linkedin-ads/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-linkedinads) connector. ## Get the credentials 1. As a LinkedIn Ads user, perform the following tasks: 1. 
Optional: If you don't have an ad account to run and manage campaigns, [create one](https://www.linkedin.com/help/linkedin/answer/a426102/create-an-ad-account-in-campaign-manager-as-a-new-advertiser). 2. Ensure that the [user account](https://www.linkedin.com/help/lms/answer/a417905?trk=hc-articlePage-peopleAlsoViewed) has at least a VIEWER role on the ad account. 3. Use the user account to apply for Advertising API access. For more information, see the [Microsoft quick start](https://learn.microsoft.com/en-us/linkedin/marketing/quick-start?view=li-lms-2025-02#step-1-apply-for-api-access). 4. Obtain a [refresh token](https://learn.microsoft.com/en-us/linkedin/shared/authentication/developer-portal-tools?context=linkedin%2Fcontext#generate-a-token-in-the-developer-portal). Use `3-legged oAuth` and the `r_ads_reporting` scope. 5. Obtain the client ID and client secret from the LinkedIn Developer Portal. These credentials are available in the **Auth** tab in [App Details](https://www.linkedin.com/developers/apps). ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role. 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role you created in the previous steps. 4. Configure with [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and Hashicorp, and store the public and private keys in the secret store. If for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 6. Once the secrets manager is configured, determine how you will authenticate to it. 
On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because then no other secrets have to be persisted. 7. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values. 8. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow. 9. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 10. Create a database and schema in Snowflake for the connector to store ingested data. Grant required [](#label-database-privileges) to the role created in the first step. Substitute the `<role>` placeholder with the actual value and use the following SQL commands: ```sql CREATE DATABASE linkedin_destination_db; CREATE SCHEMA linkedin_destination_db.linkedin_destination_schema; GRANT USAGE ON DATABASE linkedin_destination_db TO ROLE <role>; GRANT USAGE ON SCHEMA linkedin_destination_db.linkedin_destination_schema TO ROLE <role>; GRANT CREATE TABLE ON SCHEMA linkedin_destination_db.linkedin_destination_schema TO ROLE <role>; ``` 11. Create a warehouse that will be used by the connector or use an existing one. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred. Large table numbers typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. 12. Ensure that the role used by the connector has the required privileges to use the warehouse. If it doesn't, grant the required privileges to the role.
```sql CREATE WAREHOUSE linkedin_connector_warehouse WITH WAREHOUSE_SIZE = 'X-Small'; GRANT USAGE ON WAREHOUSE linkedin_connector_warehouse TO ROLE <role>; ``` ## Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ### Install the connector 1. Create a database and schema in Snowflake for the connector to store ingested data. Grant required [](#label-database-privileges) to the role created in the first step. Substitute the `<role>` placeholder with the actual value and use the following SQL commands: ```sql CREATE DATABASE DESTINATION_DB; CREATE SCHEMA DESTINATION_DB.DESTINATION_SCHEMA; GRANT USAGE ON DATABASE DESTINATION_DB TO ROLE <role>; GRANT USAGE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE <role>; GRANT CREATE TABLE, CREATE PIPE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE <role>; ``` To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector Each process group is responsible for fetching data for a single report configuration. To use multiple configurations on a regular schedule, create a separate process group for each report configuration. 1. 
Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). ### Flow parameters This section describes the flow parameters that you can configure based on the following parameter contexts: - [Linkedin Ads Source Parameters](#linkedin-ads-source-parameters): Used to establish a connection with the LinkedIn Ads API. - [Linkedin Ads Destination Parameters](#linkedin-ads-destination-parameters): Used to establish a connection with Snowflake. - [Linkedin Ads Ingestion Parameters](#linkedin-ads-ingestion-parameters): Contains all parameters from the other two parameter contexts and additional parameters specific to a given process group. Because this parameter context contains ingestion-specific details, you must create a new parameter context for each new report and process group.
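The relationship between the three parameter contexts can be pictured as layered lookups. The following is only a conceptual sketch: the parameter names and values are hypothetical placeholders, and `ChainMap` merely stands in for the way an ingestion context sees its own parameters plus those of the two shared contexts.

```python
from collections import ChainMap

# All parameter names and values below are hypothetical placeholders.
source_params = {"Client ID": "example-client-id"}
destination_params = {"Destination Database": "linkedin_destination_db"}


def ingestion_context(report_params: dict) -> ChainMap:
    """Layer report-specific parameters over the two shared contexts."""
    return ChainMap(report_params, source_params, destination_params)


# One ingestion context per report and process group; source and
# destination settings are shared rather than copied.
campaign_report = ingestion_context({"Report Name": "campaigns"})
account_report = ingestion_context({"Report Name": "accounts"})
```

Each report keeps its own ingestion-specific values, while a change to the shared source or destination values is visible to every report.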
#### Linkedin Ads Source Parameters
#### Linkedin Ads Destination Parameters
#### Linkedin Ads Ingestion Parameters The following table lists parameters that are not inherited from the other parameter contexts:
You must specify at least one of the following filters: shares, campaigns, campaign groups, accounts, or companies. ## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion.
--- title: Set up the Openflow Connector for Meta Ads source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/meta-ads/setup.md section: Loading & Unloading Data --- # Set up the %metaads% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the %metaads%. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/meta-ads/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-meta-ads) connector. ## Get the credentials As a Meta Ads administrator, perform the following actions in your Meta Ads account: 1. [Create a Meta App](https://developers.facebook.com/docs/development/create-an-app/) or ensure that you have access to one. 2. 
Enable [Marketing API](https://developers.facebook.com/docs/marketing-api/get-started) in the [App dashboard](https://developers.facebook.com/apps). 3. Generate a [long-lived token](https://developers.facebook.com/docs/facebook-login/guides/access-tokens/get-long-lived/). 4. Optional: Increase the rate limit by [changing the app access type](https://developers.facebook.com/docs/marketing-api/overview/rate-limiting) of Ads Management Standard Access from `Standard access` to `Advanced access`. Enable the `ads_read` and `ads_management` [permissions](https://developers.facebook.com/docs/permissions/). ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role and grant the [](#label-database-privileges). 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role you created in the previous steps. 4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and HashiCorp, and store the public and private keys in the secret store. If, for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because then no other secrets have to be persisted. 2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values. 3. 
At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow. 6. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred. Large table numbers typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. ## Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ### Install the connector 1. Create a database and schema in Snowflake for the connector to store ingested data. Grant required [](#label-database-privileges) to the role created in the first step. Substitute the `<role>` placeholder with the actual value and use the following SQL commands: ```sql CREATE DATABASE META_ADS_DESTINATION_DB; CREATE SCHEMA META_ADS_DESTINATION_DB.META_ADS_DESTINATION_SCHEMA; GRANT USAGE ON DATABASE META_ADS_DESTINATION_DB TO ROLE <role>; GRANT USAGE ON SCHEMA META_ADS_DESTINATION_DB.META_ADS_DESTINATION_SCHEMA TO ROLE <role>; GRANT CREATE TABLE ON SCHEMA META_ADS_DESTINATION_DB.META_ADS_DESTINATION_SCHEMA TO ROLE <role>; ``` To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. 
Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). ### Flow parameters This section describes the flow parameters that you can configure based on the following parameter contexts: - [Meta Ads Source Parameters](#meta-ads-source-parameters): Used to establish connection with MetaAds API. - [Meta Ads Destination Parameters](#meta-ads-destination-parameters): Used to establish connection with Snowflake. - [Meta Ads Ingestion Parameters](#meta-ads-ingestion-parameters): Used to define the configuration of data downloaded from Meta Ads. #### Meta Ads Source Parameters
#### Meta Ads Destination Parameters
#### Meta Ads Ingestion Parameters
## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. ## How to reset the connector To fully reset connector to the initial state, do the following: 1. Ensure that there are no more flow files in the queues. 2. Stop all the processors. 3. Clear the state of the initial processor. 1. Right click on the processor `Create Meta Ads Report` and select **View State**. 2. Select the option **Clear State**. This resets the state of the processor. 4. Drop the destination table in Snowflake. --- title: Set up the Openflow Connector for Microsoft Dataverse source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/dataverse/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for Microsoft Dataverse This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the Openflow Connector for Microsoft Dataverse. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/dataverse/about). 2. 
Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-dataverse) connector. ## Get the credentials As a Microsoft Dataverse administrator, perform the following steps: 1. Ensure that you have a Dataverse environment to work with and that you can access it through [https://admin.powerplatform.microsoft.com/](https://admin.powerplatform.microsoft.com/). 2. Ensure that you have an application registered in Microsoft Entra ID in portal.azure.com. This application must have access to the tenant in which your Dataverse environment is available. To register the application, follow [this guide](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/walkthrough-register-app-azure-active-directory). 3. Generate and store the Client ID and Client Secret within that application. 4. Go to Power Apps Admin Center and configure your Dataverse environment so that the application you registered can access it. To do that, go to **Manage** %raa% **Environments** and select the environment to configure. Then go to **Settings** %raa% **Users & permissions** %raa% **Application users**. Add the previously registered application and grant it the privileges necessary to read data from Microsoft Dataverse. 5. Copy and save the Environment URL of the selected Dataverse environment from [https://admin.powerplatform.microsoft.com/](https://admin.powerplatform.microsoft.com/). ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1.
Create a Snowflake user with the type as [SERVICE](#label-user-type-property).
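Create a database and schema to store the replicated data, and set up privileges for the service user to create tables in the destination schema by granting the [USAGE and CREATE TABLE privileges](#label-database-privileges). Replace the placeholders in angle brackets with your own values:
```sql
CREATE DATABASE <destination_database>;
CREATE SCHEMA <destination_database>.<destination_schema>;
CREATE USER <openflow_user> TYPE=SERVICE COMMENT='Service user for automated access of Openflow';
CREATE ROLE <openflow_role>;
GRANT ROLE <openflow_role> TO USER <openflow_user>;
GRANT USAGE ON DATABASE <destination_database> TO ROLE <openflow_role>;
GRANT USAGE ON SCHEMA <destination_database>.<destination_schema> TO ROLE <openflow_role>;
GRANT CREATE TABLE ON SCHEMA <destination_database>.<destination_schema> TO ROLE <openflow_role>;
CREATE WAREHOUSE <openflow_warehouse> WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE <openflow_warehouse> TO ROLE <openflow_role>;
```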
1. Create a pair of secure keys (public and private). Store the private key for the user in a file to supply to the connector's configuration. Assign the public key to the Snowflake service user: ```sql ALTER USER <openflow_user> SET RSA_PUBLIC_KEY = 'thekey'; ``` For more information, see [pair of keys](/user-guide/key-pair-auth). 2. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and HashiCorp, and store the public and private keys in the secret store. If, for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because then no other secrets have to be persisted. 2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values. 3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow. 3. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 4. Designate a warehouse for the connector to use. Grant the USAGE privilege on the warehouse to the role created before. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred. Large table numbers typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. 
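For the public key assignment in the key-pair step above, the value passed to `RSA_PUBLIC_KEY` is the key body without the PEM delimiter lines, as described in the key-pair authentication guide. The following sketch builds that statement from a PEM string. The function name `rsa_public_key_statement` and the user name `openflow_user` are hypothetical, and the key material in the usage example is a truncated dummy value, not a real key.

```python
def rsa_public_key_statement(user: str, public_key_pem: str) -> str:
    """Build an ALTER USER statement from a PEM-encoded public key.

    The user name is inserted as-is, so pass only a trusted identifier.
    """
    body_lines = []
    for line in public_key_pem.strip().splitlines():
        stripped = line.strip()
        # Skip blank lines and the BEGIN/END delimiter lines.
        if not stripped or stripped.startswith("-----"):
            continue
        body_lines.append(stripped)
    body = "".join(body_lines)
    if not body:
        raise ValueError("no key material found in PEM input")
    return f"ALTER USER {user} SET RSA_PUBLIC_KEY = '{body}';"


# Usage with a dummy, truncated key body.
example_pem = (
    "-----BEGIN PUBLIC KEY-----\n"
    "MIIBIjANBgkq\n"
    "hkiG9w0BAQEF\n"
    "-----END PUBLIC KEY-----\n"
)
statement = rsa_public_key_statement("openflow_user", example_pem)
```

Generating the statement from the key file avoids copy-and-paste errors such as leaving a delimiter line inside the quoted value.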
## Set up the connector As a data engineer, perform the following tasks to install and configure the connector: ### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). ### Flow parameters This section describes the flow parameters that you can configure based on the following parameter contexts: - [Dataverse Source Parameters](#dataverse-source-parameters): Used to establish connection with Dataverse. - [Dataverse Destination Parameters](#dataverse-destination-parameters): Used to establish connection with Snowflake. - [Dataverse Ingestion Parameters](#dataverse-ingestion-parameters): Used to define the configuration of data downloaded from Dataverse. #### Dataverse Source Parameters
#### Dataverse Destination Parameters
#### Dataverse Ingestion Parameters
When configuring `Source Tables Filter Value`, use the **entity set name** (plural form, e.g., `annotations`) rather than the table name displayed in the Microsoft Dataverse interface. To find the entity set name for a table, go to [Power Apps](https://make.powerapps.com), select **Tables**, find your table, then select **Advanced** %raa% **Tools** %raa% **Copy set name**. The `Column Filter JSON` parameter uses a different naming convention — it requires the **singular logical entity name** (e.g., `annotation`). See [Replicate a subset of columns in a table](#replicate-a-subset-of-columns-in-a-table) for details. ## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. ### Replicate a subset of columns in a table The connector can filter the data replicated per table to a subset of configured columns. To apply filters to columns, modify the Replication Parameters context `Column Filter` property to specify a JSON filter. Add an array of configurations, one entry for every table to which you want to apply a filter. The `table` field must use the **singular logical entity name** (e.g., `annotation`), not the plural entity set name used in `Source Tables Filter Value` (e.g., `annotations`). To find the logical entity name in Power Apps, go to [Power Apps](https://make.powerapps.com), select **Tables**, find your table, then select **Advanced** %raa% **Tools** %raa% **Copy logical name**. Some columns have a binary representation stored under a `_binary`-suffixed column name (for example, a column `mycolumn` may also appear as `mycolumn_binary`). To fully exclude such a column, list both names in the `excluded` array. The following example excludes large binary columns from a table: ```javascript [ { "table": "mytable", "excluded": ["mycolumn", "mycolumn_binary"] } ] ``` Columns can be included or excluded by name or pattern. 
You can apply a single condition per table, or combine multiple conditions, with exclusions taking precedence over inclusions. The following example shows all available fields. The `table` field is mandatory. One or more of `included`, `excluded`, `includedPattern`, `excludedPattern` is required. ```javascript [ { "table" : "<table name>", "included": ["<column name>", "<column name>"], "excluded": ["<column name>", "<column name>"], "includedPattern": "<pattern>", "excludedPattern": "<pattern>" } ] ``` ### Manage table state The connector maintains per-table ingestion state in the `Dataverse Table State Service` controller service. Each entry records the current ingestion status and the delta token used for change tracking. #### View connector state To view the current state of all tables: 1. Right-click on the canvas and select **Controller services**. 2. Locate the controller service named **Dataverse Table State Service**. 3. In the **Dataverse Table State Service** menu, click **View state**. The state is a set of key/value pairs where the key is the table entity set name (for example, `accounts`). The value is a semicolon-separated string that starts with the `STATUS` field, for example: ```text accounts -> DONE;!AAAAAjE...;; ``` The `STATUS` can be one of the following: - `FETCHING`: the connector is actively fetching records for this table. - `PROCESSING`: the table is queued for ingestion but not currently being fetched. - `DONE`: all available data was fetched successfully. The connector will check for new data on the next scheduled run according to the `Ingestion Schedule Interval` parameter. - `FAILED`: an unrecoverable error occurred. Review the connector logs for details. If the logs indicate a configuration issue or a [known limitation](/user-guide/data-integration/openflow/connectors/dataverse/about#limitations), resolve it and restart ingestion for the affected table. If no known cause is found, this may indicate a bug or an unsupported scenario; contact Snowflake Support. 
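When reviewing many tables, it can help to scan state entries for the ones that need attention. The sketch below assumes entries have been copied out in the `key -> value` form shown in the example above and that the first semicolon-separated field is the status. The names `TableState`, `parse_state_entry`, and `failed_tables` are hypothetical, and the remaining fields are kept uninterpreted because their meaning is not specified here.

```python
from dataclasses import dataclass, field

KNOWN_STATUSES = {"FETCHING", "PROCESSING", "DONE", "FAILED"}


@dataclass
class TableState:
    table: str
    status: str
    # Remaining semicolon-separated fields, kept uninterpreted.
    extra: list = field(default_factory=list)


def parse_state_entry(entry: str) -> TableState:
    """Parse one 'table -> STATUS;...' entry into a TableState."""
    table, separator, value = entry.partition("->")
    if not separator:
        raise ValueError(f"missing '->' in state entry: {entry!r}")
    status, *extra = value.strip().split(";")
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unknown status {status!r} for table {table.strip()!r}")
    return TableState(table=table.strip(), status=status, extra=extra)


def failed_tables(entries: list) -> list:
    """Return the names of tables whose status is FAILED."""
    return [s.table for s in map(parse_state_entry, entries) if s.status == "FAILED"]
```

A table reported by `failed_tables` is a candidate for the log review described for the `FAILED` status.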
#### Restart ingestion for a single table Clearing a table's state causes the connector to perform a full re-ingestion of that table on the next run. All previously synced records will be re-ingested. To restart ingestion for a specific table: 1. Stop all processors in the flow. 2. Ensure that no in-flight FlowFiles are being processed for that table. 3. Right-click on the canvas and select **Disable all controller services**. 4. Go to **Controller services** and open the state view for **Dataverse Table State Service**. 5. Select the trash icon next to the table entry (identified by its entity set name) to remove the state for that table only. 6. Right-click on the canvas, select **Enable all controller services**, and then start all processors. #### Restart ingestion for all tables To restart ingestion for all replicated tables: 1. Stop all processors in the flow. 2. Clear all FlowFiles from the connector's queues. 3. Right-click on the canvas and select **Disable all controller services**. 4. Go to **Controller services** and open the state view for **Dataverse Table State Service**. 5. Select **Clear state** to remove all table entries. 6. Right-click on the canvas, select **Enable all controller services**, and then start all processors. Do not delete FlowFiles manually while the connector is running. Doing so can leave a table in the `FETCHING` status indefinitely. If this occurs, restart ingestion for that table as described above. --- title: Set up the Openflow Connector for MySQL source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/mysql/setup.md section: Loading & Unloading Data --- # Set up the %mysql% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. 
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [Incremental replication](/user-guide/data-integration/openflow/connectors/mysql/incremental-replication) - [](/user-guide/data-integration/openflow/connectors/mysql/about) - [](/user-guide/data-integration/openflow/connectors/mysql/data-mapping) This topic describes the steps to set up the %mysql%. This connector can be configured to immediately start replicating incremental changes for newly added tables, bypassing the snapshot load phase. This option is often useful when reinstalling the connector in an account where previously replicated data exists and you want to continue replication without having to re-snapshot tables. For details on the incremental load process, see [Incremental replication](/user-guide/data-integration/openflow/connectors/mysql/incremental-replication). ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/mysql/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. 
If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-mysql) connector. 4. Ensure that you have a MySQL 8 or a later version to synchronize data with Snowflake. 5. Recommended: Ensure that you add only one connector instance per runtime. 6. As a database administrator, perform the following tasks: 1. Enable [binary logs](https://dev.mysql.com/doc/refman/8.4/en/binary-log.html), then save and configure its format as follows:
For example:

```sql
log_bin = on
binlog_format = row
binlog_row_metadata = full
binlog_row_image = full
binlog_row_value_options =
```

2. Increase the value of `sort_buffer_size`.

```sql
sort_buffer_size = 4194304
```

`sort_buffer_size` defines the amount of memory (in bytes) allocated per query thread for in-memory sorting operations, such as ORDER BY. If the value is too small, the connector may fail with the following error message: `Out of sort memory, consider increasing server sort buffer size`. This error indicates that `sort_buffer_size` should be raised.

3. If you're using Amazon RDS databases, increase the retention period relevant to *binlog_expire_logs_seconds* using *rds_set_configuration*. For example, if you want to store the binlog for 24 hours, call `mysql.rds_set_configuration('binlog retention hours', 24)`.

4. When using a read replica to connect, binary logging must be enabled on the replica. Configuration details are provided in step 4.

5. After binary logging is enabled, configure the replica to log the events received from its source into its own binary log.

```sql
log_replica_updates = ON
```

`log_replica_updates` allows the replica to write events received from its source to its own binary log, making those changes available to any databases that are replicating from it.

6. Connect via SSL. If you're planning to use an SSL connection to MySQL, prepare the root certificate for your database server. It is required during configuration.

7. Create a user for the connector. The connector requires a user with the REPLICATION SLAVE and REPLICATION CLIENT privileges for reading the binary logs. Grant these privileges:

```sql
GRANT REPLICATION SLAVE ON *.* TO ''@'%';
GRANT REPLICATION CLIENT ON *.* TO ''@'%';
```

8. Grant the SELECT privilege on every replicated table: ```sql GRANT SELECT ON .* TO ''@'%' GRANT SELECT ON .
No views created. Data is directly queryable from the destination tables.

| Legacy parameter | Current equivalent |
| --- | --- |
| Search Type | Removed. The new connector always fetches all issues from discovered projects. Use `Project Keys Filter` to limit ingestion to specific projects. |
| JQL Query | Removed. The new connector doesn't support arbitrary JQL for issue filtering. Use `Project Keys Filter` instead. |
| Project Names | Replaced by `Project Keys Filter`, which accepts project keys (not names or IDs). |
| Status Category | Removed. The new connector fetches all issues regardless of status. |
| Updated After | Removed. The new connector manages incremental state automatically. |
| Created After | Removed. The new connector manages incremental state automatically. |
| Destination Table | Removed. The new connector creates fixed table names per entity (`ISSUE`, `PROJECT`, `COMMENT`, and others) in the configured destination schema. |
| Fetch All Worklogs | Removed. The new connector fetches all worklogs into a separate `WORKLOG` table by default when `WORKLOG` is listed in `Enabled Tables`. |
| Connection Method | Not exposed as a parameter. The new connector uses the `DIRECT` connection method. |


| Parameter | Description |
| --- | --- |
| Deletes Fetch Strategy | Enables tracking of deleted issues via the Jira audit log. Not available in the legacy connector. |
| Merge Interval | Time interval between journal-to-destination merge operations. Available in both the core flow and the agile flow. |


| Property | Description |
| --- | --- |
| End Offset | Number of bytes removed at the end of the file. |
| Remove All Content | Removes all content from the FlowFile, superseding the Start Offset and End Offset properties. |
| Start Offset | Number of bytes removed at the beginning of the file. |


| Name | Description |
| --- | --- |
| success | Processed FlowFiles. |

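
The offset semantics above can be sketched in a few lines of Python. This is an illustration only, not the processor's implementation. The function name `modify_bytes` is hypothetical, and the behavior when the two offsets together cover the whole content (returning empty content) is an assumption.

```python
def modify_bytes(content: bytes, start_offset: int = 0, end_offset: int = 0,
                 remove_all: bool = False) -> bytes:
    """Trim bytes from both ends of the content, or drop it entirely."""
    if remove_all:
        # Remove All Content supersedes both offsets.
        return b""
    if start_offset < 0 or end_offset < 0:
        raise ValueError("offsets must be non-negative")
    if start_offset + end_offset >= len(content):
        # Assumption: offsets that cover everything leave empty content.
        return b""
    return content[start_offset:len(content) - end_offset]


print(modify_bytes(b"0123456789", start_offset=2, end_offset=3))
```
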

| Property | Description |
| --- | --- |
| Input Compression Strategy | The strategy to use for decompressing input FlowFiles. |
| Output Compression Level | The compression level for output FlowFiles for supported formats. A lower value results in faster processing but less compression. A value of 0 indicates no compression (that is, simple archiving) for gzip, or minimal compression for xz-lzma2. Higher levels can use much more memory, for example levels 7-9 for xz-lzma2, so choose the level with the available heap size in mind. |
| Output Compression Strategy | The strategy to use for compressing output FlowFiles. |
| Output Filename Strategy | Processing strategy for the filename attribute on output FlowFiles. |


| Name | Description |
| --- | --- |
| failure | FlowFiles will be transferred to the failure relationship on compression modification errors |
| success | FlowFiles will be transferred to the success relationship on compression modification success |


| Name | Description |
| --- | --- |
| mime.type | The appropriate MIME Type is set based on the value of the Compression Format property. If the Compression Format is 'no compression' this attribute is removed as the MIME Type is no longer known. |

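
To get a feel for the trade-off described for Output Compression Level, you can compare levels with Python's standard `gzip` and `lzma` modules. This is an analogy rather than the processor's own code path: it assumes the processor's levels behave like the usual gzip levels (0-9) and xz presets (0-9), and the sample payload is made up.

```python
import gzip
import lzma

# Hypothetical, highly repetitive payload so that compression has a visible effect.
data = b"openflow " * 10_000


def gzip_size(level: int) -> int:
    """Size of the gzip output at the given compression level (0-9)."""
    return len(gzip.compress(data, compresslevel=level))


def xz_size(preset: int) -> int:
    """Size of the xz (LZMA2) output at the given preset (0-9)."""
    return len(lzma.compress(data, preset=preset))


print("original:", len(data))
for level in (0, 1, 9):
    print(f"gzip level {level}:", gzip_size(level))
for preset in (0, 9):
    print(f"xz preset {preset}:", xz_size(preset))
```

At gzip level 0 the data is stored without compression, so the output is slightly larger than the input because of the container overhead. At xz preset 0 the data is still compressed, only with less effort.
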

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Database User | Database User | | | Database user name |
| Mongo URI * | Mongo URI | | | MongoURI, typically of the form: `mongodb://host1[:port1][,host2[:port2],...]` |
| Password | Password | | | The password for the database user |
| SSL Context Service | SSL Context Service | | | The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| Write Concern * | Write Concern | ACKNOWLEDGED | ACKNOWLEDGED, UNACKNOWLEDGED, FSYNCED, JOURNALED, REPLICA_ACKNOWLEDGED, MAJORITY, W1, W2, W3 | The write concern to use |


| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Schema Access Strategy * | Schema Access Strategy | infer | Use 'Schema Name' Property, Use 'Schema Text' Property, Infer from Result | Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch | Schema Branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name | Schema Name | `${schema.name}` | | Specifies the name of the schema to look up in the Schema Registry property |
| Schema Registry | Schema Registry | | | Specifies the Controller Service to use for the Schema Registry |
| Schema Text | Schema Text | `${avro.schema}` | | The text of an Avro-formatted Schema |
| Schema Version | Schema Version | | | Specifies the version of the schema to look up in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| Mongo Collection Name * | mongo-collection-name | | | The name of the collection to use |
| Mongo Database Name * | mongo-db-name | | | The name of the database to use |
| Client Service * | mongo-lookup-client-service | | | A MongoDB controller service to use with this lookup service. |
| Projection | mongo-lookup-projection | | | Specifies a projection for limiting which fields will be returned. |
| Lookup Value Field | mongo-lookup-value-field | | | The field whose value will be returned when the lookup key(s) match a record. If not specified then the entire MongoDB result document minus the `_id` field will be returned as a record. |


| Connector | Minimum version |
| --- | --- |
| MySQL | 0.33.0 |
| PostgreSQL | 0.39.0 |
| MongoDB | 0.17.0 |
| SQL Server | 0.27.0 |
| Oracle Embedded License | 0.25.0 |
| Oracle Independent License | 0.24.0 |


| Name | Type | Description |
| --- | --- | --- |
| application | String | The fixed value _openflow_ |
| cloud.service.provider | String | One of _aws_, _snowflake_ |
| container.id | String | Unique identifier of the container |
| container.image.name | String | Fully qualified name of the container image. All Openflow images are hosted by Snowflake repositories. For example, `<account>-openflow-<env>.registry-internal.snowflakecomputing.com/openflow/openflow/openflow_repo/runtime-server` |
| container.image.tag | String | Version of the container image |
| k8s.container.name | String | The name of the K8s container. Openflow Runtime container names start with the "Runtime Key" and end with *-gateway* or *-server*. For example, an Openflow Runtime named "PostgreSQL CDC" with a Runtime Key of postgresql-cdc would have the container names postgresql-cdc-gateway and postgresql-cdc-server. |
| k8s.container.restart_count | Numeric String | The number of times this container has restarted since it was created. |
| k8s.namespace.name | String | K8s namespace of the pod or container, starting with _runtime-_ for Openflow Runtimes. Values also include _kube-system_ and _openflow-runtime-infra_. |
| k8s.node.name | String | The internal domain name of the EKS node hosting the pod / container, or the EKS node itself. For example, ip-10-12-13-144.us-west-2.compute.internal |
| k8s.pod.name | String | The name of the K8s pod. Openflow Runtime pod names start with the "Runtime Key" and end with a numeric identifier for each pod replica. This number can grow up to the "Max Nodes" set for the Runtime, indexed at 0. For example, an Openflow Runtime named "PostgreSQL CDC" with a Runtime Key of postgresql-cdc and 3 nodes would have the pod names postgresql-cdc-0, postgresql-cdc-1, and postgresql-cdc-2. |
| k8s.pod.start_time | ISO 8601 Date String | Timestamp that the pod was started |
| k8s.pod.uid | UUID String | Unique identifier of the pod within the cluster |
| deployment.version | String | The Openflow deployment version. |
| openflow.dataplane.id | UUID String | The unique identifier of the Openflow Deployment, matching the "ID" shown in the Snowflake Openflow UI through Deployment > View Details. |

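
The naming rules for `k8s.pod.name` and `k8s.container.name` make it possible to predict the values to filter on. The following sketch derives them from a Runtime Key. The helper names are hypothetical, and the code assumes exactly the patterns given in the table (a zero-based numeric suffix for pods, and the `-gateway` and `-server` suffixes for containers).

```python
def pod_names(runtime_key: str, nodes: int) -> list:
    """Pod names for a Runtime with the given number of nodes, indexed at 0."""
    return [f"{runtime_key}-{index}" for index in range(nodes)]


def container_names(runtime_key: str) -> list:
    """Container names for a Runtime: one gateway and one server container."""
    return [f"{runtime_key}-gateway", f"{runtime_key}-server"]


def is_runtime_namespace(namespace: str) -> bool:
    """True for namespaces that belong to an Openflow Runtime."""
    return namespace.startswith("runtime-")


print(pod_names("postgresql-cdc", 3))
print(container_names("postgresql-cdc"))
```
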

| Name | Type | Description |
| --- | --- | --- |
| name | String | Provider of the metric. One of: *runtime* for Openflow Connector metrics, or *github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver* for system-level metrics. |


| Name | Type | Description |
| --- | --- | --- |
| metric | Object | Contains two fields: *name* for the unique metric produced, typically using dot-delimited namespaces, and *unit* for the value represented by the type, such as byte, nanosecond, and thread. The name and unit values vary widely. For the full list, see [Application Metrics](#label-openflow-application-metrics) below. |
| metric_type | String | One of: *gauge* for most Openflow metrics, a snapshot value that can increase or decrease, or *sum* for cumulative metrics like pod CPU time and network IO. |
| value_type | String | The primitive type of the value produced by this metric. One of: INT, DOUBLE. |
| aggregation_temporality | String | Optional. Set to cumulative for metrics that are strictly increasing and dependent on previous values, such as pod CPU time and network IO. |
| is_monotonic | Boolean | Optional. For cumulative metrics, this is true to show that it is strictly increasing within the time series. |

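
Because *sum* metrics with cumulative temporality report a running total, you typically need the difference between consecutive samples to get a rate, whereas *gauge* values can be read directly. The following is a minimal sketch under stated assumptions: samples are `(timestamp_seconds, value)` pairs sorted by time, and a drop in a monotonic series is treated as a counter reset (for example, after a restart). The reset handling is an assumption, not documented behavior.

```python
def cumulative_to_rates(samples: list, is_monotonic: bool = True) -> list:
    """Convert cumulative samples into per-second rates between neighbors."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if is_monotonic and delta < 0:
            # Assumption: the counter restarted from zero, so the new value
            # is all that accumulated during this interval.
            delta = v1
        rates.append((t1, delta / (t1 - t0)))
    return rates


# Hypothetical samples of a cumulative metric taken every 10 seconds.
print(cumulative_to_rates([(0, 100), (10, 150), (20, 30)]))
```
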

| Name | Type | Description |
| --- | --- | --- |
| formattedMessage | String | The actual log message emitted from the Runtime logger. |
| level | String | One of: ERROR, WARN, INFO, DEBUG, TRACE. |
| loggerName | String | The fully qualified classname for the logger. Openflow processors will typically use logger names that start with *com.snowflake.openflow.runtime.processors*. This is useful to view logs for a specific processor, controller service, or bundled library. |
| nanoseconds | Integer | Nanosecond-level time that this log message was created, starting at milliseconds. For example, a nanosecond value of 111222333 could correspond to a timestamp value of 1749180210111, with the leftmost 3 digits of nanoseconds matching the rightmost 3 digits of timestamp. |
| threadName | String | Name of the thread handling this call. For example, _Timer-Driven Process Thread-7_ |
| throwable | JSON Object | *null* when there is no exception or stacktrace for this log message. Otherwise, it logs the stacktrace as a JSON string with fields: *className* (the exception thrown), *message* (any message logged with the exception), and *stepArray* (array of method calls for the stack trace, each including *className*, *fileName*, *lineNumber*, and *methodName*). |
| timestamp | Integer | Time that this log message was created, represented as milliseconds since the UNIX epoch. For example, 1749180210044 indicates that the log was created at 2025-06-06 03:23:30.044 UTC |
| mdc | JSON Object | Mapped Diagnostic Context (MDC) providing additional flow-level context for the log entry. Contains the following fields: *processGroupId* (unique identifier of the process group), *processGroupIdPath* (hierarchical path of process group IDs), *processGroupName* (name of the process group), *processGroupNamePath* (hierarchical path of process group names), *registeredFlowIdentifier* (identifier of the registered flow; present for all versioned flows, including out-of-the-box Openflow connectors), and *registeredFlowVersion* (version of the registered flow; present for all versioned flows, including out-of-the-box Openflow connectors). |

For example, an `mdc` value:

```json
{
  "processGroupId": "6dc1d98f-019d-1000-ffff-ffffa3ba8a09",
  "processGroupIdPath": "/58385a8b-019d-1000-2a52-9ef1c34b0e5f/6dc1d98f-019d-1000-ffff-ffffa3ba8a09",
  "processGroupName": "latency targets",
  "processGroupNamePath": "/Openflow/latency targets",
  "registeredFlowIdentifier": "sqlserver-multidatabase",
  "registeredFlowVersion": "0.29.0-ebb7a257"
}
```

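
The `timestamp` and `nanoseconds` fields can be combined when you need sub-millisecond ordering of log entries. The following sketch uses only the relationship described above (the leading three digits of `nanoseconds` repeat the millisecond part of `timestamp`). The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def log_time(timestamp_ms: int) -> datetime:
    """Convert the millisecond epoch timestamp into an aware UTC datetime."""
    whole_seconds, millis = divmod(timestamp_ms, 1000)
    return (datetime.fromtimestamp(whole_seconds, tz=timezone.utc)
            + timedelta(milliseconds=millis))


def nanos_match(timestamp_ms: int, nanoseconds: int) -> bool:
    """Check that the millisecond parts of both fields agree."""
    return nanoseconds // 1_000_000 == timestamp_ms % 1000


def epoch_nanos(timestamp_ms: int, nanoseconds: int) -> int:
    """Full-precision time in nanoseconds since the UNIX epoch."""
    return (timestamp_ms // 1000) * 1_000_000_000 + nanoseconds


print(log_time(1749180210111).isoformat())
print(nanos_match(1749180210111, 111222333))
```
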

| Metric Name | Unit | Description |
| --- | --- | --- |
| connection.input.bytes | bytes | Size of Items Input |
| connection.input.count | items | Count of Items Input |
| connection.output.bytes | bytes | Size of Items Output |
| connection.output.count | items | Count of Items Output |
| connection.queued.bytes | bytes | Size of Items Queued |
| connection.queued.bytes.max | bytes | Max Size of Items Queued |
| connection.queued.count | items | Count of Items Queued |
| connection.queued.count.max | items | Max Count of Items Queued |
| connection.queued.duration.total | milliseconds | Total Duration of Queued Items |
| connection.queued.duration.max | milliseconds | Max Duration of Queued Items |
| connection.backpressure.threshold.bytes | bytes | The maximum size of data in bytes that can be queued in this connection before it applies back pressure. |
| connection.backpressure.threshold.objects | items | The configured maximum number of FlowFiles that can be queued in this connection before it applies back pressure. |
| connection.loadbalance.status.load_balance_not_configured | binary, 0 or 1 | 1 if the connection does not have a configured load balance setting. Otherwise, 0. |
| connection.loadbalance.status.load_balance_active | binary, 0 or 1 | 1 if the connection is load balancing across the cluster. Otherwise, 0. |
| connection.loadbalance.status.load_balance_inactive | binary, 0 or 1 | 1 if the connection is not load balancing across the cluster. Otherwise, 0. |


| Attribute | Description |
| --- | --- |
| id | The unique identifier of the connection |
| name | The user-visible name of the connection |
| type | The fixed value _connection_ |
| source.id | The unique identifier of the component that is sending FlowFiles to this connection |
| source.name | The user-visible name of the component that is sending FlowFiles to this connection |
| destination.id | The unique identifier of the component that is receiving FlowFiles from this connection |
| destination.name | The user-visible name of the component that is receiving FlowFiles from this connection |
| group.id | The unique identifier of the Process Group that contains this Connection |

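
One common use of the connection metrics is to estimate how close a queue is to applying back pressure. The sketch below compares the queued values with the two thresholds and reports the larger ratio. It assumes the metrics for one connection have been collected into a dictionary keyed by metric name, and it treats a threshold of 0 as "no limit", which is an assumption.

```python
def backpressure_utilization(metrics: dict) -> float:
    """Fraction (0.0 and up) of the nearest back pressure threshold in use."""
    pairs = [
        ("connection.queued.count", "connection.backpressure.threshold.objects"),
        ("connection.queued.bytes", "connection.backpressure.threshold.bytes"),
    ]
    ratios = []
    for queued_key, threshold_key in pairs:
        threshold = metrics.get(threshold_key, 0)
        if threshold > 0:  # assumption: 0 or missing means no limit
            ratios.append(metrics.get(queued_key, 0) / threshold)
    return max(ratios, default=0.0)


# Hypothetical sample for a single connection.
sample = {
    "connection.queued.count": 5000,
    "connection.backpressure.threshold.objects": 10000,
    "connection.queued.bytes": 900,
    "connection.backpressure.threshold.bytes": 1000,
}
print(backpressure_utilization(sample))
```
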

| Metric Name | Unit | Description |
| --- | --- | --- |
| port.thread.count.active | threads | Number of Active Threads |
| port.bytes.received | bytes | Number of Bytes Received |
| port.bytes.sent | bytes | Number of Bytes Sent |
| port.flowfiles.received | flowfiles | Number of FlowFiles Received |
| port.flowfiles.sent | flowfiles | Number of FlowFiles Sent |
| port.input.bytes | bytes | Size of Items Input |
| port.input.count | items | Count of Items Input |
| port.output.bytes | bytes | Size of Items Output |
| port.output.count | items | Count of Items Output |


| Attribute | Description |
| --- | --- |
| id | The unique identifier of the port |
| name | The user-visible name of the port |
| type | One of _port-input_ or _port-output_ |
| group.id | The unique identifier of the Process Group that contains this Port |


| Metric Name | Unit | Description |
| --- | --- | --- |
| processgroup.thread.count.active | threads | Number of Active Threads |
| processgroup.thread.count.stateless | threads | Number of Stateless Threads |
| processgroup.thread.count.terminated | threads | Number of Terminated Threads |
| processgroup.bytes.read | bytes | Number of Bytes Read |
| processgroup.bytes.received | bytes | Number of Bytes Received |
| processgroup.bytes.transferred | bytes | Number of Bytes Transferred |
| processgroup.bytes.sent | bytes | Number of Bytes Sent |
| processgroup.bytes.written | bytes | Number of Bytes Written |
| processgroup.flowfiles.received | flowfiles | Number of FlowFiles Received |
| processgroup.flowfiles.sent | flowfiles | Number of FlowFiles Sent |
| processgroup.flowfiles.transferred | flowfiles | Number of FlowFiles Transferred |
| processgroup.input.count | items | Number of Items Input |
| processgroup.input.content.size | bytes | Size of Items Input |
| processgroup.output.count | items | Number of Items Output |
| processgroup.output.content.size | bytes | Size of Items Output |
| processgroup.queued.count | items | Number of Items Queued |
| processgroup.queued.content.size | bytes | Size of Items Queued |
| processgroup.time.processing | nanoseconds | Time Spent Processing |


| Attribute | Description |
| --- | --- |
| id | The unique identifier of the Process Group |
| name | The user-visible name of the Process Group |
| type | The fixed value _process-group_ |
| tree.level | The depth of the Process Group, relative to the root process group of the flow. Process Groups at the highest level of the flow will have a tree.level of 1 |


| Metric Name | Unit | Description |
| --- | --- | --- |
| processor.thread.count.active | thread | Number of Active Threads |
| processor.thread.count.terminated | thread | Number of Terminated Threads |
| processor.time.lineage.average | nanosecond | Average Lineage Duration |
| processor.invocations | invocations | Number of Invocations |
| processor.bytes.read | byte | Number of Bytes Read |
| processor.bytes.received | byte | Number of Bytes Received |
| processor.bytes.sent | byte | Number of Bytes Sent |
| processor.bytes.written | byte | Number of Bytes Written |
| processor.flowfiles.received | flowfiles | Number of FlowFiles Received |
| processor.flowfiles.removed | flowfiles | Number of FlowFiles Removed |
| processor.flowfiles.sent | flowfiles | Number of FlowFiles Sent |
| processor.input.count | item | Number of Items Input |
| processor.input.content.size | bytes | Size of Items Input |
| processor.output.count | item | Number of Items Output |
| processor.output.content.size | byte | Size of Items Output |
| processor.time.processing | nanosecond | Time Spent Processing |
| processor.run.status.running | binary, 0 or 1 | 1 if running; 0 otherwise |
| processor.run.status.stopped | binary, 0 or 1 | 1 if stopped; 0 otherwise |
| processor.run.status.validating | binary, 0 or 1 | 1 if validating; 0 otherwise |
| processor.run.status.invalid | binary, 0 or 1 | 1 if invalid; 0 otherwise |
| processor.run.status.disabled | binary, 0 or 1 | 1 if disabled; 0 otherwise |
| processor.counter | count | Value of the counter |


| Attribute | Description |
| --- | --- |
| id | The unique identifier of the processor |
| name | The user-visible and user-editable name of the Processor |
| type | The fixed value _processor_ |
| component | The immutable class name of the processor. |
| execution.node | Either _ALL_ or _PRIMARY_, depending on how this Processor is configured to run |
| group.id | The unique identifier of the Process Group that contains this Processor |


| Attribute | Description |
| --- | --- |
| type | The fixed value _counter_ |
| counter | The user- or system-generated name of the counter |

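
The five `processor.run.status.*` metrics each carry 0 or 1, so a single run state can be recovered by finding the one that is set. The following sketch does that for one processor's metrics. It assumes the values have been collected into a dictionary keyed by metric name and that exactly one status metric is 1 at a time; the second point is an assumption, so the function raises an error instead of guessing when it does not hold.

```python
RUN_STATES = ("running", "stopped", "validating", "invalid", "disabled")


def run_status(metrics: dict) -> str:
    """Return the single run state whose status metric equals 1."""
    active = [
        state for state in RUN_STATES
        if metrics.get(f"processor.run.status.{state}") == 1
    ]
    if len(active) != 1:
        # Assumption violated: report it rather than pick a state arbitrarily.
        raise ValueError(f"expected exactly one active state, found {active}")
    return active[0]


# Hypothetical sample for one processor.
sample = {
    "processor.run.status.running": 0,
    "processor.run.status.stopped": 1,
    "processor.run.status.validating": 0,
    "processor.run.status.invalid": 0,
    "processor.run.status.disabled": 0,
}
print(run_status(sample))
```
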

| Metric Name | Unit | Description |
| --- | --- | --- |
| remoteprocessgroup.thread.count.active | threads | Number of Active Threads |
| remoteprocessgroup.remote.port.count.active | ports | Number of Active Remote Ports |
| remoteprocessgroup.remote.port.count.inactive | ports | Number of Inactive Remote Ports |
| remoteprocessgroup.duration.lineage.average | nanoseconds | Average Lineage Duration |
| remoteprocessgroup.refresh.age | milliseconds | Time since last refresh |
| remoteprocessgroup.received.count | items | Number of Received Items |
| remoteprocessgroup.received.content.size | bytes | Size of Received Items |
| remoteprocessgroup.sent.count | items | Number of Sent Items |
| remoteprocessgroup.sent.content.size | bytes | Size of Sent Items |
| remoteprocessgroup.transmission.status.transmitting | binary, 0 or 1 | 1 if the Remote Process Group is transmitting. Otherwise, 0. |
| remoteprocessgroup.transmission.status.nottransmitting | binary, 0 or 1 | 0 if the Remote Process Group is transmitting. Otherwise, 1. |


| Attribute | Description |
| --- | --- |
| id | The unique identifier of the remote process group |
| name | The user-visible name of the Remote Process Group |
| group.id | The unique identifier of the Process Group that contains this Remote Process Group |
| authorization.issue | The Authorization used to access the Remote Process Group |
| target.uri | The URI of the Remote Process Group |
| type | The fixed value _remote-process-group_ |


| Metric Name | Unit | Description |
| --- | --- | --- |
| jvm.memory.heap.used | bytes | The amount of memory currently occupied by objects on the JVM Heap |
| jvm.memory.heap.committed | bytes | The amount of memory guaranteed to be available for use by the JVM Heap |
| jvm.memory.heap.max | bytes | Maximum amount of memory allocated for the JVM Heap |
| jvm.memory.heap.init | bytes | Initial amount of memory allocated for the JVM Heap |
| jvm.memory.heap.usage | percentage | JVM Heap Usage |
| jvm.memory.non-heap.usage | percentage | JVM Non-Heap Usage |
| jvm.memory.total.init | bytes | Initial amount of memory allocated for the JVM |
| jvm.memory.total.used | bytes | Current amount of memory used by the JVM |
| jvm.memory.total.max | bytes | Maximum amount of memory that can be used by the JVM |
| jvm.memory.total.committed | bytes | The amount of memory guaranteed to be available for use by the JVM |
| jvm.threads.count | threads | Number of live threads |
| jvm.threads.deadlocks | threads | JVM Thread Deadlocks |
| jvm.threads.daemon.count | threads | Number of live daemon threads |
| jvm.uptime | seconds | Number of seconds the JVM process has been running |
| jvm.file.descriptor.usage | percentage | Percentage of available file descriptors currently in use. |
| jvm.gc.G1-Concurrent-GC.runs | runs | Total number of times that the G1 Concurrent Garbage Collection has run |
| jvm.gc.G1-Concurrent-GC.time | milliseconds | Total amount of time that the G1 Concurrent Garbage Collection has been running |
| jvm.gc.G1-Young-Generation.runs | runs | Total number of times that the G1 Young Generation has run |
| jvm.gc.G1-Young-Generation.time | milliseconds | Total amount of time that the G1 Young Generation has been running |
| jvm.gc.G1-Old-Generation.runs | runs | Total number of times that the G1 Old Generation has run |
| jvm.gc.G1-Old-Generation.time | milliseconds | Total amount of time that the G1 Old Generation has been running |

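
The garbage collection metrics are running totals, so dividing the time by the number of runs gives a rough average pause per collection. The sketch below computes that along with heap usage derived from the byte values. It assumes the metrics for one Runtime have been collected into a dictionary keyed by metric name; the helper names and sample values are hypothetical.

```python
def average_gc_time_ms(metrics: dict, collector: str) -> float:
    """Average milliseconds per run for a collector such as 'G1-Young-Generation'."""
    runs = metrics.get(f"jvm.gc.{collector}.runs", 0)
    if runs == 0:
        return 0.0
    return metrics.get(f"jvm.gc.{collector}.time", 0) / runs


def heap_used_fraction(metrics: dict) -> float:
    """Heap in use as a fraction of the maximum heap, computed from bytes."""
    return metrics["jvm.memory.heap.used"] / metrics["jvm.memory.heap.max"]


# Hypothetical sample values.
sample = {
    "jvm.gc.G1-Young-Generation.runs": 40,
    "jvm.gc.G1-Young-Generation.time": 1000,
    "jvm.memory.heap.used": 512 * 1024 * 1024,
    "jvm.memory.heap.max": 2048 * 1024 * 1024,
}
print(average_gc_time_ms(sample, "G1-Young-Generation"))
print(heap_used_fraction(sample))
```
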

| Metric Name | Unit | Description |
| --- | --- | --- |
| cores.available | cores | The number of available cores for the Runtime |
| cores.load | percentage | Either the system load average or -1 if it is not available |


| Attribute | Description |
| --- | --- |
| id | The fixed value _cpu_ |
| name | The name of the operating system |
| architecture | The architecture of the operating system |
| version | The version of the operating system |


| Metric Name | Unit | Description |
| --- | --- | --- |
| storage.free | bytes | The amount of free storage for a given repository |
| storage.used | bytes | The amount of used storage for a given repository |


| Attribute | Description |
| --- | --- |
| id | The unique identifier of the storage repository |
| name | Same as id and provided for consistency |
| storage.type | One of _flowfile_, _content_, or _provenance_ |


| Property | Description |
| --- | --- |
| Activity Restored Message | The message that will be the content of FlowFiles that are sent to the 'activity.restored' relationship |
| Continually Send Messages | If true, will send an inactivity indicator continually every Threshold Duration amount of time until activity is restored; if false, will send an indicator only when the flow first becomes inactive |
| Copy Attributes | If true, will copy all FlowFile attributes from the FlowFile that resumed activity to the newly created indicator FlowFile |
| Inactivity Message | The message that will be the content of FlowFiles that are sent to the 'inactive' relationship |
| Monitoring Scope | Specifies how to determine activeness of the flow. 'node' means that activeness is examined at each individual node separately. It can be useful if the DFM expects each node to receive FlowFiles in a distributed manner. With 'cluster', the flow is considered active while at least one node is actively receiving FlowFiles. If NiFi is running in standalone mode, this should be set to 'node'. If it is set to 'cluster', NiFi logs a warning message and acts as 'node' scope. |
| Reporting Node | Specifies which node should send notification FlowFiles to the 'inactive' and 'activity.restored' relationships. With 'all', every node in the cluster sends notification FlowFiles. 'primary' means FlowFiles will be sent only from the primary node. If NiFi is running in standalone mode, this should be set to 'all'. Even if it is set to 'primary', NiFi acts as 'all'. |
| Reset State on Restart | When the processor gets started or restarted, if set to true, the initial state will always be active. Otherwise, the last reported flow state will be preserved. |
| Threshold Duration | Determines how much time must elapse before considering the flow to be inactive |
| Wait for Activity | When the processor gets started or restarted, if set to true, only send an inactive indicator if there had been activity beforehand. Otherwise send an inactive indicator even if there had not been activity beforehand. |


| Scopes | Description |
| --- | --- |
| LOCAL | MonitorActivity stores the last timestamp at each node as state, so that it can examine activity cluster-wide. If 'Copy Attributes' is set to true, then FlowFile attributes are also persisted. In local scope, it stores the last known activity timestamp if the flow is inactive. |
| CLUSTER | MonitorActivity stores the last timestamp at each node as state, so that it can examine activity cluster-wide. If 'Copy Attributes' is set to true, then FlowFile attributes are also persisted. In local scope, it stores the last known activity timestamp if the flow is inactive. |


| Name | Description |
| --- | --- |
| activity.restored | This relationship is used to transfer an Activity Restored indicator when FlowFiles are routed to 'success' following a period of inactivity |
| inactive | This relationship is used to transfer an Inactivity indicator when no FlowFiles are routed to 'success' for Threshold Duration amount of time |
| success | All incoming FlowFiles are routed to success |


| Name | Description |
| --- | --- |
| inactivityStartMillis | The time at which Inactivity began, in the form of milliseconds since Epoch |
| inactivityDurationMillis | The number of milliseconds that the inactivity has spanned |

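
The interaction between Threshold Duration, Continually Send Messages, and the three relationships can be easier to follow as a small state machine. The following is a simplified single-node sketch, not the processor's implementation. The class and method names are hypothetical, time is passed in explicitly as milliseconds, and it assumes that inactivity is measured from the last observed FlowFile (or from the start time when none has been seen).

```python
class ActivityMonitor:
    """Simplified model of inactivity detection on a single node."""

    def __init__(self, threshold_ms: int, start_ms: int, continually_send: bool = False):
        self.threshold_ms = threshold_ms
        self.continually_send = continually_send
        self.last_activity_ms = start_ms  # assumption: start counts as activity
        self.last_indicator_ms = None
        self.inactive = False

    def _attributes(self, now_ms: int) -> dict:
        return {
            "inactivityStartMillis": self.last_activity_ms,
            "inactivityDurationMillis": now_ms - self.last_activity_ms,
        }

    def on_flowfile(self, now_ms: int) -> list:
        """An incoming FlowFile always goes to 'success'; it may also restore activity."""
        events = []
        if self.inactive:
            events.append(("activity.restored", self._attributes(now_ms)))
            self.inactive = False
        self.last_activity_ms = now_ms
        events.append(("success", {}))
        return events

    def on_tick(self, now_ms: int) -> list:
        """Periodic check that emits an 'inactive' indicator when the threshold passes."""
        if not self.inactive:
            if now_ms - self.last_activity_ms >= self.threshold_ms:
                self.inactive = True
                self.last_indicator_ms = now_ms
                return [("inactive", self._attributes(now_ms))]
            return []
        if self.continually_send and now_ms - self.last_indicator_ms >= self.threshold_ms:
            self.last_indicator_ms = now_ms
            return [("inactive", self._attributes(now_ms))]
        return []


monitor = ActivityMonitor(threshold_ms=1000, start_ms=0)
print(monitor.on_tick(1000))
print(monitor.on_flowfile(3000))
```

With `continually_send=False`, only the first check after the threshold produces an 'inactive' indicator; later checks stay silent until a FlowFile arrives and an 'activity.restored' indicator is sent.
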

| Property | Description |
| --- | --- |
| ADLS Credentials | Controller Service used to obtain Azure Credentials. |
| Conflict Resolution Strategy | Indicates what should happen when a file with the same name already exists in the output directory |
| Destination Directory | Name of the Azure Storage Directory where the files will be moved. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. Non-existing directories will be created. If the original directory structure should be kept, the full directory path needs to be provided after the destination directory, for example: `destdir/${azure.directory}` |
| Destination Filesystem | Name of the Azure Storage File System where the files will be moved. |
| File Name | The filename |
| Source Directory | Name of the Azure Storage Directory from where the move should happen. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. |
| Source Filesystem | Name of the Azure Storage File System from where the move should happen. |
| proxy-configuration-service | Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |


| Name | Description |
| --- | --- |
| failure | Files that could not be written to Azure storage for some reason are transferred to this relationship |
| success | Files that have been successfully written to Azure storage are transferred to this relationship |


| Name | Description |
| --- | --- |
| azure.source.filesystem | The name of the source Azure File System |
| azure.source.directory | The name of the source Azure Directory |
| azure.filesystem | The name of the Azure File System |
| azure.directory | The name of the Azure Directory |
| azure.filename | The name of the Azure File |
| azure.primaryUri | Primary location for file content |
| azure.length | The length of the Azure File |


| Property | Description |
| --- | --- |
| attribute-cache-regex | Any attributes whose names match this regex will be stored in the distributed cache to be copied to any FlowFiles released from a corresponding Wait processor. Note that the uuid attribute will not be cached regardless of this value. If blank, no attributes will be cached. |
| distributed-cache-service | The Controller Service that is used to cache release signals in order to release files queued at a corresponding Wait processor |
| release-signal-id | A value, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the release signal cache key |
| signal-buffer-count | Specifies the maximum number of incoming FlowFiles that can be buffered before signals are sent to the cache service. A larger buffer can provide better performance, because it reduces the number of interactions with the cache service by grouping signals by signal identifier when multiple incoming FlowFiles share the same signal identifier. |
| signal-counter-delta | A value, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the signal counter delta. Specifies how much the counter should increase. For example, if multiple signal events are processed upstream in a batch-oriented way, the number of events processed can be reported with this property at once. Zero (0) has a special meaning: it clears the target count back to 0, which is especially useful when used with the Wait Releasable FlowFile Count = Zero (0) mode, to provide an 'open-close-gate' type of flow control. One (1) can open a corresponding Wait processor, and Zero (0) can negate it as if closing a gate. |
| signal-counter-name | A value, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the signal counter name. Signal counter name is useful when a corresponding Wait processor needs to know the number of occurrences of different types of events, such as success or failure, or destination data source names. |


| Name | Description |
| --- | --- |
| failure | When the cache cannot be reached, or if the Release Signal Identifier evaluates to null or empty, FlowFiles will be routed to this relationship |
| success | All FlowFiles where the release signal has been successfully entered in the cache will be routed to this relationship |

Name Description
notified All FlowFiles will have an attribute 'notified'. The value of this attribute is true if the FlowFile is notified, otherwise false.
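The counter semantics described for `signal-counter-delta` can be illustrated with a small in-memory model. This is only a sketch: `SignalCache` and its `notify` method are hypothetical names standing in for the distributed cache service, and the real processor's storage format is not shown here.

```python
from collections import defaultdict


class SignalCache:
    """Hypothetical in-memory stand-in for the distributed cache service."""

    def __init__(self):
        # signal identifier -> counter name -> current count
        self._counts = defaultdict(lambda: defaultdict(int))

    def notify(self, signal_id, counter_name, delta):
        """Apply a signal counter delta and return the resulting count."""
        counters = self._counts[signal_id]
        if delta == 0:
            # Zero is special: it resets the counter ("closes the gate").
            counters[counter_name] = 0
        else:
            counters[counter_name] += delta
        return counters[counter_name]


cache = SignalCache()
cache.notify("batch-42", "success", 1)           # one event
cache.notify("batch-42", "success", 3)           # three events reported at once
closed = cache.notify("batch-42", "success", 0)  # reset the counter to 0
```

A delta of 1 followed by a delta of 0 models the open-close-gate pattern: the first call lets a waiting flow proceed, and the second resets the count so the gate is closed again.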
Field Description
`apiType` **(Required)** The query endpoint name in the Shopify Admin GraphQL API. This must match the root query field exactly (for example, `orders` for the [orders query](https://shopify.dev/docs/api/admin-graphql/2026-04/queries/orders), `products` for the [products query](https://shopify.dev/docs/api/admin-graphql/2026-04/queries/products)). Used as the key for lookup and override matching.
`tableName` **(Required)** The Snowflake destination table name.
`gidTypeName` The Shopify GID resource type (for example, `Order`, `Product`). Used for delete cascade and child record routing.
`additionalGidTypeNames` Array of additional GID type names that also map to this object. Use when Shopify returns the same resource under more than one GID type name, so that records are routed to the correct table regardless of which GID type appears in the response.
`graphqlFields` List of GraphQL selection fields. Each entry is a field name, a nested selection (for example, `"totalPriceSet { shopMoney { amount currencyCode } }"`), or an aliased field with arguments (for example, `"tier: metafield(key: \"custom.tier\") { value }"`). Aliases are useful for querying metafields by key.
`requiredQueryArgs` Map of fixed GraphQL argument key-value pairs appended to every query for this object. Use for endpoints that require non-standard arguments that aren't covered by the built-in query parameters (for example, `{"type": "SALES_CHANNEL"}`).
`supportsIncremental` Whether the object supports incremental sync. Default: `true`.
`incrementalField` The field used for watermark-based incremental queries (for example, `updatedAt`, `createdAt`).
`refreshStrategy` Controls the sync mode. `INCREMENTAL` (default) uses watermark-based incremental queries. `FULL_PERIODIC` performs a complete re-sync on each run instead. `PARENT_PIGGYBACKED` means this object is extracted from another object's query response and is not queried independently.
`supportsDeletes` Whether the connector should track deletion events for this object. Default: `false`.
`promotedColumns` Array of column definitions that extract values from the JSON payload into dedicated Snowflake columns. See [Promoted columns](#label-promoted-columns).
`childFields` Array of child connection definitions that are extracted into separate tables. See [Child fields](#label-child-fields).
`ignoredFields` List of field names to exclude from queries.
`supportsBulk` Whether the object supports bulk queries through the Shopify Bulk Operations API. Default: `true`.
`sortKeys` List of sort key values (from the object's corresponding `SortKeys` enum) used to order results during bulk and incremental queries. For example, `["UPDATED_AT", "ID"]`.
`sortKeyStyle` How sort key values are formatted in queries. `ENUM` (default) uses bare enum values (for example, `UPDATED_AT`). `STRING` uses quoted lowercase strings (for example, `"updated_at"`). Use `STRING` for object types that accept string sort keys, such as metaobjects.
Field Description
`name` The Snowflake column name (uppercase recommended).
`path` A JSONPath expression pointing to the value in the raw record (for example, `$.email`, `$.totalPriceSet.shopMoney.amount`).
`type` The column type: `string`, `integer`, `boolean`, `float`, `money`, `timestamp`, `date`, `id`, `gid`, or `json`.
Field Description
`fieldName` The GraphQL connection field name in the parent object (for example, `lineItems`).
`tableName` The Snowflake table name for the child records.
`gidTypeName` The Shopify GID type for the child (for example, `LineItem`).
`connectionType` `edges` (paginated connection) or `array` (inline array). Default: `edges`.
`pageSize` Number of child records to fetch per page. Default: `250`.
`graphqlFields` Explicit GraphQL selection set for the child table. If omitted, the connector parses the child's fields from the matching connection entry in the parent's `graphqlFields` list.
`promotedColumns` Array of promoted column definitions for the child table, using the same schema as top-level `promotedColumns`.
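To show how the fields in the three tables above fit together, here is a minimal sketch of one object definition written as a Python dictionary. The field names come from the tables; the concrete values (table names, selected GraphQL fields) are illustrative assumptions, and how the definition is supplied to the connector is not covered here. The `resolve_path` helper is hypothetical and handles only simple dotted paths such as `$.a.b`, not the full JSONPath syntax.

```python
import json

# Illustrative object definition; values are assumptions, field names are from the tables.
order_definition = {
    "apiType": "orders",
    "tableName": "ORDERS",
    "gidTypeName": "Order",
    "graphqlFields": [
        "id",
        "email",
        "updatedAt",
        "totalPriceSet { shopMoney { amount currencyCode } }",
    ],
    "incrementalField": "updatedAt",
    "sortKeys": ["UPDATED_AT", "ID"],
    "promotedColumns": [
        {"name": "EMAIL", "path": "$.email", "type": "string"},
        {"name": "TOTAL_AMOUNT", "path": "$.totalPriceSet.shopMoney.amount", "type": "money"},
    ],
    "childFields": [
        {
            "fieldName": "lineItems",
            "tableName": "ORDER_LINE_ITEMS",
            "gidTypeName": "LineItem",
            "connectionType": "edges",
            "pageSize": 250,
            "graphqlFields": ["id", "title", "quantity"],
        }
    ],
}


def resolve_path(record, path):
    """Hypothetical helper: follow a simple dotted path like '$.a.b' through nested dicts."""
    keys = path[2:] if path.startswith("$.") else path
    value = record
    for key in keys.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


raw_record = {
    "email": "a@example.com",
    "totalPriceSet": {"shopMoney": {"amount": "19.99", "currencyCode": "USD"}},
}
promoted = {
    column["name"]: resolve_path(raw_record, column["path"])
    for column in order_definition["promotedColumns"]
}
definition_json = json.dumps(order_definition, indent=2)
```

Each promoted column pairs a destination column name with a path into the raw record, so `promoted` ends up holding one extracted value per column.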
Property Description
Model Name The name of the OpenAI Model to use
OpenAI API Key The API Key for interacting with OpenAI
Prompt Text that can be used to guide the model's style or continue a previous audio segment. The text must be in English.
Response Format Specifies which format is desired for the output
Temperature The sampling temperature to use. The value must be a floating-point number between 0.0 and 1.0. A higher value, such as 0.8, will result in more of an interpreted translation, whereas a value of 0.0 will result in a more literal translation.
Name Description
failure FlowFiles that could not be transcribed are routed to this relationship.
success FlowFiles that have been successfully transcribed will be transferred to this relationship.
Cost category Description
Openflow (shown as **Openflow Compute BYOC** on your Snowflake bill) Cost based on the number of virtual CPU cores (vCPU) used by connector runtimes within your "bring your own cloud (BYOC)" environment. You are charged for active runtimes only. The compute used for Openflow management processes is excluded from this specific charge. Credits are billed per second with a 60-second minimum. For an example of vCPU usage and the impact of scaling, see [](#label-openflow-byoc-scaling-overview). For information on the rate per vCPU per hour, refer to Table 1(g) in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). Additionally, the [METERING_DAILY_HISTORY](/sql-reference/account-usage/metering_daily_history) and [METERING_HISTORY](/sql-reference/account-usage/metering_history) views in the [](/sql-reference/account-usage) schema can provide additional details on Openflow compute costs using queries for *SERVICE_TYPE=OPENFLOW_COMPUTE_BYOC*. See [](/user-guide/cost-exploring-compute) for more information on exploring compute costs in Snowflake.
Infrastructure (only for BYOC configuration) For BYOC deployments only, you pay your cloud provider (for example, AWS) directly for the underlying infrastructure provisioned in your environment to run Openflow. This primarily includes compute (for the runtimes you provision to run the connectors and for managing the runtimes), networking, and storage costs, which appear on your CSP bill. The EC2 compute requirements are illustrated in the following image: ![EC2 compute requirements](/static/images/connectivity/ec2-compute-reqs.png)
Ingestion Cost for loading data into Snowflake using services such as Snowpipe or Snowpipe Streaming, based on data volume. Appears on your Snowflake bill under respective ingestion services line items. Certain connectors may require a standard Snowflake warehouse, incurring additional warehouse costs. For example, database CDC connectors require a Snowflake warehouse for both initial snapshot and incremental Change Data Capture (CDC). You can schedule [](/sql-reference/sql/merge) operations to manage the compute cost.
Telemetry Data Ingest Standard Snowflake charges for sending logs and metrics from Openflow deployments and runtimes to your event table within Snowflake. The rate for credits per GB of telemetry data can be found in Table 5 in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).
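The per-second billing with a 60-second minimum described for Openflow compute can be sketched as a small calculation. This is an illustration only: the function name is hypothetical, it assumes the minimum applies to each period in which a runtime is active, and it returns vCPU-seconds rather than credits because the per-vCPU rate is published separately in the Service Consumption Table.

```python
def billable_vcpu_seconds(vcpus, active_seconds, minimum_seconds=60):
    """Return billable vCPU-seconds for one active period of a runtime.

    Assumption: an inactive runtime (0 seconds) is not billed, and any
    active period shorter than the minimum is rounded up to the minimum.
    """
    if active_seconds <= 0:
        return 0
    return vcpus * max(active_seconds, minimum_seconds)


short_run = billable_vcpu_seconds(vcpus=4, active_seconds=30)     # billed as 60 seconds
one_hour = billable_vcpu_seconds(vcpus=4, active_seconds=3600)    # billed per second
one_hour_in_vcpu_hours = one_hour / 3600
```

Multiplying the resulting vCPU-hours by the applicable rate from the consumption table gives an estimate of the credits consumed.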
MySQL type Snowflake type Notes
DECIMAL / NUMERIC NUMBER The maximum number of digits in DECIMAL format for MySQL is 65. For Snowflake, the maximum is 38. Precision is lost when exceeded.
INT / INTEGER INT
TINYINT / BOOL INT
SMALLINT INT
MEDIUMINT INT
BIGINT INT
YEAR INT
FLOAT FLOAT
DOUBLE FLOAT
VARCHAR TEXT
CHAR TEXT Trailing spaces aren't preserved.
TINYTEXT TEXT
TEXT TEXT
MEDIUMTEXT TEXT Supported up to the maximum entry size in Snowflake (16 MB).
LONGTEXT TEXT Supported up to the maximum entry size in Snowflake (16 MB).
ENUM TEXT Stored as a string value. For example, for `ENUM('one', 'two')` the possible values are `'one'` and `'two'`.
SET TEXT Stored as a comma-separated string in column declaration order. For example, for `SET('one', 'two')` the possible values are `''`, `'one'`, `'two'`, and `'one,two'`.
BIT TEXT Represented as a hexadecimal string. For example: `'83060c183060c183'`.
DATE DATE
DATETIME TIMESTAMP_NTZ
TIMESTAMP TIMESTAMP_TZ Values are stored in UTC.
TIME TIME
BINARY BINARY
VARBINARY BINARY
TINYBLOB BINARY
BLOB BINARY
MEDIUMBLOB BINARY Supported up to the maximum entry size in Snowflake (16 MB).
LONGBLOB BINARY Supported up to the maximum entry size in Snowflake (16 MB).
JSON VARIANT Supported up to the maximum entry size in Snowflake (16 MB).
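Because `SET` and `BIT` columns arrive in Snowflake as text, downstream code usually needs to decode them. The following sketch shows one way to do that in Python after the values have been fetched; the helper names are hypothetical and not part of the connector.

```python
def parse_mysql_set(value):
    """Split a replicated SET value (comma-separated string) into its members."""
    # An empty string represents the empty set.
    return value.split(",") if value else []


def parse_mysql_bit(hex_string):
    """Convert a replicated BIT value (hexadecimal string) into an integer."""
    return int(hex_string, 16)


members = parse_mysql_set("one,two")
no_members = parse_mysql_set("")
bit_value = parse_mysql_bit("83060c183060c183")
```

Formatting `bit_value` back with `format(bit_value, "x")` reproduces the original hexadecimal string, which is a quick way to check that nothing was lost in the conversion.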
Parameter Description
Starting Binlog Position - `Latest` (default): CDC stream reading starts at the latest available position and continues from there. - `Earliest`: Switches the incremental load to start, or restart reading from the earliest available binary log position.
Re-read Tables in State - `New` (default): While re-reading the binary log, only those events will be processed from new tables added to replication after the re-reading started. Other events are discarded until the connector reaches the position just before re-reading started. - `Any active`: Re-read and re-process events from any table currently in replication.
Database version Procedure
AWS RDS (Standard) Run the following:
```sql
begin
    rdsadmin.rdsadmin_util.set_configuration(
        name  => 'archivelog retention hours',
        value => '24');
end;
/
commit;
```
For more information see [Retaining archived redo logs](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.CommonDBATasks.RetainRedoLogs.html).
AWS RDS Custom 1. Create a text file named `/opt/aws/rdscustomagent/config/redo_logs_custom_configuration.json`. 2. Add a JSON object to this file in the following format: `{"archivedLogRetentionHours" : "24"}`. For more information see [Restoring an RDS Custom for Oracle instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/custom-backup.pitr.html).
Database version Command
Oracle Database 21c and earlier Run the following:
```sql
BEGIN
    DBMS_XSTREAM_AUTH.GRANT_ADMIN_PRIVILEGE(
        grantee                 => 'c##xstreamadmin',
        privilege_type          => 'CAPTURE',
        grant_select_privileges => TRUE,
        container               => 'ALL');
END;
/
```
Oracle Database 23c and later Oracle Database 23c introduced a dedicated `XSTREAM_CAPTURE` system privilege. Run the following:
```sql
GRANT XSTREAM_CAPTURE TO c##xstreamadmin CONTAINER=ALL;
```
Oracle type Snowflake type Notes
NUMBER NUMBER If precision is undefined, mapped to NUMBER(38, 19). If precision or scale exceeds Snowflake limitations (precision > 38 or scale > 37), the value is stored as TEXT.
FLOAT FLOAT
BINARY_FLOAT FLOAT
BINARY_DOUBLE FLOAT
CHAR TEXT
VARCHAR2 TEXT
NCHAR TEXT
NVARCHAR2 TEXT
CLOB TEXT Supported up to the maximum entry size in Snowflake (16 MB).
NCLOB TEXT Supported up to the maximum entry size in Snowflake (16 MB).
LONG TEXT
DATE TIMESTAMP_NTZ
TIMESTAMP TIMESTAMP_NTZ
TIMESTAMP WITH TIME ZONE TIMESTAMP_TZ
TIMESTAMP WITH LOCAL TIME ZONE TIMESTAMP_LTZ
INTERVAL TEXT
INTERVAL YEAR TO MONTH TEXT
INTERVAL DAY TO SECOND TEXT
RAW BINARY
LONG RAW BINARY
BLOB BINARY Supported up to the maximum entry size in Snowflake (16 MB).
BOOLEAN BOOLEAN
JSON VARIANT Supported up to the maximum entry size in Snowflake (16 MB).
XMLTYPE TEXT
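The `NUMBER` rule in the table above can be written out as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, the table does not say how in-range precision and scale are carried over (the sketch assumes they are preserved unchanged), and negative Oracle scales are not covered by the table, so the sketch rejects them.

```python
def snowflake_type_for_oracle_number(precision=None, scale=None):
    """Return the Snowflake type for an Oracle NUMBER column, per the mapping table."""
    if precision is None:
        # Undefined precision maps to a fixed default.
        return "NUMBER(38, 19)"
    scale = 0 if scale is None else scale
    if scale < 0:
        raise ValueError("negative scale is not covered by the mapping table")
    if precision > 38 or scale > 37:
        # Outside Snowflake's limits: the value is stored as text.
        return "TEXT"
    # Assumption: in-range precision and scale are preserved as declared.
    return f"NUMBER({precision}, {scale})"


unbounded = snowflake_type_for_oracle_number()
too_wide = snowflake_type_for_oracle_number(40, 2)
in_range = snowflake_type_for_oracle_number(10, 2)
```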
Stage Action Result
Trial period (Day 1 to 60) Select **Cancel Trial** in the **Openflow for Oracle** dashboard before Day 60. Oracle XStream services stop. No charges are incurred.
36-month commitment (Day 61+) No action required. If the trial isn't canceled, the non-cancelable 36-month term begins automatically on Day 61. The license can't be canceled during this period. If your Snowflake agreement is terminated, the full remaining balance is due immediately.
Post-term S&M renewal (after month 36) The license fee drops to $0. The annual Support & Maintenance (S&M) fee continues. You may opt out of S&M renewal in the **Openflow for Oracle** dashboard. If you opt out and S&M coverage expires, the connector is permanently locked. To resume, you must purchase a new embedded license, which resets the 36-month commitment.
Parameter Description
Starting XStream Position - `Latest` (default): CDC stream reading starts at the latest available position and continues from there. - `Earliest`: Switches the incremental load to start, or restart reading from the earliest available XStream position.
Re-read Tables in State - `New` (default): While re-reading the redo logs, only those LCRs (Logical Change Records) will be processed from new tables added to replication after the re-reading started. Other LCRs are discarded until the connector reaches the position just before re-reading started. - `Any active`: Re-read and re-process events from any table currently in replication.
PostgreSQL type Snowflake type Notes
SMALLINT / INT2 INT
INTEGER / INT / INT4 INT
BIGINT / INT8 INT
SMALLSERIAL / SERIAL2 INT
SERIAL / SERIAL4 INT
BIGSERIAL / SERIAL8 INT
NUMERIC / DECIMAL NUMBER Scale and precision are preserved within Snowflake limitations. Negative scale is converted to scale 0 with adjusted precision.
REAL / FLOAT4 FLOAT
DOUBLE PRECISION / FLOAT8 FLOAT
MONEY FLOAT
BOOLEAN / BOOL BOOLEAN
CHARACTER / CHAR / BPCHAR TEXT
CHARACTER VARYING / VARCHAR TEXT
TEXT TEXT
BYTEA BINARY Supported up to the maximum entry size in Snowflake (16 MB).
DATE DATE
TIME / TIME WITHOUT TIME ZONE TIME
TIME WITH TIME ZONE / TIMETZ TIMESTAMP_TZ
TIMESTAMP / TIMESTAMP WITHOUT TIME ZONE TIMESTAMP_NTZ
TIMESTAMP WITH TIME ZONE / TIMESTAMPTZ TIMESTAMP_LTZ
INTERVAL TEXT
JSON VARIANT Supported up to the maximum entry size in Snowflake (16 MB).
JSONB VARIANT Supported up to the maximum entry size in Snowflake (16 MB).
UUID TEXT
XML TEXT
BIT TEXT
BIT VARYING / VARBIT TEXT
POINT TEXT
LINE TEXT
LSEG TEXT
BOX TEXT
PATH TEXT
POLYGON TEXT
CIRCLE TEXT
CIDR TEXT
INET TEXT
MACADDR TEXT
MACADDR8 TEXT
TSVECTOR TEXT
TSQUERY TEXT
PG_LSN TEXT
Parameter Description
Column Removal Strategy Defines the strategy to adopt when a column should be removed in the destination table based on the latest received schema. Three possible values: `Drop Column`, `Rename Column`, `Ignore Column`. - `Drop Column`: Drop the column from the Snowflake table. - `Rename Column`: Rename the column in the Snowflake table. - `Ignore Column`: Ignore the column, leaving it as is in the Snowflake table.
Connected App Key The private key used for JWT Bearer Flow authentication with Salesforce. Copy-paste the content of the `private.key` file generated during the [Salesforce setup](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce). This private key must correspond to the public certificate (`public.crt`) uploaded to the external client app in Salesforce. You can also use the next parameter to upload the private key file instead.
Connected App Key File Upload the `private.key` file by selecting the **Reference asset** checkbox, then upload the file as an asset and select the asset as the value for the parameter. This is an alternative to pasting the key content in the **Connected App Key** parameter.
Connected App Key Password Password set on the private key file during the [Salesforce setup](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce) steps.
Destination Database Name of the database in Snowflake where the Salesforce data will be replicated. The database must exist before starting the connector.
Destination Schema Name of the schema, in the database above, into which the connector will create tables for the Salesforce data to be added. The schema must exist before starting the connector.
Enable Journal Tables If set to `true`, a `JOURNAL_` table is created for each synced object that has a `SystemModstamp` or `LastModifiedDate` field. All changes are appended to the journal table, providing a full history of modifications. This is in addition to the main table that contains the merged data for the object. If a full reload occurs for a given object type, its journal table is also recreated. Default: `false`.
Enable Views Creation If set to `true`, a view named `_FORMULA_VW` is created for each synced object that contains formula fields. The view translates supported Salesforce formula expressions into Snowflake SQL, allowing you to query formula results directly without replicating formula field values from Salesforce. See [](#salesforce-formula-fields) for details. Default: `false`.
Filter Comma-separated list of objects to replicate from Salesforce, or regular expression to apply against all existing objects. The filter is case-insensitive, meaning that a filter set to `account` would match the object type `Account`. Example: `Account, Opportunity, Contact`. If left empty, all objects will be replicated. This is not recommended as there are usually thousands of objects in a Salesforce instance.
Incremental Offload Whether the processor should perform incremental offload. If `true`, the processor will only fetch the records that have been modified since the last query job submission by using a `WHERE` clause on the appropriate timestamp field. If `false`, all records will be fetched at every execution of the connector.
Initial Load Chunking If set to a value other than `NONE`, the initial data load will be split into multiple jobs based on this interval. On the first run for an object, the connector will query Salesforce to find the oldest record and use that as the starting point. Each subsequent job will query the next time chunk until caught up to the current time. Should be set with one of: `NONE`, `MONTHLY`, `QUARTERLY`, `YEARLY`. This is useful for large datasets where loading all historical data in a single query may time out, exceed API limits, or exceed the storage size of the content repository of the runtime. After catching up, the processor continues with normal incremental offload behavior.
OAuth2 Audience Audience to set in the JWT token. Set to `https://login.salesforce.com` for production environments or `https://test.salesforce.com` for sandboxes and test environments.
OAuth2 Client ID Should be set to the **Consumer Key** value retrieved during the Salesforce Setup steps.
OAuth2 Subject Should be set to the username of an admin-approved user for the application to interact with Salesforce APIs on behalf of this user.
OAuth2 Token Endpoint URL Endpoint to negotiate tokens via the JWT Bearer Flow. Example: `https://myCompany.my.salesforce.com/services/oauth2/token`.
Object Fields Filter JSON A JSON specifying which fields and field patterns should be included or excluded, per Salesforce object. Takes the form of an array with one item per object. Example 1: This will include all fields that end with 'name' in the 'Account' Salesforce object: `[ {"objectType":"Account", "includedPattern":".*name"} ]` Example 2: This will include the fields Id, Name, and Revenue in the 'Account' Salesforce object: `[ {"objectType":"Account", "included": ["Id", "Name", "Revenue"]} ]` `excluded` and `excludedPattern` are also available for configuring the filters.
Object Identifier Resolution Determines if schema / table / column names are treated as case-sensitive or case-insensitive. One of: `CASE_INSENSITIVE` / `CASE_SENSITIVE`. Changing this parameter value will require clearing the state and doing a full reload of all objects.
Removed Column Name Suffix Suffix added to the column name when the parameter **Column Removal Strategy** is set to `Rename Column`. Default: `__deleted`.
Run Schedule Frequency at which the connector will check for updates in Salesforce for configured objects via the **Filter** parameter. Default: `15 minutes`.
Salesforce Instance Hostname of the Salesforce instance including the domain name. Do not include the protocol prefix (`https://`). For example, use `myCompany.my.salesforce.com`.
Snowflake Account Identifier Snowflake account name formatted as `[organization-name]-[account-name]` where data will be persisted. Example: `PM-CONNECTORS`.
Snowflake Username The name of the service user that the connector uses to connect to Snowflake. The service user is required only when using the `KEY_PAIR` authentication strategy (Openflow BYOC only).
Snowflake Private Key The RSA Private Key that the connector uses for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line starts with `-----BEGIN PRIVATE`. This is required only when using the `KEY_PAIR` authentication strategy (Openflow BYOC only). You may also use the next parameter to upload the private key to the Openflow runtime instead.
Snowflake Private Key File The file containing the RSA Private Key that the connector uses for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line starts with `-----BEGIN PRIVATE`. Required only when using the `KEY_PAIR` authentication strategy (Openflow BYOC only). Select the **Reference asset** checkbox to upload the private key file and store it securely in the Openflow runtime.
Snowflake Private Key Password The password associated with the Snowflake Private Key File (if encrypted). This is required only when using the `KEY_PAIR` authentication strategy (Openflow BYOC only).
Snowflake Role Name of the Snowflake role used during query execution. When using `SNOWFLAKE_MANAGED`, this is the Snowflake Role for Openflow Runtimes. When using `KEY_PAIR` (Openflow BYOC only), this is the role assigned to the specified Snowflake username.
Snowflake Authentication Strategy Authentication strategy for the connector to connect to Snowflake. Using `SNOWFLAKE_MANAGED` (default) uses the Snowflake managed token associated with the specified Snowflake Runtime Role. If using Openflow BYOC, you can also use `KEY_PAIR` to specify a specific user and role via a custom Key Pair.
Snowflake Warehouse The Snowflake warehouse used to run queries.
Special Objects Filter Comma-separated list of objects to offload from Salesforce (using direct API access), or regular expression to apply against all existing objects. The filter is case-insensitive, meaning that a filter set to `account` would match the object type `Account`. This filter should only be used for objects that are **not** supported by the Salesforce Bulk API such as knowledge data, for example. This parameter should not overlap with the parameter **Filter**. Example: `Knowledge.*`
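Several of the Salesforce parameters above are derived from the same instance hostname. The following sketch normalizes a hostname and builds the related values. The helper name is hypothetical, and it assumes the token endpoint lives on the same host as the instance, which matches the examples in the table but is not stated as a rule.

```python
def salesforce_parameters(instance, sandbox=False):
    """Build the hostname-derived parameter values from a Salesforce URL or hostname."""
    host = instance.strip()
    # The Salesforce Instance parameter must not include the protocol prefix.
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    audience = "https://test.salesforce.com" if sandbox else "https://login.salesforce.com"
    return {
        "Salesforce Instance": host,
        # Assumption: the token endpoint is served from the instance host.
        "OAuth2 Token Endpoint URL": f"https://{host}/services/oauth2/token",
        "OAuth2 Audience": audience,
    }


production = salesforce_parameters("https://myCompany.my.salesforce.com/")
sandbox = salesforce_parameters("myCompany.my.salesforce.com", sandbox=True)
```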
Category Salesforce function Snowflake equivalent
Logical `IF` `CASE WHEN ... THEN ... ELSE ... END`
Logical `CASE` `CASE ... WHEN ... THEN ... ELSE ... END`
Logical `AND` / `OR` / `NOT` `AND` / `OR` / `NOT`
Null handling `ISBLANK` `LENGTH(COALESCE(expr, '')) = 0`
Null handling `ISNULL` `expr IS NULL`
Null handling `NULLVALUE` `COALESCE`
Null handling `BLANKVALUE` `CASE WHEN ... IS NULL OR LENGTH(...) = 0 THEN ... END`
Text `LEFT` `LEFT`
Text `RIGHT` `RIGHT`
Text `MID` `SUBSTR`
Text `LEN` `LENGTH`
Text `SUBSTITUTE` `REPLACE`
Text `TRIM` `TRIM`
Text `UPPER` `UPPER`
Text `LOWER` `LOWER`
Text `CONTAINS` `CONTAINS`
Text `BEGINS` `STARTSWITH`
Text `FIND` `CHARINDEX`
Text `LPAD` `LPAD`
Text `RPAD` `RPAD`
Text `BR` Newline character literal
Conversion `TEXT` `CAST(... AS STRING)`
Conversion `VALUE` `TRY_CAST(... AS NUMBER)`
Math `ABS` `ABS`
Math `ROUND` `ROUND`
Math `CEILING` `CEIL`
Math `FLOOR` `FLOOR`
Math `MOD` `MOD`
Math `SQRT` `SQRT`
Math `MAX` `GREATEST`
Math `MIN` `LEAST`
Math `LOG` `LOG(10, ...)`
Math `EXP` `EXP`
Math `LN` `LN`
Date and time `NOW` `CURRENT_TIMESTAMP()`
Date and time `TODAY` `CURRENT_DATE()`
Date and time `YEAR` `YEAR`
Date and time `MONTH` `MONTH`
Date and time `DAY` `DAY`
Date and time `DATEVALUE` `TO_DATE`
Date and time `DATETIMEVALUE` `TO_TIMESTAMP`
Date and time `ADDMONTHS` `DATEADD(MONTH, ...)`
Picklist `ISPICKVAL` `COALESCE(field, '') = COALESCE(value, '')`
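To make a few rows of the translation table concrete, the sketch below renders Snowflake SQL for a handful of Salesforce functions using string templates. It is not how the connector implements translation: `TEMPLATES` and `translate_call` are hypothetical names, only a small subset of the table is included, and the arguments are assumed to be already-translated SQL expressions.

```python
# Templates copied from the translation table; {0}, {1} are positional arguments.
TEMPLATES = {
    "ISBLANK": "LENGTH(COALESCE({0}, '')) = 0",
    "ISNULL": "{0} IS NULL",
    "BEGINS": "STARTSWITH({0}, {1})",
    "CEILING": "CEIL({0})",
    "LOG": "LOG(10, {0})",
    "ISPICKVAL": "COALESCE({0}, '') = COALESCE({1}, '')",
}


def translate_call(function_name, *args):
    """Render the Snowflake equivalent of one Salesforce function call."""
    template = TEMPLATES.get(function_name.upper())
    if template is None:
        raise KeyError(f"no template for {function_name}")
    return template.format(*args)


blank_check = translate_call("ISBLANK", "NAME")
picklist_check = translate_call("ISPICKVAL", "STAGENAME", "'Closed Won'")
log_call = translate_call("log", "AMOUNT")
```

Functions without a template raise `KeyError` here; the connector instead reports unsupported functions through the failure reasons listed in the next table.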
Failure reason Description
`FUNCTION_NOT_SUPPORTED` The formula uses a function that has no Snowflake equivalent or that is specific to the Salesforce UI. This includes: `IMAGE`, `HYPERLINK`, `URLFOR`, `HTMLENCODE`, `JSENCODE`, `LINKTO`, `GEOLOCATION`, `DISTANCE`, `VLOOKUP`, `REGEX`, `PREDICT`, `GETSESSIONID`, `GETRECORDIDS`, `REQUIRESCRIPT`, `ISCHANGED`, `ISNEW`, `ISCLONE`, `PRIORVALUE`.
`GLOBAL_VARIABLE_NOT_SUPPORTED` The formula references a Salesforce global variable such as `$User.Name`, `$Organization.Name`, or `$Profile.Name`. These variables have no equivalent in Snowflake.
`FORMULA_CHAIN_NOT_SUPPORTED` The formula references another formula field. Chained formula references (a formula field that depends on another formula field) are not supported.
`ROLLUP_NOT_SUPPORTED` The field is a rollup summary field rather than a formula field. Rollup summaries aggregate data from child records and cannot be expressed as a simple SQL view.
`LOOKUP_NOT_SYNCED` The formula references a relationship that cannot be resolved from the Salesforce object metadata. This typically occurs when the relationship name in the formula does not match any known relationship on the object.
`ID_FORMAT_MISMATCH` The formula contains a hardcoded 15-character Salesforce ID. Salesforce uses 15-character IDs internally, but the Bulk API returns 18-character IDs. Formulas with hardcoded 15-character IDs cannot be reliably translated.
`COMPOUND_FIELD_REFERENCE` The formula references a compound field (such as `MailingAddress`) that is not stored as a single column in Snowflake.
`PARSE_ERROR` The formula expression could not be parsed. This might indicate a syntax that the connector does not yet recognize.
`UNSUPPORTED_SYNTAX` The formula uses a syntax construct that is recognized but cannot be translated (for example, an `IF` function with fewer than three arguments).
SQL Server type Snowflake type Notes
TINYINT INT
SMALLINT INT
INT INT
BIGINT INT
DECIMAL NUMBER If precision exceeds Snowflake limitations (precision > 38), the value is stored as TEXT.
NUMERIC NUMBER If precision exceeds Snowflake limitations (precision > 38), the value is stored as TEXT.
SMALLMONEY NUMBER
MONEY NUMBER
REAL FLOAT
FLOAT FLOAT
BIT BOOLEAN
CHAR TEXT
VARCHAR TEXT
NCHAR TEXT
NVARCHAR TEXT
TEXT TEXT
NTEXT TEXT
DATE DATE
TIME TIME
SMALLDATETIME TIMESTAMP_NTZ
DATETIME TIMESTAMP_NTZ
DATETIME2 TIMESTAMP_NTZ
DATETIMEOFFSET TIMESTAMP_TZ
BINARY BINARY
VARBINARY BINARY
IMAGE BINARY Supported up to the maximum entry size in Snowflake (16 MB).
JSON VARIANT Supported up to the maximum entry size in Snowflake (16 MB).
VECTOR VARIANT
XML TEXT
UNIQUEIDENTIFIER TEXT
ROWVERSION / TIMESTAMP TEXT
SQL_VARIANT TEXT
GEOGRAPHY TEXT Values of this type are inserted as NULL.
GEOMETRY TEXT Values of this type are inserted as NULL.
Parameter Description
Starting Change Tracking Position - `Latest` (default): Change tracking table reading starts at the latest available position and continues from there. - `Earliest`: Switches the incremental load to start, or restart reading from the earliest available change tracking table positions.
Re-read Tables in State - `New` (default): Only new tables, added after the starting position was switched to `Earliest`, will have their change tracking tables read from the earliest available positions. Tables that started replication before the configuration change will continue reading from their last positions. - `Any active`: Re-read and re-process changes from any table currently in replication.
Connector Description
[Openflow Connector for Amazon Ads](/user-guide/data-integration/openflow/connectors/amazon-ads/about) Bring data from Amazon Ads for Ad performance statistics and insights
[Openflow Connector for Box](/user-guide/data-integration/openflow/connectors/box/about) Ingest Box content for your own custom processing in Snowflake. Ingest Box content and make it ready for chat in your AI assistants with Snowflake Cortex. Use Box AI to extract metadata from Box content for enrichment in Snowflake. Add enriched metadata from Snowflake to content in Box.
[Openflow Connector for Google Ads](/user-guide/data-integration/openflow/connectors/google-ads/about) Import metrics from Google Ads for performance tracking and optimization
[Openflow Connector for Google BigQuery](/user-guide/data-integration/openflow/connectors/google-big-query/about) Replicate datasets and tables from Google BigQuery into Snowflake with incremental change capture
[Openflow Connector for Google Drive](/user-guide/data-integration/openflow/connectors/google-drive/about) Ingest Google Drive content and make it ready for chat in your AI assistants with Snowflake Cortex. Ingest Google Drive content for your own custom processing in Snowflake.
[Openflow Connector for Google Sheets](/user-guide/data-integration/openflow/connectors/google-sheets/about) Load data from Google sheets into Snowflake tables for reporting, analytics, and insights
[Openflow Connector for HubSpot](/user-guide/data-integration/openflow/connectors/hubspot/about) Get HubSpot CRM data into Snowflake for reporting, analytics, and insights
[Openflow Connector for Jira Cloud](/user-guide/data-integration/openflow/connectors/jira-cloud/about) Ingest Jira issues, projects, comments, changelogs, worklogs, users, and agile boards into Snowflake for cross‐team visibility and deeper insights
[Openflow Connector for Kafka](/user-guide/data-integration/openflow/connectors/kafka/about) Ingest real‐time events from Apache Kafka into Snowflake for near real-time analytics
[Openflow Connector for Kinesis Data Streams](/user-guide/data-integration/openflow/connectors/kinesis/about) Ingest real‐time events from Amazon Kinesis Data Streams into Snowflake for near real-time analytics
[Openflow Connector for LinkedIn Ads](/user-guide/data-integration/openflow/connectors/linkedin-ads/about) Import campaign performance data from LinkedIn Ads to Snowflake for reporting, analytics, and insights
[Openflow Connector for Meta Ads](/user-guide/data-integration/openflow/connectors/meta-ads/about) Bring Meta (Facebook) Ads data to unify and analyze your marketing performance
[Openflow Connector for Microsoft Dataverse](/user-guide/data-integration/openflow/connectors/dataverse/about) Integrate data from Microsoft Power Platform and Dynamics 365 applications with Snowflake for holistic business insights
[Openflow Connector for MySQL](/user-guide/data-integration/openflow/connectors/mysql/about) CDC replication of MySQL tables into Snowflake for comprehensive, centralized reporting
[Openflow Connector for Oracle](/user-guide/data-integration/openflow/connectors/oracle/about) CDC replication of Oracle database tables into Snowflake for comprehensive, centralized reporting
[Openflow Connector for PostgreSQL](/user-guide/data-integration/openflow/connectors/postgres/about) CDC replication of PostgreSQL data with Snowflake for comprehensive, centralized reporting
[Openflow Connector for Salesforce Bulk API](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about) Ingests Salesforce objects into Snowflake, with support for incremental change detection
[Openflow Connector for SharePoint](/user-guide/data-integration/openflow/connectors/sharepoint/about) Ingest SharePoint content and make it ready for chat in your AI assistants with Snowflake Cortex. Ingest SharePoint content for your own custom processing in Snowflake.
[Openflow Connector for Shopify](/user-guide/data-integration/openflow/connectors/shopify/about) Replicate Shopify store data into Snowflake using the Admin GraphQL API for e-commerce analytics and reporting
[Openflow Connector for Slack](/user-guide/data-integration/openflow/connectors/slack/about) Pull Slack messages and metadata into Snowflake for searchable, organization‐wide insights
[Openflow Connector for Snowflake to Kafka](/user-guide/data-integration/openflow/connectors/snowflake-to-kafka/about) CDC replication of Snowflake tables into Apache Kafka for real-time insights distribution and event-driven architectures
[Openflow Connector for SQL Server](/user-guide/data-integration/openflow/connectors/sql-server/about) CDC replication of Microsoft SQL Server data with Snowflake for comprehensive, centralized reporting
[Openflow Connector for Veeva Vault](/user-guide/data-integration/openflow/connectors/veeva-vault/about) Replicate Veeva Vault data into Snowflake using Direct Data archives for analytics and reporting
[Openflow Connector for Workday](/user-guide/data-integration/openflow/connectors/workday/about) Get Workday data into Snowflake using Report-as-a-Service (RaaS) streams for enterprise-level analytics and planning

| Runtime | Activity | Snowflake costs | Cloud costs |
| --- | --- | --- | --- |
| No runtimes | None | Openflow Control Pool x 1 node = 1 CPU_X64_S instance-hour | None |
| 1 small runtime (1vCPU) (min=1 max=2) | Active for 1 hour. Runtime does not scale to 2. | Openflow Control Pool x 1 node + Small Openflow Compute Pool (CPU_X64_S) x 1 node = 2 CPU_X64_S instance-hours | None |
| 2 small runtime (1 vCPU) (min/max=2); 1 large runtime (8 vCPU) (min/max=10) | Small: 4 nodes active for 1 hour; Large: 10 nodes active for 1 hour | Openflow Control Pool x 1 node + Small Openflow Compute Pool (CPU_X64_S) x 2 node + Large Openflow Compute Pool (CPU_X64_L) x 4 nodes = 3 CPU_X64_S instance-hours + 4 CPU_X64_L instance-hours | None |
| 1 medium (4vCPU) (min=1 max=2) | First 20 minutes 1 node is running; after 20 minutes, scales to 2 nodes; after 40 minutes, scales back to 1 node; total 1 hour | Openflow Control Pool x 1 node + Medium Openflow Compute Pool (CPU_X64_SL) x 1 node = 1 CPU_X64_S instance-hour + 1 CPU_X64_SL instance-hour | None |
| 1 medium (4vCPU) (min/max=2) | First 30 minutes 2 nodes running; suspends after the first 30 minutes | Openflow Control Pool x 1 node + Medium Openflow Compute Pool (CPU_X64_SL) x 1 node x 1/2 hour = 1 CPU_X64_S instance-hour + 1/2 CPU_X64_SL instance-hour | None |

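The arithmetic in the Snowflake costs column can be sketched as a sum of node-hours per instance family. This is only an illustration of the table's calculations; the function name `instance_hours` and the tuple layout are assumptions made for this example, not part of any Snowflake API.

```python
from collections import defaultdict


def instance_hours(pools):
    """Sum node-hours per instance family.

    `pools` is a list of (instance_family, node_count, hours) tuples.
    This layout is hypothetical and exists only for this sketch.
    """
    totals = defaultdict(float)
    for family, nodes, hours in pools:
        totals[family] += nodes * hours
    return dict(totals)


# Row "1 small runtime": control pool plus one small compute pool node,
# both running for one hour.
one_small = instance_hours([
    ("CPU_X64_S", 1, 1.0),  # Openflow Control Pool
    ("CPU_X64_S", 1, 1.0),  # Small Openflow Compute Pool
])

# Last row: control pool for one hour, one medium compute pool node
# for half an hour before the runtime suspends.
suspended_medium = instance_hours([
    ("CPU_X64_S", 1, 1.0),   # Openflow Control Pool
    ("CPU_X64_SL", 1, 0.5),  # Medium Openflow Compute Pool
])

print(one_small)
print(suspended_medium)
```

Note that the node counts passed in are compute pool nodes, as in the table's cost column, not the number of runtime nodes listed in the Activity column.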
Property Description
Maximum Batch Content Size Maximum combined content size of FlowFiles to package into one output FlowFile. Note that FlowFiles whose content exceeds this limit are packaged separately.
max-batch-size Maximum number of FlowFiles to package into one output FlowFile.
Name Description
original The FlowFiles that were used to create the package are sent to this relationship
success The packaged FlowFile is sent to this relationship
Name Description
mime.type The mime.type will be changed to application/flowfile-v3
Property Description
Aggregation Results Format Format of Aggregation output.
Aggregation Results Split Output a flowfile containing all aggregations or one flowfile for each individual aggregation.
Aggregations One or more query aggregations (or "aggs"), in JSON syntax. Ex: \{"items": \{"terms": \{"field": "product", "size": 10\}\}\}
Client Service An Elasticsearch client service to use for running queries.
Fields Fields of indexed documents to be retrieved, in JSON syntax. Ex: ["user.id", "http.response.*", \{"field": "@timestamp", "format": "epoch_millis"\}]
Index The name of the index to use.
Max JSON Field String Length The maximum allowed length of a string value when parsing a JSON document or attribute.
Output No Hits Output a "hits" flowfile even if no hits found for query. If true, an empty "hits" flowfile will be output even if "aggregations" are output.
Pagination Keep Alive Pagination "keep_alive" period. Period Elasticsearch will keep the scroll/pit cursor alive in between requests (this is not the time expected for all pages to be returned, but the maximum allowed time for requests between page retrievals).
Pagination Type Pagination method to use. Not all types are available for all Elasticsearch versions; check the Elasticsearch docs to confirm which are applicable and recommended for your service.
Query A query in JSON syntax, not Lucene syntax. Ex: \{"query":\{"match":\{"somefield":"somevalue"\}\}\}. If this parameter is not set, the query will be read from the flowfile content. If the query (property and flowfile content) is empty, a default empty JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Query Attribute If set, the executed query will be set on each result flowfile in the specified attribute.
Query Clause A "query" clause in JSON syntax, not Lucene syntax. Ex: \{"match":\{"somefield":"somevalue"\}\}. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Query Definition Style How the JSON Query will be defined for use by the processor.
Script Fields Fields to create using script evaluation at query runtime, in JSON syntax. Ex: \{"test1": \{"script": \{"lang": "painless", "source": "doc['price'].value * 2"\}\}, "test2": \{"script": \{"lang": "painless", "source": "doc['price'].value * params.factor", "params": \{"factor": 2.0\}\}\}\}
Search Results Format Format of Hits output.
Search Results Split Output a flowfile containing all hits or one flowfile for each individual hit or one flowfile containing all hits from all paged responses.
Size The maximum number of documents to retrieve in the query. If the query is paginated, this "size" applies to each page of the query, not the "size" of the entire result set.
Sort Sort results by one or more fields, in JSON syntax. Ex: [\{"price" : \{"order" : "asc", "mode" : "avg"\}\}, \{"post_date" : \{"format": "strict_date_optional_time_nanos"\}\}]
Type The type of this document (used by Elasticsearch for indexing and searching).
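Several of the properties above take JSON text. One way to avoid quoting mistakes is to build the values as Python objects and serialize them, as in the sketch below. It reuses the example values from the property descriptions and assumes the backslashes before the braces in those examples are documentation escaping rather than part of the value. The `properties` dictionary is a hypothetical holder for this example only.

```python
import json

# Example values taken from the property descriptions above.
query = {"query": {"match": {"somefield": "somevalue"}}}
aggregations = {"items": {"terms": {"field": "product", "size": 10}}}
sort = [
    {"price": {"order": "asc", "mode": "avg"}},
    {"post_date": {"format": "strict_date_optional_time_nanos"}},
]
fields = [
    "user.id",
    "http.response.*",
    {"field": "@timestamp", "format": "epoch_millis"},
]

# Hypothetical mapping of property display names to their JSON text.
properties = {
    "Query": json.dumps(query),
    "Aggregations": json.dumps(aggregations),
    "Sort": json.dumps(sort),
    "Fields": json.dumps(fields),
}

for name, value in properties.items():
    json.loads(value)  # raises ValueError if the text is not valid JSON
    print(f"{name}: {value}")
```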
Name Description
aggregations Aggregations are routed to this relationship.
failure All flowfiles that fail for reasons unrelated to server availability go to this relationship.
hits Search hits are routed to this relationship.
original All original flowfiles that don't cause an error to occur go to this relationship.
Name Description
mime.type application/json
aggregation.name The name of the aggregation whose results are in the output flowfile
aggregation.number The number of the aggregation whose results are in the output flowfile
page.number The number of the page (request), starting from 1, in which the results were returned that are in the output flowfile
hit.count The number of hits that are in the output flowfile
elasticsearch.query.error The error message provided by Elasticsearch if there is an error querying the index.
Display Name API Name Default Value Allowable Values Description
Write Target File Size * Write Target File Size 512 MB Controls the size of files generated to target about this many bytes
Property Description
Granularity Output flow file for each Record, Chunk, or File encountered in the event log
Name Description
bad chunk Any bad chunks of records will be transferred to this relationship in their original binary form
failure Any FlowFile that encountered an exception during conversion will be transferred to this relationship with as much parsing as possible done
original The unmodified input FlowFile will be transferred to this relationship
success Any FlowFile that was successfully converted from evtx to XML
Name Description
filename The output filename
mime.type The output filetype (application/xml for success and failure relationships, original value for bad chunk and original relationships)
Property Description
Ranges The comma-separated Excel ranges to parse in the A1 notation. For example: Sheet1!A1:B2,Sheet2!D4:E5,Sheet3. Ranges in R1C1 and 3-D reference style are not allowed. The value can't be empty.
Name Description
failure FlowFile for which errors occurred while parsing ranges.
success FlowFile annotated with attributes containing parsed Excel range. For each range a separate FlowFile is produced.
Name Description
range.formula Single range formula that was used to produce other attributes, e.g. Sheet1!A1:B2.
range.sheetname Parsed sheet name.
range.rows.starting Starting row (numbered from 1) of parsed range.
range.rows.ending Ending row of parsed range.
range.columns.starting Number of starting column of parsed range.
range.columns.ending Number of ending column of parsed range.
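To illustrate how an A1-style value such as `Sheet1!A1:B2,Sheet2!D4:E5,Sheet3` could map onto the attributes listed above, here is a minimal parsing sketch. It is not the processor's implementation. It assumes that column numbers start at 1 (the table only states this for rows), that a bare sheet name yields no row or column attributes, and that sheet names contain no commas or exclamation marks. The helper names are hypothetical.

```python
import re

# Sheet name, optionally followed by "!<col><row>:<col><row>".
_RANGE = re.compile(
    r"^(?P<sheet>[^!]+)"
    r"(?:!(?P<c1>[A-Z]+)(?P<r1>\d+):(?P<c2>[A-Z]+)(?P<r2>\d+))?$"
)


def column_number(letters):
    """Convert column letters to a 1-based number (A=1, Z=26, AA=27)."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def parse_ranges(value):
    """Return one attribute dictionary per comma-separated range."""
    if not value.strip():
        raise ValueError("the Ranges value can't be empty")
    results = []
    for formula in value.split(","):
        formula = formula.strip()
        match = _RANGE.match(formula)
        if match is None:
            # R1C1 and 3-D references do not fit the pattern above.
            raise ValueError(f"not an A1-style range: {formula!r}")
        attrs = {
            "range.formula": formula,
            "range.sheetname": match["sheet"],
        }
        if match["c1"] is not None:
            attrs.update({
                "range.rows.starting": int(match["r1"]),
                "range.rows.ending": int(match["r2"]),
                "range.columns.starting": column_number(match["c1"]),
                "range.columns.ending": column_number(match["c2"]),
            })
        results.append(attrs)
    return results


parsed = parse_ranges("Sheet1!A1:B2,Sheet2!D4:E5,Sheet3")
for attrs in parsed:
    print(attrs)
```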
Property Description
Character Set Specifies the character set of the Syslog messages
Name Description
failure Any FlowFile that could not be parsed as a Syslog message will be transferred to this Relationship without any attributes being added
success Any FlowFile that is successfully parsed as a Syslog message will be routed to this Relationship.
Name Description
syslog.priority The priority of the Syslog message.
syslog.severity The severity of the Syslog message derived from the priority.
syslog.facility The facility of the Syslog message derived from the priority.
syslog.version The optional version from the Syslog message.
syslog.timestamp The timestamp of the Syslog message.
syslog.hostname The hostname or IP address of the Syslog message.
syslog.sender The hostname of the Syslog server that sent the message.
syslog.body The body of the Syslog message, everything after the hostname.
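The severity and facility attributes are described as derived from the priority. In the Syslog RFCs (3164 and 5424), the priority value equals the facility multiplied by 8 plus the severity, so the derivation can be sketched as follows. The function name `decode_priority` is hypothetical and this is not the processor's own code.

```python
def decode_priority(priority):
    """Split a Syslog PRI value into (facility, severity).

    PRI = facility * 8 + severity, with facility in 0..23 and
    severity in 0..7, so valid PRI values are 0..191.
    """
    if not 0 <= priority <= 191:
        raise ValueError(f"priority out of range: {priority}")
    facility, severity = divmod(priority, 8)
    return facility, severity


# "<34>" is facility 4 (security/authorization), severity 2 (critical).
facility, severity = decode_priority(34)
print(facility, severity)
```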
Property Description
Character Set Specifies the character set of the Syslog messages
include_policy If true, then the Syslog Message body will be included in the attributes.
nil_policy Defines how NIL values are handled for header fields.
Name Description
failure Any FlowFile that could not be parsed as a Syslog message will be transferred to this Relationship without any attributes being added
success Any FlowFile that is successfully parsed as a Syslog message will be routed to this Relationship.
Name Description
syslog.priority The priority of the Syslog message.
syslog.severity The severity of the Syslog message derived from the priority.
syslog.facility The facility of the Syslog message derived from the priority.
syslog.version The optional version from the Syslog message.
syslog.timestamp The timestamp of the Syslog message.
syslog.hostname The hostname or IP address of the Syslog message.
syslog.appname The appname of the Syslog message.
syslog.procid The procid of the Syslog message.
syslog.messageid The messageid of the Syslog message.
syslog.structuredData Multiple entries per structuredData of the Syslog message.
syslog.sender The hostname of the Syslog server that sent the message.
syslog.body The body of the Syslog message, everything after the hostname.
Property Description
record-reader Specifies the Controller Service to use for reading incoming data
record-writer Specifies the Controller Service to use for writing out the records
Name Description
failure If a FlowFile cannot be partitioned from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship
original Once all records in an incoming FlowFile have been partitioned, the original FlowFile is routed to this relationship.
success FlowFiles that are successfully partitioned will be routed to this relationship
Name Description
record.count The number of records in an outgoing FlowFile
mime.type The MIME Type that the configured Record Writer indicates is appropriate
fragment.identifier All partitioned FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the partitioned FlowFiles that were created from a single parent FlowFile
fragment.count The number of partitioned FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
<dynamic property name> For each dynamic property that is added, an attribute may be added to the FlowFile. See the description for Dynamic Properties for more information.
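The relationship between the fragment attributes can be illustrated with a small grouping sketch. It is a simplified model, not the processor's implementation: records are plain dictionaries, partitioning is by a single key, and `fragment.index` is assumed to start at 0 (the table does not state the starting value). The function name `partition` and the output layout are hypothetical.

```python
import uuid
from collections import defaultdict


def partition(records, key, parent_filename):
    """Group records by `key` and attach fragment-style attributes."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    # All fragments from the same parent share one random identifier.
    fragment_id = str(uuid.uuid4())
    fragments = []
    for index, (value, group) in enumerate(groups.items()):
        fragments.append({
            "records": group,
            "attributes": {
                "record.count": len(group),
                "fragment.identifier": fragment_id,
                "fragment.index": index,  # assumption: zero-based
                "fragment.count": len(groups),
                "segment.original.filename": parent_filename,
                key: value,  # stands in for a dynamic property attribute
            },
        })
    return fragments


fragments = partition(
    [
        {"country": "US", "id": 1},
        {"country": "DE", "id": 2},
        {"country": "US", "id": 3},
    ],
    key="country",
    parent_filename="orders.json",
)
for fragment in fragments:
    print(fragment["attributes"])
```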
Display Name API Name Default Value Allowable Values Description
Certificate Authorities * Certificate Authorities PEM X.509 Certificate Authorities trusted for verifying peers in TLS communications containing one or more standard certificates
Certificate Authorities Source * Certificate Authorities Source PROPERTIES - Properties - System Source of information for loading trusted Certificate Authorities
Certificate Chain * Certificate Chain PEM X.509 Certificate Chain associated with Private Key starting with standard BEGIN CERTIFICATE header
Certificate Chain Location * Certificate Chain Location PEM X.509 Certificate Chain file location associated with Private Key starting with standard BEGIN CERTIFICATE header
Private Key * Private Key PEM Private Key encoded using either PKCS1 or PKCS8. Supported algorithms include ECDSA, Ed25519, and RSA
Private Key Location * Private Key Location PEM Private Key file location encoded using either PKCS1 or PKCS8. Supported algorithms include ECDSA, Ed25519, and RSA
Private Key Source * Private Key Source PROPERTIES - Undefined - Properties - Files Source of information for loading Private Key and Certificate Chain
TLS Protocol * TLS Protocol TLS - TLS - TLSv1.3 - TLSv1.2 TLS protocol version required for negotiating encrypted communications.

| Node Size | Recommended For | Message Rate Capacity |
| --- | --- | --- |
| Small (S) | Low to moderate throughput scenarios | Up to 27 MB/s per node |
| Medium (M) | Moderate to high throughput scenarios | Up to 135 MB/s per node |
| Large (L) | High throughput scenarios | Exceeding 135 MB/s per node. Up to 310 MB/s per node. |


| Average record size | Approximate calculation | Max Records Per Request |
| --- | --- | --- |
| 1 KB | 1 MB / 1 KB | 1000 |
| 200 bytes | 1 MB / 200 bytes | 5000 |
| 5 KB | 1 MB / 5 KB | 200 |

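The approximate calculation in the table divides a 1 MB request budget by the average record size. A minimal sketch follows, assuming decimal units (1 MB = 1,000,000 bytes and 1 KB = 1,000 bytes), which is what the table's results imply. The function and constant names are hypothetical.

```python
MAX_REQUEST_BYTES = 1_000_000  # 1 MB in decimal units, as the table implies


def max_records_per_request(avg_record_bytes):
    """Approximate how many average-sized records fit in one request."""
    if avg_record_bytes <= 0:
        raise ValueError("average record size must be positive")
    return MAX_REQUEST_BYTES // avg_record_bytes


for size in (1_000, 200, 5_000):
    print(size, max_records_per_request(size))
```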

| Node Size | ConsumeKinesis Tasks | PublishSnowpipeStreaming Tasks |
| --- | --- | --- |
| Small (S) | 2 | 1 |
| Medium (M) | 4 | 2 |
| Large (L) | 6 | 3 |


| Node Size | Recommended For | Message Rate Capacity |
| --- | --- | --- |
| Small (S) | Low to moderate throughput scenarios | Up to 18 MB/s per node |
| Medium (M) | Moderate to high throughput scenarios | Up to 145 MB/s per node |
| Large (L) | High throughput scenarios | Up to 250 MB/s per node |

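As a rough planning aid, the per-node capacities above can be turned into a lower bound on node count for a target throughput. This is a back-of-the-envelope sketch only: the capacities are stated as upper limits ("up to"), so real deployments may need more nodes. The names `CAPACITY_MB_PER_S` and `nodes_needed` are hypothetical.

```python
import math

# Upper-bound capacities per node from the table above, in MB/s.
CAPACITY_MB_PER_S = {"S": 18, "M": 145, "L": 250}


def nodes_needed(target_mb_per_s, node_size):
    """Return the minimum node count implied by the stated capacity."""
    capacity = CAPACITY_MB_PER_S[node_size]
    return max(1, math.ceil(target_mb_per_s / capacity))


for size in ("S", "M", "L"):
    print(size, nodes_needed(300, size))
```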

| Node Size | ConsumeKafka Tasks | PublishSnowpipeStreaming Tasks |
| --- | --- | --- |
| Small (S) | 1 | 1 |
| Medium (M) | 4 | 2 |
| Large (L) | 8 | 2 |

Property Description
Database The Snowflake database containing the stage
Filename The filename of the file to perform OCR on, it must be uploaded to the stage prior to performing OCR. FlowFile attributes may be referenced via Expression Language.
Max Attribute Size The maximum size of the OCR results that can be written to an attribute. If the OCR results exceed this, the FlowFile will be routed to failure.
OCR Mode Specifies how document text and structure should be extracted. In 'OCR' mode, only raw text content is extracted, ignoring formatting and table structures. In 'LAYOUT' mode, the output preserves table structures as markdown.
Output Strategy Determines response output destination
Results Attribute The name of the attribute to write the OCR response to.
Schema The Snowflake schema containing the stage
Snowflake Connection Service Database Connection Service for accessing Snowflake
Stage The Snowflake stage where PDFs will be temporarily stored. The stage must have server-side encryption enabled. FlowFile attributes may be referenced via Expression Language
Name Description
empty FlowFiles for which OCR results are empty
failure FlowFiles that cannot be processed are routed to this relationship
success FlowFiles that are successfully processed (with non-empty OCR results) are routed to this relationship
Name Description
mime.type The MIME type of the output content (text/plain when output strategy is FLOW_FILE)
snowflake.error.information Contains error information if Snowflake Cortex OCR operation returns an error
Property Description
Table State Service A service containing currently replicated tables and their states
Name Description
existing FlowFile with qualified table name that is already being replicated
failure If a FlowFile attribute cannot be read or is incorrect, it will be routed to this Relationship.
new FlowFile with qualified table name that is not yet replicated
stale FlowFile with qualified table name that used to be replicated but no longer is, either because it was removed from source database or excluded by parameter
Name Description
source.schema.name Name of the schema of the table from which an event originated
source.table.name Name of the table from which an event originated
Display Name API Name Default Value Allowable Values Description
Access Token Scopes * Access Token Scopes catalog Comma-separated list of one or more OAuth 2 scopes requested for Access Tokens
Authentication Strategy * Authentication Strategy OAUTH2 - Bearer Authentication - OAuth 2.0 Strategy for authenticating with the Apache Iceberg Catalog over HTTP
Authorization Grant Type * Authorization Grant Type CLIENT_CREDENTIALS - Client Credentials OAuth 2.0 Authorization Grant Type for obtaining Access Tokens
Authorization Server URI * Authorization Server URI Authorization Server URI supporting OAuth 2
Bearer Token * Bearer Token Bearer Token for authentication to Apache Iceberg Catalog
Catalog URI * Catalog URI Apache Iceberg Catalog REST URI
Client ID * Client ID Client ID for OAuth 2 Client Credentials
Client Secret * Client Secret Client Secret for OAuth 2 Client Credentials
Warehouse Location Warehouse Location Apache Iceberg Catalog Warehouse location or identifier
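For orientation, the OAuth 2.0 Client Credentials settings above correspond to a standard token request (RFC 6749, section 4.4). The sketch below only builds such a request and does not send it. It assumes the credentials go in the form body (some servers expect HTTP Basic authentication instead) and that the comma-separated scopes are sent space-delimited, as the OAuth 2.0 specification requires. The URL, client ID, and secret are placeholders, and `build_token_request` is a hypothetical helper, not the controller service's code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(authorization_server_uri, client_id,
                        client_secret, scopes):
    """Build (but do not send) a client-credentials token request.

    `scopes` is a comma-separated string, as in the property above.
    """
    scope_list = [s.strip() for s in scopes.split(",") if s.strip()]
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": " ".join(scope_list),  # OAuth 2.0 scopes are space-delimited
    }).encode("ascii")
    return Request(
        authorization_server_uri,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_token_request(
    "https://auth.example.com/oauth/token",
    client_id="example-client",
    client_secret="example-secret",
    scopes="catalog",
)
print(request.get_method(), request.full_url)
```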
Property Description
Anthropic API Key The API Key for authenticating to Anthropic
Assistant Message The assistant message to send to Anthropic. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The assistant message is added last
Image MIME Type The MIME type of the image in the FlowFile content. Supported types are image/jpeg, image/png, image/gif, and image/webp.
Max File Size The maximum size of a FlowFile that can be sent to Anthropic as an image. If the FlowFile is larger than this, it will be routed to 'failure'.
Max Tokens The maximum number of tokens to generate
Model Name The name of the Anthropic model
Output Strategy Determines response output destination
Prompt Type The type of prompt to send to Anthropic. TEXT to send a simple prompt. IMAGE to send an image first and then a prompt. Use JSON for advanced use of Anthropic's /v1/messages endpoint.
Response Format The format of the response from Anthropic
Results Attribute The name of the attribute to write the response to.
Stop Sequences A comma-delimited list of strings that act as stop sequences. The model will halt after encountering one of the stop sequences.
System Message The system message to send to Anthropic. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
Temperature The temperature to use for generating the response. Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks.
Top K The top K value to use for generating the response. Only sample from the top K options for each subsequent token. Recommended for advanced use cases only. You usually only need to use temperature.
Top P The top P value to use for generating the response. Top P is for nucleus sampling: the cumulative distribution over all the options for each subsequent token is computed in decreasing probability order and cut off once it reaches the probability specified by top_p. Recommended for advanced use cases only. You usually only need to use temperature.
User ID The user id to set in the request metadata
User Message The user message to send to Anthropic. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The user message is added first, unless an image is present.
Web Client Service The Web Client Service to use for communicating with Anthropic
Name Description
failure If unable to obtain a valid response from Anthropic, the original FlowFile will be routed to this relationship
retry If a 5XX response from Anthropic is returned, the original FlowFile will be routed to this relationship
success The response from Anthropic is routed to this relationship
Name Description
anthropic.usage.inputTokens The number of input tokens read in the request.
anthropic.usage.outputTokens The number of output tokens generated in the response.
anthropic.chat.completion.id A unique id assigned to the conversation
anthropic.chat.completion.stop.reason The reason that we stopped.
anthropic.chat.completion.stop.sequence Which custom stop sequence was generated, if any, may be 'null'.
mime.type The mime type of the response.
filename An updated filename for the response.
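To show how the text-prompt properties above relate to a request to Anthropic's `/v1/messages` endpoint, the sketch below assembles a request body as a Python dictionary. It is an illustration under assumptions, not the processor's implementation: it covers only a plain text prompt, the model name is a placeholder, and `build_messages_payload` is a hypothetical helper. The message ordering follows the property descriptions (user message first, assistant message last).

```python
import json


def build_messages_payload(model, user_message, max_tokens,
                           system_message=None, assistant_message=None,
                           stop_sequences=None, temperature=None,
                           user_id=None):
    """Assemble a /v1/messages request body for a text prompt."""
    messages = [{"role": "user", "content": user_message}]
    if assistant_message:
        # The assistant message is added last, as described above.
        messages.append({"role": "assistant", "content": assistant_message})

    payload = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if system_message:
        payload["system"] = system_message
    if stop_sequences:
        # The property is a comma-delimited string; the API takes a list.
        payload["stop_sequences"] = [
            s.strip() for s in stop_sequences.split(",") if s.strip()
        ]
    if temperature is not None:
        payload["temperature"] = temperature
    if user_id:
        payload["metadata"] = {"user_id": user_id}
    return payload


payload = build_messages_payload(
    model="example-model-name",  # placeholder, not a real model name
    user_message="Summarize the following text: ...",
    max_tokens=256,
    system_message="You are a concise assistant.",
    stop_sequences="END, STOP",
    temperature=0.2,
)
print(json.dumps(payload, indent=2))
```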
Property Description
API Key The API key for authenticating to the Azure OpenAI service
Deployment Name The name of the OpenAI model deployment
Detail Level The image detail level that OpenAI should use for processing the image. Low detail will be less expensive and lower latency, while a high level may provide better results.
Image MIME Type The MIME type of the image
Image URL The URL of the image to send to OpenAI. If not specified, the contents of the FlowFile will be used as the image.
Max File Size The maximum size of a FlowFile that can be sent to OpenAI as an image. If the FlowFile is larger than this, it will be routed to 'failure'.
Max Tokens The maximum number of tokens to generate
OpenAI Service Name The name of the OpenAI service to use
Prompt Type The type of prompt to send to OpenAI
Response Format The format of the response from OpenAI
Results Attribute The name of the attribute to write the response to. If unset, the response will be written to the FlowFile content.
Seed The seed to use for generating the response
System Message The system message to send to OpenAI. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
Temperature The temperature to use for generating the response.
Top P The top P value to use for generating the response
User Your end user, sent to OpenAI for monitoring and detection of abuse
User Message The user message to send to OpenAI. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
Web Client Service The Web Client Service to use for communicating with OpenAI
Name Description
failure If unable to obtain a valid response from Azure OpenAI, the original FlowFile will be routed to this relationship
success The response from Azure OpenAI is routed to this relationship
Property Description
Assistant Message The assistant message to send to the LLM. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The assistant message is added last
LLM Provider Service The provider service for sending evaluation prompts to LLM
Output Strategy Determines response output destination
Results Attribute The name of the attribute to write the response to.
System Message The system message to send to the LLM. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The system message is added first.
User Message The user message to send to the LLM. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}.
Name Description
failure FlowFiles that cannot be processed are routed to this relationship
success FlowFiles that are successfully processed are routed to this relationship
Property Description
Detail Level The image detail level that OpenAI should use for processing the image. Low detail will be less expensive and lower latency, while a high level may provide better results.
Image MIME Type The MIME type of the image
Image Model Name The name of the OpenAI model
Image URL The URL of the image to send to OpenAI. If not specified, the contents of the FlowFile will be used as the image.
Max File Size The maximum size of a FlowFile that can be sent to OpenAI as an image. If the FlowFile is larger than this, it will be routed to 'failure'.
Max Tokens The maximum number of tokens to generate
OpenAI API Key The API Key for authenticating to OpenAI
OpenAI Organization The organization to use for OpenAI
Prompt Type The type of prompt to send to OpenAI
Response Format The format of the response from OpenAI
Results Attribute The name of the attribute to write the response to. If unset, the response will be written to the FlowFile content.
Seed The seed to use for generating the response
System Message The system message to send to OpenAI. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
Temperature The temperature to use for generating the response.
Text Model Name The name of the OpenAI model
Top P The top P value to use for generating the response
User Your end user, sent to OpenAI for monitoring and detection of abuse
User Message The user message to send to OpenAI. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
Web Client Service The Web Client Service to use for communicating with OpenAI
Name Description
failure If unable to obtain a valid response from OpenAI, the original FlowFile will be routed to this relationship
success The response from OpenAI is routed to this relationship
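As an illustration of how properties such as Detail Level, Image URL, Seed, and User could map onto a request, the sketch below builds a body in the shape of OpenAI's Chat Completions API. Whether the processor uses that endpoint is an assumption made for this example. The model name and URL are placeholders, and `build_chat_payload` is a hypothetical helper.

```python
import json


def build_chat_payload(model, user_message, image_url=None, detail="low",
                       system_message=None, max_tokens=None, seed=None,
                       temperature=None, top_p=None, user=None):
    """Assemble a chat-completions style body, optionally with an image."""
    content = [{"type": "text", "text": user_message}]
    if image_url:
        content.append({
            "type": "image_url",
            "image_url": {"url": image_url, "detail": detail},
        })

    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": content})

    payload = {"model": model, "messages": messages}
    optional = {
        "max_tokens": max_tokens,
        "seed": seed,
        "temperature": temperature,
        "top_p": top_p,
        "user": user,
    }
    # Only include settings that were actually provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


chat_payload = build_chat_payload(
    model="example-model-name",  # placeholder
    user_message="Describe this image.",
    image_url="https://example.com/picture.png",  # placeholder
    detail="low",
    seed=42,
    temperature=0.0,
)
print(json.dumps(chat_payload, indent=2))
```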
Property Description
Enable Cortex Guardrails Filters potentially unsafe and harmful responses from a language model. Either true or false.
Max Tokens The maximum number of tokens to generate
Output Strategy Determines response output destination
Response Format The format of the response from Snowflake Cortex
Results Attribute The name of the attribute to write the response to.
Snowflake Connection Service Database Connection Service for accessing Snowflake
System Message The system message to send to Snowflake Cortex. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
Temperature The temperature to use for generating the response.
Text Model Name The name of the Snowflake Cortex model
Top P The top P value to use for generating the response
User Message The user message to send to Snowflake Cortex. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
Name Description
failure If unable to obtain a valid response from Snowflake Cortex, the original FlowFile will be routed to this relationship
success The response from Snowflake Cortex is routed to this relationship
Property Description
GCP Credentials Service The Controller Service used to obtain Google Cloud Platform credentials.
GCP Location The location to configure the Vertex client with
GCP Project ID The project ID to configure the Vertex client with
Max File Size The maximum size of a FlowFile that can be sent to Vertex as an image. If the FlowFile is larger than this, it will be routed to 'failure'.
Max Tokens The maximum number of tokens to generate
Media MIME Type The MIME type of the media in the FlowFile content. Supported media types are listed here: [https://firebase.google.com/docs/vertex-ai/input-file-requirements](https://firebase.google.com/docs/vertex-ai/input-file-requirements)
Model Name The name of the Vertex model
Output Strategy Determines response output destination
Prompt Type The type of prompt to send to Vertex. Text to send a simple prompt. Media to send a multimedia type first followed by a text prompt.
Response Format The format of the response from Vertex
Results Attribute The name of the attribute to write the response to.
Stop Sequences A comma-delimited list of strings that act as stop sequences. The model will halt after encountering one of the stop sequences.
System Message The system message to send to Vertex. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
Temperature The temperature to use for generating the response. Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks.
Top K The top K value to use for generating the response. Only sample from the top K options for each subsequent token. Recommended for advanced use cases only. You usually only need to use temperature.
Top P The top P value to use for generating the response. Top P is for nucleus sampling: the cumulative distribution over all the options for each subsequent token is computed in decreasing probability order and cut off once it reaches the probability specified by top_p. Recommended for advanced use cases only. You usually only need to use temperature.
User Message The user message to send to Vertex. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The user message is added first, unless an image is present.
Name Description
failure If unable to obtain a valid response from Vertex, the original FlowFile will be routed to this relationship
success The response from Vertex is routed to this relationship
Name Description
vertex.usage.inputTokens The number of input tokens read in the request.
vertex.usage.outputTokens The number of output tokens generated in the response.
vertex.chat.completion.id A unique id assigned to the conversation
mime.type The mime type of the response.
filename An updated filename for the response.
Display Name API Name Default Value Allowable Values Description
Configuration File * configuration-file A configuration file
Required Permission Explanation
read filesystem Provides operator the ability to read from any file that NiFi has access to.
Display Name API Name Default Value Allowable Values Description
Message Type * Message Type Fully qualified name of the Protocol Buffers message type including its package (e.g., mypackage.MyMessage). The .proto files configured in 'Proto Directory' must contain the definition of this message type.
Proto Directory * Proto Directory Directory containing Protocol Buffers message definition (.proto) file(s).
Schema Access Strategy * Schema Access Strategy generate-from-proto-file - Use 'Schema Name' Property - Use 'Schema Text' Property - Schema Reference Reader - Generate from Proto file Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Branch Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Name Schema Name $\{schema.name\} Specifies the name of the schema to lookup in the Schema Registry property
Schema Reference Reader * Schema Reference Reader Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
Schema Registry Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text Schema Text $\{avro.schema\} The text of an Avro-formatted Schema
Schema Version Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
| Parameter | Description |
| --- | --- |
| `connector_name` | Fully qualified name of the Zerocopy Connector (e.g., `my_db.my_schema.my_sap_connector`). |
| `snowflake_share_name` | Name of the Snowflake share, also the name of the share on the SAP® BDC side. |
| `open_resource_discovery_metadata` | A JSON object describing the data product in SAP® BDC. Contains the following fields: `title` (display name of the data product), `shortDescription` (brief summary of the data product), and `description` (full description of the data product). |
| `csn_document_json` | The SAP® Core Schema Notation (CSN) JSON payload describing the structure of the data product. Provided by the caller. |
Property Description
AMQP Version AMQP Version. Currently only supports AMQP v0.9.1.
Brokers A comma-separated list of known AMQP Brokers in the format `<host>:<port>` (e.g., localhost:5672). If this is set, Host Name and Port are ignored. Only include hosts from the same AMQP cluster.
Client Certificate Authentication Enabled Authenticate using the SSL certificate rather than user name/password.
Exchange Name The name of the AMQP Exchange the messages will be sent to. Usually provided by the AMQP administrator (e.g., 'amq.direct'). It is an optional property. If kept empty the messages will be sent to a default AMQP exchange.
Header Separator The character that is used to split key-value pairs for headers. The value must be exactly one character; otherwise, you will get an error message.
Headers Pattern Regular expression that will be evaluated against the FlowFile attributes to select the matching attributes and put as AMQP headers. Attribute name will be used as header key.
Headers Source The source of the headers which will be applied to the published message.
Host Name Network address of AMQP broker (e.g., localhost). If Brokers is set, then this property is ignored.
Password Password used for authentication and authorization.
Port Numeric value identifying Port of AMQP broker (e.g., 5671). If Brokers is set, then this property is ignored.
Routing Key The name of the Routing Key that will be used by AMQP to route messages from the exchange to one or more destination queues. Usually provided by the administrator (e.g., 'myKey'). When messages are sent to a default exchange, this property corresponds to a destination queue name; otherwise, a binding from the Exchange to a Queue via Routing Key must be set (usually by the AMQP administrator).
SSL Context Service The SSL Context Service used to provide client certificate information for TLS/SSL connections.
Username Username used for authentication and authorization.
Virtual Host Virtual Host name which segregates AMQP system for enhanced security.
Name Description
failure All FlowFiles that cannot be routed to the AMQP destination are routed to this relationship
success All FlowFiles that are sent to the AMQP destination are routed to this relationship
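The `<host>:<port>` list format described for the Brokers property can be illustrated with a small parser. This is a minimal sketch, not the processor's implementation; the function name `parse_brokers` is hypothetical.

```python
from typing import List, Tuple


def parse_brokers(value: str) -> List[Tuple[str, int]]:
    """Parse 'host1:port1,host2:port2' into (host, port) pairs.

    Hypothetical helper for illustration only.
    """
    brokers = []
    for item in value.split(","):
        # rpartition splits on the last colon, so the port is always the tail.
        host, sep, port = item.strip().rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected <host>:<port>, got {item!r}")
        brokers.append((host, int(port)))
    return brokers


print(parse_brokers("localhost:5672, broker2:5672"))
```

An entry without a port raises `ValueError`, which mirrors the requirement that every broker be given as a host and port pair.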
Property Description
Account Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name]
Authentication Strategy Strategy for authenticating Snowflake connections
Channel Group Group for managing distinct Snowpipe Streaming Channels with partitioning
Channel Insert Timeout Maximum duration to retry inserting records before failing with an upper bound of 5 minutes
Concurrency Group Controls access to the configured channel with serialized claims according to the configured value or expression
Database Snowflake Database destination for processed records
Destination Type Snowflake destination object for processed records with support for derived default pipes
Offset Token End Expression Expression Language definition to produce the highest offset token for a FlowFile as a monotonically increasing number
Offset Token Record Pointer JSON Pointer to offset token in each record required when the last committed offset token is between start and end boundaries
Offset Token Start Expression Expression Language definition to produce the lowest offset token for a FlowFile as a monotonically increasing number
Offset Tracking Resolution Resolution level for evaluating committed offset tokens against input FlowFiles and records. **Disabled**: opaque offset token handling without tracking across FlowFiles or records. **FlowFile**: track each FlowFile with monotonically increasing offset tokens. **Record**: track each record in each FlowFile with monotonically increasing offset tokens.
Offset Tracking Timeout Maximum duration to wait for channel status to confirm committed offset tokens before routing to failure
Pipe Snowflake Pipe destination for processed records
Private Key Service RSA Private Key Service for authenticating connections
Role Snowflake Role the user will assume when authenticating connections
Schema Snowflake Schema destination for processed records
Table Snowflake Table destination for processed records
Transfer Strategy Strategy for transferring records to Snowpipe Streaming. **Managed**: transfer records as either batches of rows or file fragments based on uncompressed size. **Rows**: transfer records as batches of rows over HTTP to Snowpipe Streaming. **File Fragments**: transfer records as file fragments over HTTP to cloud storage services.
User Snowflake User for authenticating connections
Web Client Service Provider Web Client Service Provider supporting HTTP request and response handling
Name Description
empty FlowFiles with empty content not sent to Snowflake
failure FlowFiles that failed to upload to Snowflake
invalid FlowFiles that Snowflake identified as containing one or more invalid rows resulting in partial transmission
success FlowFiles successfully uploaded to Snowflake
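The interaction between the offset token start and end expressions, the record pointer, and the last committed offset token can be sketched as follows. This is an illustrative model that assumes integer offset tokens; the function names and return labels are hypothetical and do not describe the processor's internal logic.

```python
from typing import Callable, List, Optional


def flowfile_action(committed: Optional[int], start: int, end: int) -> str:
    """Classify a FlowFile by comparing its offset range to the committed token."""
    if committed is None or committed < start:
        return "send_all"        # nothing in this FlowFile was committed yet
    if committed >= end:
        return "skip"            # every record was already committed
    return "filter_records"      # committed token lies between start and end


def records_to_send(records: List[dict], committed: int,
                    token_of: Callable[[dict], int]) -> List[dict]:
    """Keep only records whose offset token is above the committed token."""
    return [r for r in records if token_of(r) > committed]


rows = [{"offset": 10}, {"offset": 11}, {"offset": 12}]
print(flowfile_action(11, 10, 12))
print(records_to_send(rows, 11, lambda r: r["offset"]))
```

The `filter_records` case is where a per-record offset token is needed, which corresponds to the situation the Offset Token Record Pointer property describes.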
Property Description
GCP Credentials Provider Service The Controller Service used to obtain Google Cloud Platform credentials.
Input Batch Size Maximum number of FlowFiles processed for each Processor invocation
Maximum Message Size The maximum size of a Google PubSub message in bytes. Defaults to 1 MB (1048576 bytes)
Message Derivation Strategy The strategy used to publish the incoming FlowFile to the Google Cloud PubSub endpoint.
Record Reader The Record Reader to use for incoming FlowFiles
Record Writer The Record Writer to use in order to serialize the data before sending to GCPubSub endpoint
api-endpoint Override the gRPC endpoint in the form of [host:port]
gcp-batch-bytes Publish request gets triggered based on this Batch Bytes Threshold property and the Batch Size Threshold property, whichever condition is met first.
gcp-project-id Google Cloud Project ID
gcp-pubsub-publish-batch-delay Indicates the delay threshold to use for batching. After this amount of time has elapsed (counting from the first element added), the elements will be wrapped up in a batch and sent. This value should not be set too high, usually on the order of milliseconds. Otherwise, calls might appear to never complete.
gcp-pubsub-publish-batch-size Indicates the number of messages the cloud service should bundle together in a batch. If not set and left empty, only one message will be used in a batch
gcp-pubsub-topic Name of the Google Cloud PubSub Topic
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to this relationship if the Google Cloud Pub/Sub operation fails.
retry FlowFiles are routed to this relationship if the Google Cloud Pub/Sub operation fails but attempting the operation again may succeed.
success FlowFiles are routed to this relationship after a successful Google Cloud Pub/Sub operation.
Name Description
gcp.pubsub.messageId ID of the pubsub message published to the configured Google Cloud PubSub topic
gcp.pubsub.count.records Count of pubsub messages published to the configured Google Cloud PubSub topic
gcp.pubsub.topic Name of the Google Cloud PubSub topic the message was published to
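The "whichever condition is met first" rule for the batch bytes and batch size thresholds can be illustrated with a simple splitter. This is a sketch under stated assumptions: the function name is hypothetical, the delay threshold is ignored, and the real batching is handled by the Pub/Sub client rather than by code like this.

```python
from typing import List


def split_into_batches(messages: List[bytes], max_count: int,
                       max_bytes: int) -> List[List[bytes]]:
    """Close a batch as soon as either the count or the byte threshold is reached."""
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for msg in messages:
        current.append(msg)
        size += len(msg)
        if len(current) >= max_count or size >= max_bytes:
            batches.append(current)
            current, size = [], 0
    if current:
        # Remaining messages that never hit a threshold (a delay threshold
        # would normally trigger this flush).
        batches.append(current)
    return batches


print(split_into_batches([b"aa", b"bb", b"cc"], max_count=2, max_bytes=100))
```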
Property Description
Connection Client ID The client ID to be set on the connection, if set. This is mandatory for a durable non-shared consumer and optional for all others; with shared consumers, setting it is typically undesirable. See the JMS specification for further details.
Connection Factory Service The Controller Service that is used to obtain Connection Factory. Alternatively, the 'JNDI *' or the 'JMS *' properties can also be used to configure the Connection Factory.
Destination Name The name of the JMS Destination. Usually provided by the administrator (e.g., 'topic://myTopic' or 'myTopic').
Destination Type The type of the JMS Destination. Could be one of 'QUEUE' or 'TOPIC'. Usually provided by the administrator. Defaults to 'QUEUE'
Maximum Batch Size The maximum number of messages to publish or consume in each invocation of the processor.
Password Password used for authentication and authorization.
SSL Context Service The SSL Context Service used to provide client certificate information for TLS/SSL connections.
User Name User Name used for authentication and authorization.
allow-illegal-chars-in-jms-header-names Specifies whether illegal characters in header names should be sent to the JMS broker. Illegal characters are usually hyphens and full stops.
attributes-to-send-as-jms-headers-regex Specifies the Regular Expression that determines the names of FlowFile attributes that should be sent as JMS Headers
broker URI pointing to the network location of the JMS Message broker. Example for ActiveMQ: 'tcp://myhost:61616'. Examples for IBM MQ: 'myhost(1414)' and 'myhost01(1414),myhost02(1414)'.
cf The fully qualified name of the JMS ConnectionFactory implementation class (e.g., org.apache.activemq.ActiveMQConnectionFactory).
cflib Path to the directory with additional resources (e.g., JARs, configuration files, etc.) to be added to the classpath (defined as a comma-separated list of values). Such resources typically represent target JMS client libraries for the ConnectionFactory implementation.
character-set The name of the character set to use to construct or interpret TextMessages
connection.factory.name The name of the JNDI Object to look up for the Connection Factory.
java.naming.factory.initial The fully qualified class name of the JNDI Initial Context Factory Class (java.naming.factory.initial).
java.naming.provider.url The URL of the JNDI Provider to use as the value for java.naming.provider.url. See additional details documentation for allowed URL schemes.
java.naming.security.credentials The Credentials to use when authenticating with JNDI (java.naming.security.credentials).
java.naming.security.principal The Principal to use when authenticating with JNDI (java.naming.security.principal).
message-body-type The type of JMS message body to construct.
naming.factory.libraries Specifies jar files and/or directories to add to the ClassPath in order to load the JNDI / JMS client libraries. This should be a comma-separated list of files, directories, and/or URLs. If a directory is given, any files in that directory will be included, but subdirectories will not be included (i.e., it is not recursive).
record-reader The Record Reader to use for parsing the incoming FlowFile into Records.
record-writer The Record Writer to use for serializing Records before publishing them as a JMS Message.
Required Permission Explanation
reference remote resources Client Library Location can reference resources over HTTP
Name Description
failure All FlowFiles that cannot be sent to JMS destination are routed to this relationship
success All FlowFiles that are sent to the JMS destination are routed to this relationship
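The non-recursive directory handling described for `naming.factory.libraries` can be sketched as follows. This is an illustration only; the function name `resolve_libraries` is hypothetical, and the sketch simply passes through anything that is not a local directory (such as a file path or URL).

```python
from pathlib import Path
from typing import List


def resolve_libraries(value: str) -> List[str]:
    """Expand a comma-separated list of files, directories, and URLs.

    Directories contribute only their direct child files (not recursive).
    """
    resolved: List[str] = []
    for entry in (e.strip() for e in value.split(",")):
        if not entry:
            continue
        path = Path(entry)
        if path.is_dir():
            # iterdir() lists direct children only; subdirectories are skipped.
            resolved.extend(sorted(str(p) for p in path.iterdir() if p.is_file()))
        else:
            resolved.append(entry)
    return resolved
```

For example, a directory containing `a.jar` and a subdirectory `sub/b.jar` would resolve to `a.jar` only.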
Property Description
Failure Strategy Specifies how the processor handles a FlowFile if it is unable to publish the data to Kafka
FlowFile Attribute Header Pattern A Regular Expression that is matched against all FlowFile attribute names. Any attribute whose name matches the pattern will be added to the Kafka messages as a Header. If not specified, no FlowFile attributes will be added as headers.
Header Encoding For any attribute that is added as a Kafka Record Header, this property indicates the Character Encoding to use for serializing the headers.
Kafka Connection Service Provides connections to Kafka Broker for publishing Kafka Records
Kafka Key The Key to use for the Message. If not specified, the FlowFile attribute 'kafka.key' is used as the message key, if it is present. Beware that setting Kafka key and demarcating at the same time may potentially lead to many Kafka messages with the same key. Normally this is not a problem as Kafka does not enforce or assume message and key uniqueness. Still, setting the demarcator and Kafka key at the same time poses a risk of data loss on Kafka. During a topic compaction on Kafka, messages will be deduplicated based on this key.
Kafka Key Attribute Encoding FlowFiles that are emitted have an attribute named 'kafka.key'. This property dictates how the value of the attribute should be encoded.
Message Demarcator Specifies the string (interpreted as UTF-8) to use for demarcating multiple messages within a single FlowFile. If not specified, the entire content of the FlowFile will be used as a single message. If specified, the contents of the FlowFile will be split on this delimiter and each section sent as a separate Kafka message. To enter a special character such as 'new line', use CTRL+Enter or Shift+Enter, depending on your OS.
Message Key Field The name of a field in the Input Records that should be used as the Key for the Kafka message.
Publish Strategy The format used to publish the incoming FlowFile record to Kafka.
Record Key Writer The Record Key Writer to use for outgoing FlowFiles
Record Metadata Strategy Specifies whether the Record's metadata (topic and partition) should come from the Record's metadata field or from the configured Topic Name and Partition / Partitioner class properties.
Record Reader The Record Reader to use for incoming FlowFiles
Record Writer The Record Writer to use in order to serialize the data before sending to Kafka
Topic Name Name of the Kafka Topic to which the Processor publishes Kafka Records
Transactional ID Prefix Specifies the prefix for the KafkaProducer transactional.id config. The transactional.id will be a generated UUID prefixed with the configured string.
Transactions Enabled Specifies whether to provide transactional guarantees when communicating with Kafka. If there is a problem sending data to Kafka, and this property is set to false, then the messages that have already been sent to Kafka will continue on and be delivered to consumers. If this is set to true, then the Kafka transaction will be rolled back so that those messages are not available to consumers. Setting this to true requires that the [Delivery Guarantee] property be set to [Guarantee Replicated Delivery].
acks Specifies the requirement for guaranteeing that a message is sent to Kafka. Corresponds to Kafka Client acks property.
compression.type Specifies the compression strategy for records sent to Kafka. Corresponds to Kafka Client compression.type property.
max.request.size The maximum size of a request in bytes. Corresponds to Kafka Client max.request.size property.
partition Specifies the Kafka Partition destination for Records.
partitioner.class Specifies which class to use to compute a partition id for a message. Corresponds to Kafka Client partitioner.class property.
Name Description
failure Any FlowFile that cannot be sent to Kafka will be routed to this Relationship
success FlowFiles for which all content was sent to Kafka.
Name Description
msg.count The number of messages that were sent to Kafka for this FlowFile. This attribute is added only to FlowFiles that are routed to success.
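The Message Demarcator and FlowFile Attribute Header Pattern behaviors described above can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes the pattern must match the whole attribute name; it is not the processor's implementation.

```python
import re
from typing import Dict, List, Optional


def demarcate(content: bytes, demarcator: Optional[str]) -> List[bytes]:
    """Split FlowFile content into messages; no demarcator means one message."""
    if not demarcator:
        return [content]
    return content.split(demarcator.encode("utf-8"))


def select_headers(attributes: Dict[str, str],
                   pattern: Optional[str]) -> Dict[str, str]:
    """Pick the attributes whose names match the pattern (assumed full match)."""
    if not pattern:
        return {}
    regex = re.compile(pattern)
    return {k: v for k, v in attributes.items() if regex.fullmatch(k)}


print(demarcate(b"first\nsecond", "\n"))
print(select_headers({"kafka.trace": "1", "filename": "a.txt"}, r"kafka\..*"))
```

Because every message split from one FlowFile shares the configured Kafka Key, a compacted topic may keep only the last of them, which is the data-loss risk the Kafka Key description warns about.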
Property Description
Broker URI The URI(s) to use to connect to the MQTT broker (e.g., tcp://localhost:1883). The 'tcp', 'ssl', 'ws' and 'wss' schemes are supported. In order to use 'ssl', the SSL Context Service property must be set. When a comma-separated URI list is set (e.g., tcp://localhost:1883,tcp://localhost:1884), the processor will use a round-robin algorithm to connect to the brokers on connection failure.
Client ID MQTT client ID to use. If not set, a UUID will be generated.
Connection Timeout (seconds) Maximum time interval the client will wait for the network connection to the MQTT server to be established. The default timeout is 30 seconds. A value of 0 disables timeout processing meaning the client will wait until the network connection is made successfully or fails.
Keep Alive Interval (seconds) Defines the maximum time interval between messages sent or received. It enables the client to detect if the server is no longer available, without having to wait for the TCP/IP timeout. The client will ensure that at least one message travels across the network within each keep alive period. In the absence of a data-related message during the time period, the client sends a very small "ping" message, which the server will acknowledge. A value of 0 disables keepalive processing in the client.
Last Will Message The message to send as the client's Last Will.
Last Will QoS Level QoS level to be used when publishing the Last Will Message.
Last Will Retain Whether to retain the client's Last Will.
Last Will Topic The topic to send the client's Last Will to.
MQTT Specification Version The MQTT specification version when connecting with the broker. See the allowable value descriptions for more details.
Password Password to use when connecting to the broker
Quality of Service(QoS) The Quality of Service (QoS) to send the message with. Accepts three values '0', '1' and '2'; '0' for 'at most once', '1' for 'at least once', '2' for 'exactly once'. Expression language is allowed in order to support publishing messages with different QoS but the end value of the property must be either '0', '1' or '2'.
Retain Message Whether or not the retain flag should be set on the MQTT message.
SSL Context Service The SSL Context Service used to provide client certificate information for TLS/SSL connections.
Session Expiry Interval After this interval the broker will expire the client and clear the session state.
Session state Whether to start a fresh session or resume previous flows. See the allowable value descriptions for more details.
Topic The topic to publish the message to.
Username Username to use when connecting to the broker
message-demarcator With this property, you have an option to publish multiple messages from a single FlowFile. This property allows you to provide a string (interpreted as UTF-8) to use for demarcating the FlowFile content. This is an optional property; if it is not provided, and no Record Reader/Writer is defined, each FlowFile will be published as a single message. To enter a special character such as 'new line', use CTRL+Enter or Shift+Enter, depending on the OS.
record-reader The Record Reader to use for parsing the incoming FlowFile into Records.
record-writer The Record Writer to use for serializing Records before publishing them as an MQTT Message.
Name Description
failure FlowFiles that failed to send to the destination are transferred to this relationship.
success FlowFiles that are sent successfully to the destination are transferred to this relationship.
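The constraints stated for Broker URI (supported schemes, comma-separated list) and Quality of Service (final value of '0', '1' or '2') can be checked with a small validator. This is a minimal sketch; the function names are hypothetical and are not part of the processor.

```python
from typing import List
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"tcp", "ssl", "ws", "wss"}
QOS_MEANING = {"0": "at most once", "1": "at least once", "2": "exactly once"}


def parse_broker_uris(value: str) -> List[str]:
    """Split a comma-separated URI list and check each scheme."""
    uris = [u.strip() for u in value.split(",") if u.strip()]
    for uri in uris:
        scheme = urlparse(uri).scheme
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported scheme {scheme!r} in {uri!r}")
    return uris


def validate_qos(value: str) -> int:
    """Return the QoS level as an int, rejecting anything but '0', '1', '2'."""
    if value not in QOS_MEANING:
        raise ValueError(f"QoS must be '0', '1' or '2', got {value!r}")
    return int(value)


print(parse_broker_uris("tcp://localhost:1883,tcp://localhost:1884"))
print(validate_qos("1"), QOS_MEANING["1"])
```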
Property Description
Access Token OAuth Access Token used for authenticating/authorizing the Slack request sent by NiFi. This may be either a User Token or a Bot Token. The token must be granted the chat:write scope. Additionally, in order to upload FlowFile contents as an attachment, it must be granted files:write.
Channel The name or identifier of the channel to send the message to. If using a channel name, it must be prefixed with the # character. For example, #general. This is valid only for public channels. Otherwise, the unique identifier of the channel to publish to must be provided.
Character Set Specifies the name of the Character Set used to encode the FlowFile contents.
Include FlowFile Content as Attachment Specifies whether or not the contents of the FlowFile should be uploaded as an attachment to the Slack message.
Max FlowFile Size The maximum size of a FlowFile that can be sent to Slack. If any FlowFile exceeds this size, it will be routed to failure. This plays an important role because the entire contents of the file must be loaded into NiFi's heap in order to send the data to Slack.
Message Text The text of the message to send to Slack.
Methods Endpoint Url Prefix Customizes the Slack client by setting the methodsEndpointUrlPrefix. Set this property if you need a different URL prefix for Slack API Methods calls. Default value: [https://slack.com/api/](https://slack.com/api/)
Publish Strategy Specifies how the Processor will send the message or file to Slack.
Thread Timestamp The Timestamp identifier for the thread that this message is to be a part of. If not specified, the message will be a top-level message instead of being in a thread.
Name Description
failure FlowFiles are routed to 'failure' if unable to be sent to Slack for any other reason
rate limited FlowFiles are routed to 'rate limited' if the Rate Limit has been exceeded
success FlowFiles are routed to success after being successfully sent to Slack
Name Description
slack.channel.id The ID of the Slack Channel from which the messages were retrieved
slack.ts The timestamp of the Slack message that was sent; this is used by Slack as a unique identifier
Property Description
Account Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name]
Authentication Strategy Strategy for authenticating Snowflake connections
Channel Group Group for managing distinct Snowpipe Streaming Channels with partitioning
Channel Insert Timeout Maximum duration to retry inserting records before failing with an upper bound of 5 minutes
Database Snowflake Database destination for processed records
Destination Type Snowflake destination object for processed records with support for derived default pipes
File Fragment Count Maximum number of file fragments sent to object storage for Snowpipe Streaming ingestion from input FlowFiles. Must be between 1 and 100.
File Fragment Size Maximum size in bytes for each file fragment sent to object storage for Snowpipe Streaming ingestion. Must be between 1 KB and 256 MB.
Offset Token End Expression Expression Language definition to produce the highest offset token for a FlowFile as a monotonically increasing number
Offset Token Record Pointer JSON Pointer to offset token in each record required when the last committed offset token is between start and end boundaries
Offset Token Start Expression Expression Language definition to produce the lowest offset token for a FlowFile as a monotonically increasing number
Offset Tracking Resolution Resolution level for evaluating committed offset tokens against input FlowFiles and records. **Disabled**: opaque offset token handling without tracking across FlowFiles or records. **FlowFile**: track each FlowFile with monotonically increasing offset tokens. **Record**: track each record in each FlowFile with monotonically increasing offset tokens.
Offset Tracking Timeout Maximum duration to wait for channel status to confirm committed offset tokens before routing to failure
Pipe Snowflake Pipe destination for processed records
Private Key Service RSA Private Key Service for authenticating connections
Role Snowflake Role the user will assume when authenticating connections
Schema Snowflake Schema destination for processed records
Table Snowflake Table destination for processed records
Transfer Strategy Strategy for transferring records to Snowpipe Streaming. **Managed**: transfer records as either batches of rows or file fragments based on uncompressed size. **Rows**: transfer records as batches of rows over HTTP to Snowpipe Streaming. **File Fragments**: transfer records as file fragments over HTTP to cloud storage services.
User Snowflake User for authenticating connections
Web Client Service Provider Web Client Service Provider supporting HTTP request and response handling
Name Description
empty FlowFiles with empty content not sent to Snowflake
failure FlowFiles that failed to upload to Snowflake
invalid FlowFiles that Snowflake identified as containing one or more invalid rows resulting in partial transmission
success FlowFiles successfully uploaded to Snowflake
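The documented bounds for File Fragment Count (1 to 100) and File Fragment Size (1 KB to 256 MB) can be expressed as a simple check. This is a sketch only: the function name is hypothetical, and it assumes binary units (1 KB = 1024 bytes), which the property descriptions do not specify.

```python
KB = 1024          # assumption: binary kilobyte
MB = 1024 * KB     # assumption: binary megabyte


def validate_fragment_settings(count: int, size_bytes: int) -> None:
    """Raise ValueError if either setting falls outside the documented range."""
    if not 1 <= count <= 100:
        raise ValueError(f"File Fragment Count must be 1-100, got {count}")
    if not KB <= size_bytes <= 256 * MB:
        raise ValueError(
            f"File Fragment Size must be 1 KB-256 MB, got {size_bytes} bytes"
        )


validate_fragment_settings(count=10, size_bytes=16 * MB)  # passes silently
```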
Property Description
Blob Name The full name of the blob
Client-Side Encryption Key ID Specifies the ID of the key to use for client-side encryption.
Client-Side Encryption Key Type Specifies the key type to use for client-side encryption.
Client-Side Encryption Local Key When using local client-side encryption, this is the raw key, encoded in hexadecimal
Conflict Resolution Strategy Specifies whether an existing blob will have its contents replaced upon conflict.
Container Name Name of the Azure storage container. In case of PutAzureBlobStorage processor, container can be created if it does not exist.
Create Container Specifies whether to check if the container exists and to automatically create it if it does not. Permission to list containers is required. If false, this check is not made, but the Put operation will fail if the container does not exist.
File Resource Service File Resource Service providing access to the local resource to be transferred
Resource Transfer Source The source of the content to be transferred
Storage Credentials Controller Service used to obtain Azure Blob Storage Credentials.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor.
Name Description
failure Unsuccessful operations will be transferred to the failure relationship.
success All successfully processed FlowFiles are routed to this relationship
Name Description
azure.container The name of the Azure Blob Storage container
azure.blobname The name of the blob on Azure Blob Storage
azure.primaryUri Primary location of the blob
azure.etag ETag of the blob
azure.blobtype Type of the blob (either BlockBlob, PageBlob or AppendBlob)
mime.type MIME Type of the content
lang Language code for the content
azure.timestamp Timestamp of the blob
azure.length Length of the blob
azure.error.code Error code reported during blob operation
azure.ignored When Conflict Resolution Strategy is 'ignore', this property will be true/false depending on whether the blob was ignored.
Property Description
Cosmos DB Access Key Cosmos DB Access Key from Azure Portal (Settings->Keys). Choose a read-write key to enable database or container creation at run time
Cosmos DB Conflict Handling Strategy Choose whether to ignore or upsert when conflict error occurs during insertion
Cosmos DB Connection Service If configured, the controller service used to obtain the connection string and access key
Cosmos DB Consistency Level Choose from five consistency levels on the consistency spectrum. Refer to Cosmos DB documentation for their differences
Cosmos DB Container ID The unique identifier for the container
Cosmos DB Name The database name or id. This is used as the namespace for document collections or containers
Cosmos DB Partition Key The partition key used to distribute data among servers
Cosmos DB URI Cosmos DB URI, typically in the form of https://\{databaseaccount\}.documents.azure.com:443/. Note that this host URL, available from the Azure Portal (Overview->URI), is for Cosmos DB with Core SQL API.
Insert Batch Size The number of records to group together for one single insert operation against Cosmos DB
Record Reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema
Name Description
failure All FlowFiles that cannot be written to Cosmos DB are routed to this relationship
success All FlowFiles that are written to Cosmos DB are routed to this relationship
Property Description
Data Format The format of the data that is sent to Azure Data Explorer. Supported formats include: avro, csv, json
Database Name Azure Data Explorer Database Name for ingesting data
Ingest Mapping Name The name of the mapping responsible for storing the data in the appropriate columns.
Ingest Status Polling Interval Defines the interval at which to poll for ingestion status
Ingest Status Polling Timeout Defines the total amount of time to poll for ingestion status
Ingestion Ignore First Record Defines whether to ignore the first record during ingestion.
Kusto Ingest Service Azure Data Explorer Kusto Ingest Service
Partially Succeeded Routing Strategy Defines where to route FlowFiles that resulted in a partially succeeded status.
Poll for Ingest Status Determines whether to poll on ingestion status after an ingestion to Azure Data Explorer is completed
Streaming Enabled Whether to stream data to Azure Data Explorer.
Table Name Azure Data Explorer Table Name for ingesting data
Name Description
failure Ingest processing failed
success Ingest processing succeeded
Property Description
ADLS Credentials Controller Service used to obtain Azure Credentials.
Base Temporary Path The Path where the temporary directory will be created. The Path name cannot contain a leading '/'. The root directory can be designated by the empty string value. Non-existing directories will be created. The Temporary File Directory name is _nifitempdirectory
Conflict Resolution Strategy Indicates what should happen when a file with the same name already exists in the output directory
Directory Name Name of the Azure Storage Directory. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. In case of the PutAzureDataLakeStorage processor, the directory will be created if not already existing.
File Name The filename
File Resource Service File Resource Service providing access to the local resource to be transferred
Filesystem Name Name of the Azure Storage File System (also called Container). It is assumed to be already existing.
Resource Transfer Source The source of the content to be transferred
Writing Strategy Defines the approach for writing the Azure file.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor.
Name Description
failure Files that could not be written to Azure storage for some reason are transferred to this relationship
success Files that have been successfully written to Azure storage are transferred to this relationship
Name Description
azure.filesystem The name of the Azure File System
azure.directory The name of the Azure Directory
azure.filename The name of the Azure File
azure.primaryUri Primary location for file content
azure.length The length of the Azure File
Property Description
Event Hub Name Name of Azure Event Hubs destination
Event Hub Namespace Namespace of Azure Event Hubs prefixed to Service Bus Endpoint domain
Maximum Batch Size Maximum number of FlowFiles processed for each Processor invocation
Partitioning Key Attribute Name If specified, the value of the attribute named by this field will be used as the partitioning key for the event hub.
Service Bus Endpoint Service Bus endpoint domain, used to support namespaces not in the default windows.net domain.
Shared Access Policy Key The key of the shared access policy. Either the primary or the secondary key can be used.
Shared Access Policy Name The name of the shared access policy. This policy must have Send claims.
Transport Type Advanced Message Queuing Protocol Transport Type for communication with Azure Event Hubs
Use Azure Managed Identity Choose whether or not to use the managed identity of Azure VM/VMSS
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure Any FlowFile that could not be sent to the event hub will be transferred to this Relationship.
success Any FlowFile that is successfully sent to the event hubs will be transferred to this Relationship.
Property Description
Credentials Service Controller Service used to obtain Azure Storage Credentials.
Endpoint Suffix Storage accounts in public Azure always use a common FQDN suffix. Override this endpoint suffix with a different suffix in certain circumstances (like Azure Stack or non-public Azure regions).
Message Time To Live Maximum time to allow the message to be in the queue
Queue Name Name of the Azure Storage Queue
Request Timeout The timeout for read or write requests to Azure Queue Storage. Defaults to 1 second.
Visibility Timeout The length of time during which the message will be invisible after it is read. If the processing unit fails to delete the message after it is read, then the message will reappear in the queue.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor.
Name Description
failure Unsuccessful operations will be transferred to the failure relationship.
success All successfully processed FlowFiles are routed to this relationship
Property Description
GCP Credentials Provider Service The Controller Service used to obtain Google Cloud Platform credentials.
bigquery-api-endpoint Can be used to override the default BigQuery endpoint. Default is bigquerystorage.googleapis.com:443. Format must be hostname:port.
bq.append.record.count The number of records to be appended to the write stream at once. Applicable for both batch and stream types
bq.dataset BigQuery dataset name (Note - The dataset must exist in GCP)
bq.record.reader Specifies the Controller Service to use for parsing incoming data.
bq.skip.invalid.rows Sets whether to insert all valid rows of a request, even if invalid rows exist. If not set the entire insert request will fail if it contains an invalid row.
bq.table.name BigQuery table name
bq.transfer.type Defines the preferred transfer type: streaming or batching.
gcp-project-id Google Cloud Project ID
gcp-retry-count How many retry attempts should be made before routing to the failure relationship.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to this relationship if the Google BigQuery operation fails.
success FlowFiles are routed to this relationship after a successful Google BigQuery operation.
Name Description
bq.records.count Number of records successfully inserted
Property Description
Box Client Service Controller Service used to obtain a Box API connection.
Chunked Upload Threshold The maximum size of the content which is uploaded at once. FlowFiles larger than this threshold are uploaded in chunks. Chunked upload is allowed for files larger than 20 MB. It is recommended to use chunked upload for files exceeding 50 MB.
Conflict Resolution Strategy Indicates what should happen when a file with the same name already exists in the specified Box folder.
Create Subfolder Specifies whether to check if the subfolder exists and to automatically create it if it does not. Permission to list folders is required.
Filename The name of the file to upload to the specified Box folder.
Folder ID The ID of the folder where the file is uploaded. Please see Additional Details to obtain Folder ID.
Subfolder Name The name (path) of the subfolder where files are uploaded. The subfolder name is relative to the folder specified by 'Folder ID'. Example: subFolder, subFolder1/subfolder2
Name Description
failure Files that could not be written to Box for some reason are transferred to this relationship.
success Files that have been successfully written to Box are transferred to this relationship.
Name Description
box.id The id of the file
filename The name of the file
path The folder path where the file is located
box.size The size of the file
box.timestamp The last modified time of the file
error.code The error code returned by Box
error.message The error message returned by Box
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Maximum The maximum value of the sample set. Must be a double
Metric Name The name of the metric
Minimum The minimum value of the sample set. Must be a double
Namespace The namespace for the metric data for CloudWatch
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Sample Count The number of samples used for the statistic set. Must be a double
Sum The sum of values for the sample set. Must be a double
Timestamp A point in time expressed as the number of milliseconds since Jan 1, 1970 00:00:00 UTC. If not specified, the default value is set to the time the metric data was received
Unit The unit of the metric (e.g., Seconds, Bytes, Megabytes, Percent, Count, Kilobytes/Second, Terabits/Second, Count/Second). For details, see [http://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html](http://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html)
Value The value for the metric. Must be a double
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
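The Maximum, Minimum, Sample Count, and Sum properties above together describe a statistic set, and each value must be a double. As a minimal sketch of how those four values relate to a list of raw samples, the hypothetical helper below (`build_statistic_set` is not part of any processor or AWS API) summarizes samples and attaches a Timestamp in milliseconds since the Unix epoch. The dictionary keys simply mirror the property names; how the values reach the processor is not shown.

```python
import time


def build_statistic_set(samples, timestamp_ms=None):
    """Summarize raw samples into the four statistic-set values (all doubles)."""
    if not samples:
        raise ValueError("at least one sample is required")
    values = [float(v) for v in samples]
    if timestamp_ms is None:
        # Milliseconds since Jan 1, 1970 00:00:00 UTC.
        timestamp_ms = int(time.time() * 1000)
    return {
        "Maximum": max(values),
        "Minimum": min(values),
        "Sample Count": float(len(values)),
        "Sum": sum(values),
        "Timestamp": int(timestamp_ms),
    }


stats = build_statistic_set([12.5, 7.0, 30.25], timestamp_ms=1_700_000_000_000)
print(stats)
```

Sample Count is returned as a float because the property description requires a double even though it counts items.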
Property Description
Column Name Translation Pattern Column name will be normalized with this regular expression
Column Name Translation Strategy The strategy used to normalize table column name. Column Name will be uppercased to do case-insensitive matching irrespective of strategy
Data Record Path If specified, this property denotes a RecordPath that will be evaluated against each incoming Record and the Record that results from evaluating the RecordPath will be sent to the database instead of sending the entire incoming Record. If not specified, the entire incoming Record will be published to the database.
Database Dialect Service Database Dialect Service for generating statements specific to a particular service or vendor.
Delete Keys A comma-separated list of column names that uniquely identifies a row in the database for DELETE statements. If the Statement Type is DELETE and this property is not set, the table's columns are used. This property is ignored if the Statement Type is not DELETE
Rollback On Failure Specifies how to handle errors. By default (false), if an error occurs while processing a FlowFile, the FlowFile is routed to the 'failure' or 'retry' relationship based on the error type, and the processor continues with the next FlowFile. Alternatively, you can roll back the FlowFiles currently being processed and stop further processing immediately by enabling this 'Rollback On Failure' property. If enabled, failed FlowFiles stay in the input relationship without being penalized and are processed repeatedly until they are processed successfully or removed by other means. It is important to set an adequate 'Yield Duration' to avoid retrying too frequently.
Statement Type Record Path Specifies a RecordPath to evaluate against each Record in order to determine the Statement Type. The RecordPath should equate to either INSERT, UPDATE, UPSERT, or DELETE. (Debezium style operation types are also supported: "r" and "c" for INSERT, "u" for UPDATE, and "d" for DELETE)
database-session-autocommit The autocommit mode to set on the database connection being used. If set to false, the operation(s) will be explicitly committed or rolled back (based on success or failure respectively). If set to true, the driver/database automatically handles the commit/rollback.
db-type Database Type for generating statements specific to a particular service or vendor. The Generic Type supports most cases but selecting a specific type enables optimal processing or additional features.
put-db-record-allow-multiple-statements If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates whether to split the field value by a semicolon and execute each statement separately. If any statement causes an error, the entire set of statements will be rolled back. If the Statement Type is not 'SQL', this field is ignored.
put-db-record-binary-format The format to be applied when decoding string values to binary.
put-db-record-catalog-name The name of the database (or the name of the catalog, depending on the destination system) that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty. Note that if the property is set and the database is case-sensitive, the catalog name must match the database's catalog name exactly.
put-db-record-dcbp-service The Controller Service that is used to obtain a connection to the database for sending records.
put-db-record-field-containing-sql If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.
put-db-record-max-batch-size Specifies maximum number of sql statements to be included in each batch sent to the database. Zero means the batch size is not limited, and all statements are put into a single batch which can cause high memory usage issues for a very large number of statements.
put-db-record-query-timeout The maximum amount of time allowed for a running SQL statement. Zero means there is no limit. A maximum time of less than 1 second is treated as zero.
put-db-record-quoted-identifiers Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.
put-db-record-quoted-table-identifiers Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.
put-db-record-record-reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema.
put-db-record-schema-name The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty. Note that if the property is set and the database is case-sensitive, the schema name must match the database's schema name exactly.
put-db-record-statement-type Specifies the type of SQL Statement to generate. Please refer to the database documentation for a description of the behavior of each operation. Please note that some Database Types may not support certain Statement Types. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the 'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.
put-db-record-table-name The name of the table that the statement should affect. Note that if the database is case-sensitive, the table name must match the database's table name exactly.
put-db-record-translate-field-names If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. If false, the field names must match the column names exactly, or the column will not be updated
put-db-record-unmatched-column-behavior If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation
put-db-record-unmatched-field-behavior If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation
put-db-record-update-keys A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. This property is ignored if the Statement Type is INSERT
table-schema-cache-size Specifies how many Table Schemas should be cached
Name Description
failure A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, such as an invalid query or an integrity constraint violation
retry A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed
success Successfully created FlowFile from SQL query result set.
Name Description
putdatabaserecord.error If an error occurs during processing, the flow file will be routed to failure or retry, and this attribute will be populated with the cause of the error.
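The Statement Type Record Path description above says the evaluated value should be INSERT, UPDATE, UPSERT, or DELETE, and that Debezium-style operation codes are also accepted. The following is a minimal sketch of that mapping, not the processor's actual code. The function name `resolve_statement_type` is hypothetical, and matching the explicit statement names case-insensitively is an assumption.

```python
# Debezium-style operation codes listed in the property description.
DEBEZIUM_OPS = {"r": "INSERT", "c": "INSERT", "u": "UPDATE", "d": "DELETE"}
STATEMENT_TYPES = {"INSERT", "UPDATE", "UPSERT", "DELETE"}


def resolve_statement_type(value):
    """Map a RecordPath result to a statement type (illustrative only)."""
    if value in DEBEZIUM_OPS:
        return DEBEZIUM_OPS[value]
    upper = value.upper()  # assumption: explicit names match case-insensitively
    if upper in STATEMENT_TYPES:
        return upper
    raise ValueError(f"unsupported statement type: {value!r}")


for raw in ("r", "c", "u", "d", "UPSERT"):
    print(raw, "->", resolve_statement_type(raw))
```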
Property Description
Databricks Client Databricks Client Service.
Default Catalog Default table catalog, some SQL statements such as 'COPY INTO' do not support using a default catalog
Default Schema Default table schema, some SQL statements such as 'COPY INTO' do not support using a default schema
Record Writer Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer may use Inherit Schema to emulate the inferred schema behavior, i.e. an explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types.
SQL Warehouse ID Warehouse ID used to execute SQL
SQL Warehouse Name SQL Warehouse Name used to execute SQL, will search through all SQL Warehouses to find matching name.
Statement SQL statement to execute
Name Description
failure Databricks failure relationship
http.response HTTP Response to SQL API Request
original The original FlowFile is routed to this relationship when processing is successful.
records Serialized SQL Records
Name Description
statement.state The final state of the executed SQL statement
error.code The error code for the SQL statement if an error occurred.
error.message The error message for the SQL statement if an error occurred.
Property Description
DBFS File Path DBFS file path e.g. /directory/file.txt
Databricks Client Databricks Client Service.
Overwrite Policy What action to take if a file already exists at the destination path.
Name Description
failure Databricks failure relationship
success Databricks success relationship
Name Description
error.code The error code for the SQL statement if an error occurred.
error.message The error message for the SQL statement if an error occurred.
Property Description
Cache Entry Identifier A FlowFile attribute, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the cache key
Cache update strategy Determines how the cache is updated if the cache already contains the entry
Distributed Cache Service The Controller Service that is used to cache flow files
Max cache entry size The maximum amount of data to put into cache
Name Description
failure Any FlowFile that cannot be inserted into the cache will be routed to this relationship
success Any FlowFile that is successfully inserted into cache will be routed to this relationship
Name Description
cached All FlowFiles will have an attribute 'cached'. The value of this attribute is true if the FlowFile is cached, otherwise false.
Property Description
Chunked Upload Size Defines the size of a chunk. Used when a FlowFile's size exceeds 'Chunked Upload Threshold' and content is uploaded in smaller chunks. It is recommended to specify a chunked upload size smaller than 'Chunked Upload Threshold' and as a multiple of 4 MB. Maximum allowed value is 150 MB.
Chunked Upload Threshold The maximum size of the content which is uploaded at once. FlowFiles larger than this threshold are uploaded in chunks. Maximum allowed value is 150 MB.
Conflict Resolution Strategy Indicates what should happen when a file with the same name already exists in the specified Dropbox folder.
Dropbox Credential Service Controller Service used to obtain Dropbox credentials (App Key, App Secret, Access Token, Refresh Token). See controller service's Additional Details for more information.
Filename The full name of the file to upload.
Folder The path of the Dropbox folder to upload files to. The folder will be created if it does not exist yet.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure Files that could not be written to Dropbox for some reason are transferred to this relationship.
success Files that have been successfully written to Dropbox are transferred to this relationship.
Name Description
error.message The error message returned by Dropbox
dropbox.id The Dropbox identifier of the file
path The folder path where the file is located
filename The name of the file
dropbox.size The size of the file
dropbox.timestamp The server modified time of the file
dropbox.revision Revision of the file
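To illustrate how 'Chunked Upload Threshold' and 'Chunked Upload Size' interact, the sketch below plans the byte ranges for an upload: content at or below the threshold goes in one request, and larger content is split into chunks. The helper `plan_upload` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption. It does not call the Dropbox API.

```python
MB = 1024 * 1024  # assumption: binary megabytes
MAX_ALLOWED = 150 * MB  # maximum allowed value for both properties


def plan_upload(size, threshold, chunk_size):
    """Return (start, end) byte ranges for uploading `size` bytes."""
    if not 0 < chunk_size <= MAX_ALLOWED:
        raise ValueError("chunk size must be between 1 byte and 150 MB")
    if not 0 < threshold <= MAX_ALLOWED:
        raise ValueError("threshold must be between 1 byte and 150 MB")
    if size <= threshold:
        # Small enough to upload at once.
        return [(0, size)]
    # Larger than the threshold: split into chunk_size pieces.
    return [
        (start, min(start + chunk_size, size))
        for start in range(0, size, chunk_size)
    ]


print(plan_upload(10 * MB, threshold=150 * MB, chunk_size=8 * MB))
print(plan_upload(20 * MB, threshold=16 * MB, chunk_size=8 * MB))
```

The second call uses an 8 MB chunk size, which follows the recommendation to pick a multiple of 4 MB that is smaller than the threshold.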
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Batch items for each request (between 1 and 50) The items to be retrieved in one batch
Character set of document Character set of data in the document
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Hash Key Name The hash key name of the item
Hash Key Value The hash key value of the item
Hash Key Value Type The hash key value type of the item
Json Document attribute The Json document to be retrieved from the dynamodb item ('s' type in the schema)
Range Key Name The range key name of the item
Range Key Value
Range Key Value Type The range key value type of the item
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Table Name The DynamoDB table name
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
unprocessed FlowFiles are routed to unprocessed relationship when DynamoDB is not able to process all the items in the request. Typical reasons are insufficient table throughput capacity and exceeding the maximum bytes per request. Unprocessed FlowFiles can be retried with a new request.
Name Description
dynamodb.key.error.unprocessed DynamoDB unprocessed keys
dynmodb.range.key.value.error DynamoDB range key error
dynamodb.key.error.not.found DynamoDB key not found
dynamodb.error.exception.message DynamoDB exception message
dynamodb.error.code DynamoDB error code
dynamodb.error.message DynamoDB error message
dynamodb.error.service DynamoDB error service
dynamodb.error.retryable DynamoDB error is retryable
dynamodb.error.request.id DynamoDB error request id
dynamodb.error.status.code DynamoDB error status code
dynamodb.item.io.error IO exception message on creating item
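The batch property above limits each request to between 1 and 50 items, and the 'unprocessed' relationship notes that unprocessed items can be retried with a new request. The sketch below shows that flow in isolation. `batch_keys`, `fetch_all`, and the `send_batch` callable are hypothetical names, and `send_batch` stands in for whatever issues the request and returns the keys left unprocessed. It is an illustration of the pattern, not the processor's implementation.

```python
def batch_keys(keys, batch_size=50):
    """Split keys into request-sized batches (1 to 50 items each)."""
    if not 1 <= batch_size <= 50:
        raise ValueError("batch size must be between 1 and 50")
    keys = list(keys)
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


def fetch_all(keys, send_batch, batch_size=50, max_rounds=3):
    """Send batches and retry unprocessed keys; return keys still pending."""
    pending = list(keys)
    for _ in range(max_rounds):
        if not pending:
            break
        unprocessed = []
        for batch in batch_keys(pending, batch_size):
            # send_batch returns the keys that were not processed.
            unprocessed.extend(send_batch(batch))
        pending = unprocessed
    return pending


print([len(b) for b in batch_keys(range(120))])
```

Capping the number of rounds keeps a persistently throttled table from causing an endless retry loop.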
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Partition Key Attribute Specifies the FlowFile attribute that will be used as the value of the partition key when using "Partition by attribute" partition key strategy.
Partition Key Field Defines the name of the partition key field in the DynamoDB table. Partition key is also known as hash key. Depending on the "Partition Key Strategy" the field value might come from the incoming Record or a generated one.
Partition Key Strategy Defines the strategy the processor uses to assign partition key value to the inserted Items.
Record Reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema.
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Sort Key Field Defines the name of the sort key field in the DynamoDB table. Sort key is also known as range key.
Sort Key Strategy Defines the strategy the processor uses to assign sort key to the inserted Items.
Table Name The DynamoDB table name
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
unprocessed FlowFiles are routed to unprocessed relationship when DynamoDB is not able to process all the items in the request. Typical reasons are insufficient table throughput capacity and exceeding the maximum bytes per request. Unprocessed FlowFiles can be retried with a new request.
Name Description
dynamodb.chunks.processed Number of chunks successfully inserted into DynamoDB. If not set, it is considered as 0
dynamodb.key.error.unprocessed DynamoDB unprocessed keys
dynmodb.range.key.value.error DynamoDB range key error
dynamodb.key.error.not.found DynamoDB key not found
dynamodb.error.exception.message DynamoDB exception message
dynamodb.error.code DynamoDB error code
dynamodb.error.message DynamoDB error message
dynamodb.error.service DynamoDB error service
dynamodb.error.retryable DynamoDB error is retryable
dynamodb.error.request.id DynamoDB error request id
dynamodb.error.status.code DynamoDB error status code
dynamodb.item.io.error IO exception message on creating item
Property Description
Batch Size The preferred number of FlowFiles to send over in a single batch
Character Set Specifies the character set of the document data.
Client Service An Elasticsearch client service to use for running queries.
Dynamic Templates The dynamic_templates for the document. Must be parsable as a JSON Object. Requires Elasticsearch 7+
Identifier Attribute The name of the FlowFile attribute containing the identifier for the document. If the Index Operation is "index", this property may be left empty or evaluate to an empty value, in which case the document's identifier will be auto-generated by Elasticsearch. For all other Index Operations, the attribute must evaluate to a non-empty value.
Index The name of the index to use.
Index Operation The type of the operation used to index (create, delete, index, update, upsert)
Log Error Responses If this is enabled, errors will be logged to the NiFi logs at the error log level. Otherwise, they will only be logged if debug logging is enabled on NiFi as a whole. The purpose of this option is to give the user the ability to debug failed operations without having to turn on debug logging.
Max JSON Field String Length The maximum allowed length of a string value when parsing a JSON document or attribute.
Output Error Responses If this is enabled, response messages from Elasticsearch marked as "error" will be output to the "error_responses" relationship. This does not impact the output of flowfiles to the "successful" or "errors" relationships
Script The script for the document update/upsert. Only applies to Update/Upsert operations. Must be parsable as JSON Object. If left blank, the FlowFile content will be used for document update/upsert
Scripted Upsert Whether to add the scripted_upsert flag to the Upsert Operation. If true, forces Elasticsearch to execute the Script whether or not the document exists. Defaults to false. If the Upsert Document provided (from FlowFile content) will be empty, be sure to set the Client Service controller service's Suppress Null and Empty Values to Never Suppress; otherwise no "upsert" doc will be included in the request to Elasticsearch and the operation will not create a new document for the script to execute against, resulting in a "not_found" error
Treat Not Found as Success If true, "not_found" Elasticsearch Document associated Records will be routed to the "successful" relationship, otherwise to the "errors" relationship. If Output Error Responses is "true" then "not_found" responses from Elasticsearch will be sent to the error_responses relationship.
Type The type of this document (used by Elasticsearch for indexing and searching).
Name Description
errors Record(s)/Flowfile(s) corresponding to Elasticsearch document(s) that resulted in an "error" (within Elasticsearch) will be routed here.
failure All flowfiles that fail for reasons unrelated to server availability go to this relationship.
original All flowfiles that are sent to Elasticsearch without request failures go to this relationship.
retry All flowfiles that fail due to server/cluster availability go to this relationship.
successful Record(s)/Flowfile(s) corresponding to Elasticsearch document(s) that did not result in an "error" (within Elasticsearch) will be routed here.
Name Description
elasticsearch.put.error The error message if there is an issue parsing the FlowFile, sending the parsed document to Elasticsearch or parsing the Elasticsearch response
elasticsearch.bulk.error The _bulk response if there was an error during processing the document within Elasticsearch.
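The interaction between the "errors" and "successful" relationships and the 'Treat Not Found as Success' property can be sketched as a routing function over the items of an Elasticsearch `_bulk` response. This is an illustrative sketch only: `route_bulk_items` is a hypothetical name, it assumes each response item is a single-key object keyed by its operation with an optional "error" field and a "result" field, and it omits the "error_responses" relationship.

```python
def route_bulk_items(items, treat_not_found_as_success=False):
    """Assign each _bulk response item to 'successful' or 'errors'."""
    routed = {"successful": [], "errors": []}
    for item in items:
        # Each item looks like {"index": {...}} or {"delete": {...}}.
        result = next(iter(item.values()))
        if "error" in result:
            routed["errors"].append(item)
        elif result.get("result") == "not_found" and not treat_not_found_as_success:
            routed["errors"].append(item)
        else:
            routed["successful"].append(item)
    return routed


sample = [
    {"index": {"_id": "1", "status": 201, "result": "created"}},
    {"delete": {"_id": "2", "status": 404, "result": "not_found"}},
    {"update": {"_id": "3", "status": 400, "error": {"type": "mapper_parsing_exception"}}},
]
print({k: len(v) for k, v in route_bulk_items(sample).items()})
```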
Property Description
Batch Size The number of records to send over in a single batch.
Client Service An Elasticsearch client service to use for running queries.
Date Format Specifies the format to use when writing Date fields. If not specified, the default format 'yyyy-MM-dd' is used. If specified, the value must match the Java Simple Date Format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/25/2017).
Dynamic Templates Record Path A RecordPath pointing to a field in the record(s) that contains the dynamic_templates for the document. Field must be Map-type compatible (e.g. a Map or Record) or a String parsable into a JSON Object. Requires Elasticsearch 7+
Group Results by Bulk Error Type The errored records written to the "errors" relationship will be grouped by error type, and the error related to the first record within the FlowFile is added to the FlowFile as "elasticsearch.bulk.error". If "Treat Not Found as Success" is "false" then records associated with "not_found" Elasticsearch document responses will also be sent to the "errors" relationship.
ID Record Path A record path expression to retrieve the ID field for use with Elasticsearch. If left blank the ID will be automatically generated by Elasticsearch.
Index The name of the index to use.
Index Operation The type of the operation used to index (create, delete, index, update, upsert)
Index Operation Record Path A record path expression to retrieve the Index Operation field for use with Elasticsearch. If left blank the Index Operation will be determined using the main Index Operation property.
Index Record Path A record path expression to retrieve the index field for use with Elasticsearch. If left blank the index will be determined using the main index property.
Log Error Responses If this is enabled, errors will be logged to the NiFi logs at the error log level. Otherwise, they will only be logged if debug logging is enabled on NiFi as a whole. The purpose of this option is to give the user the ability to debug failed operations without having to turn on debug logging.
Max JSON Field String Length The maximum allowed length of a string value when parsing a JSON document or attribute.
Output Error Responses If this is enabled, response messages from Elasticsearch marked as "error" will be output to the "error_responses" relationship. This does not impact the output of flowfiles to the "successful" or "errors" relationships
Record Reader The record reader to use for reading incoming records from flowfiles.
Result Record Writer The response from Elasticsearch will be examined for failed records and the failed records will be written to a record set with this record writer service and sent to the "errors" relationship. Successful records will be written to a record set with this record writer service and sent to the "successful" relationship.
Retain ID (Record Path) Whether to retain the existing field used as the ID Record Path.
Retain Record Timestamp Whether to retain the existing field used as the @timestamp Record Path.
Script Record Path A RecordPath pointing to a field in the record(s) that contains the script for the document update/upsert. Only applies to Update/Upsert operations. Field must be Map-type compatible (e.g. a Map or a Record) or a String parsable into a JSON Object
Scripted Upsert Record Path A RecordPath pointing to a field in the record(s) that contains the scripted_upsert boolean flag, which controls whether to add the scripted_upsert flag to the Upsert Operation. The flag forces Elasticsearch to execute the Script whether or not the document exists. Defaults to false. If the Upsert Document provided (from FlowFile content) will be empty, be sure to set the Client Service controller service's Suppress Null and Empty Values to Never Suppress; otherwise no "upsert" doc will be included in the request to Elasticsearch and the operation will not create a new document for the script to execute against, resulting in a "not_found" error
Time Format Specifies the format to use when writing Time fields. If not specified, the default format 'HH:mm:ss' is used. If specified, the value must match the Java Simple Date Format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
Timestamp Format Specifies the format to use when writing Timestamp fields. If not specified, the default format 'yyyy-MM-dd HH:mm:ss' is used. If specified, the value must match the Java Simple Date Format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/25/2017 18:04:15).
Timestamp Record Path A RecordPath pointing to a field in the record(s) that contains the @timestamp for the document. If left blank the @timestamp will be determined using the main @timestamp property
Timestamp Value The value to use as the @timestamp field (required for Elasticsearch Data Streams)
Treat Not Found as Success If true, "not_found" Elasticsearch Document associated Records will be routed to the "successful" relationship, otherwise to the "errors" relationship. If Output Error Responses is "true" then "not_found" responses from Elasticsearch will be sent to the error_responses relationship.
Type The type of this document (used by Elasticsearch for indexing and searching).
Type Record Path A record path expression to retrieve the type field for use with Elasticsearch. If left blank the type will be determined using the main type property.
Name Description
errors Record(s)/Flowfile(s) corresponding to Elasticsearch document(s) that resulted in an "error" (within Elasticsearch) will be routed here.
failure All flowfiles that fail for reasons unrelated to server availability go to this relationship.
original All flowfiles that are sent to Elasticsearch without request failures go to this relationship.
retry All flowfiles that fail due to server/cluster availability go to this relationship.
successful Record(s)/Flowfile(s) corresponding to Elasticsearch document(s) that did not result in an "error" (within Elasticsearch) will be routed here.
Name Description
elasticsearch.put.error The error message if there is an issue parsing the FlowFile records, sending the parsed documents to Elasticsearch or parsing the Elasticsearch response.
elasticsearch.put.error.count The number of records that generated errors in the Elasticsearch _bulk API.
elasticsearch.put.success.count The number of records that were successfully processed by the Elasticsearch _bulk API.
elasticsearch.bulk.error The _bulk response if there was an error during processing the record within Elasticsearch.
Property Description
Attach File Specifies whether or not the FlowFile content should be attached to the email
BCC The recipients to include in the BCC-Line of the email. Comma separated sequence of addresses following RFC822 syntax.
CC The recipients to include in the CC-Line of the email. Comma separated sequence of addresses following RFC822 syntax.
Content Type Mime Type used to interpret the contents of the email, such as text/plain or text/html
From Specifies the Email address to use as the sender. Comma separated sequence of addresses following RFC822 syntax.
Include All Attributes In Message Specifies whether or not all FlowFile attributes should be recorded in the body of the email message
Message The body of the email message
Reply-To The recipients that will receive the reply instead of the from (see RFC2822 §3.6.2). This feature is useful, for example, when the email is sent by a no-reply account. This field is optional. Comma separated sequence of addresses following RFC822 syntax.
SMTP Auth Flag indicating whether authentication should be used
SMTP Hostname The hostname of the SMTP host
SMTP Password Password for the SMTP account
SMTP Port The Port used for SMTP communications
SMTP Socket Factory Socket Factory to use for SMTP Connection
SMTP TLS Flag indicating whether Opportunistic TLS should be enabled using STARTTLS command
SMTP Username Username for the SMTP account
SMTP X-Mailer Header X-Mailer used in the header of the outgoing email
Subject The email subject
To The recipients to include in the To-Line of the email. Comma separated sequence of addresses following RFC822 syntax.
attribute-name-regex A Regular Expression that is matched against all FlowFile attribute names. Any attribute whose name matches the regex will be added to the Email messages as a Header. If not specified, no FlowFile attributes will be added as headers.
authorization-mode How to authorize sending email on the user's behalf.
email-ff-content-as-message Specifies whether or not the FlowFile content should be the message of the email. If true, the 'Message' property is ignored.
input-character-set Specifies the character set of the FlowFile contents for reading input FlowFile contents to generate the message body or as an attachment to the message. If not set, UTF-8 will be the default value.
oauth2-access-token-provider OAuth2 service that can provide access tokens.
Name Description
failure FlowFiles that fail to send will be routed to this relationship
success FlowFiles that are successfully sent will be routed to this relationship
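The attribute-name-regex property above selects which FlowFile attributes become email headers. A minimal sketch of that selection follows; the helper name `select_header_attributes` is hypothetical, and the choice of a full match against the attribute name (rather than a partial match) is an assumption, since the description does not say which applies.

```python
import re

def select_header_attributes(attributes, attribute_name_regex=None):
    """Return the attributes whose names match the regex (hypothetical helper)."""
    # If no regex is configured, no attributes are added as headers.
    if not attribute_name_regex:
        return {}
    pattern = re.compile(attribute_name_regex)
    # Assumption: the whole attribute name must match the expression.
    return {name: value for name, value in attributes.items()
            if pattern.fullmatch(name)}

attrs = {"mail.priority": "high", "mail.tag": "report", "filename": "a.txt"}
headers = select_header_attributes(attrs, r"mail\..*")
```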
Property Description
Conflict Resolution Strategy Indicates what should happen when a file with the same name already exists in the output directory
Create Missing Directories If true, then missing destination directories will be created. If false, flowfiles are penalized and sent to failure.
Directory The directory to which files should be written. You may use expression language such as /aa/bb/$\{path\}
Group Sets the group on the output file to the value of this attribute. You may also use expression language such as $\{file.group\}.
Last Modified Time Sets the lastModifiedTime on the output file to the value of this attribute. Format must be yyyy-MM-dd'T'HH:mm:ssZ. You may also use expression language such as $\{file.lastModifiedTime\}.
Maximum File Count Specifies the maximum number of files that can exist in the output directory
Owner Sets the owner on the output file to the value of this attribute. You may also use expression language such as $\{file.owner\}. Note that on many operating systems NiFi must be running as a super-user to have the permissions to set the file owner.
Permissions Sets the permissions on the output file to the value of this attribute. Format must be either UNIX rwxrwxrwx with a - in place of denied permissions (e.g. rw-r--r--) or an octal number (e.g. 644). You may also use expression language such as $\{file.permissions\}.
Required Permission Explanation
write filesystem Provides operator the ability to write to any file that NiFi has access to.
Name Description
failure Files that could not be written to the output directory for some reason are transferred to this relationship
success Files that have been successfully written to the output directory are transferred to this relationship
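The Permissions property above accepts either the UNIX symbolic form or an octal number. The sketch below shows how the two notations correspond; `symbolic_to_octal` is a hypothetical helper written for this illustration, not part of the processor.

```python
def symbolic_to_octal(perms: str) -> str:
    """Convert a 9-character rwx string such as 'rw-r--r--' to octal such as '644'."""
    if len(perms) != 9:
        raise ValueError("expected 9 characters, e.g. rw-r--r--")
    digits = []
    # Process the owner, group, and other triplets in turn.
    for start in range(0, 9, 3):
        value = 0
        for char, letter, bit in zip(perms[start:start + 3], "rwx", (4, 2, 1)):
            if char == letter:
                value += bit          # permission granted
            elif char != "-":
                raise ValueError(f"unexpected character {char!r}")
        digits.append(str(value))
    return "".join(digits)

octal = symbolic_to_octal("rw-r--r--")
```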
Property Description
Batch Size The maximum number of FlowFiles to send in a single connection
Conflict Resolution Determines how to handle the problem of filename collisions
Connection Mode The FTP Connection Mode
Connection Timeout Amount of time to wait before timing out while creating a connection
Create Directory Specifies whether or not the remote directory should be created if it does not exist.
Data Timeout When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems
Dot Rename If true, then the filename of the sent file is prepended with a "." and then renamed back to the original once the file is completely sent. Otherwise, there is no rename. This property is ignored if the Temporary Filename property is set.
Hostname The fully qualified hostname or IP address of the remote system
Internal Buffer Size Set the internal buffer size for buffered data streams
Last Modified Time The lastModifiedTime to assign to the file after transferring it. If not set, the lastModifiedTime will not be changed. Format must be yyyy-MM-dd'T'HH:mm:ssZ. You may also use expression language such as $\{file.lastModifiedTime\}. If the value is invalid, the processor will not be invalid but will fail to change lastModifiedTime of the file.
Password Password for the user account
Permissions The permissions to assign to the file after transferring it. Format must be either UNIX rwxrwxrwx with a - in place of denied permissions (e.g. rw-r--r--) or an octal number (e.g. 644). If not set, the permissions will not be changed. You may also use expression language such as $\{file.permissions\}. If the value is invalid, the processor will not be invalid but will fail to change permissions of the file.
Port The port that the remote system is listening on for file transfers
Reject Zero-Byte Files Determines whether or not Zero-byte files should be rejected without attempting to transfer
Remote Path The path on the remote system from which to pull or push files
Temporary Filename If set, the filename of the sent file will be equal to the value specified during the transfer and after successful completion will be renamed to the original filename. If this value is set, the Dot Rename property is ignored.
Transfer Mode The FTP Transfer Mode
Use Compression Indicates whether or not ZLIB compression should be used when transferring files
Username Username
ftp-use-utf8 Tells the client to use UTF-8 encoding when processing files and filenames. If set to true, the server must also support UTF-8 encoding.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles that failed to send to the remote system; failure is usually looped back to this processor
reject FlowFiles that were rejected by the destination system
success FlowFiles that are successfully sent will be routed to success
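The Dot Rename and Temporary Filename properties above interact: the temporary filename takes precedence, and dot renaming applies only when no temporary filename is set. A small sketch of that precedence, using a hypothetical helper `in_flight_name`, might look like this:

```python
def in_flight_name(filename, temporary_filename=None, dot_rename=True):
    """Name used on the remote system while the transfer is in progress (hypothetical helper)."""
    # Temporary Filename wins; Dot Rename is ignored when it is set.
    if temporary_filename:
        return temporary_filename
    # Dot Rename prepends "." during transfer; the file is renamed back afterwards.
    if dot_rename:
        return "." + filename
    # Otherwise there is no rename and the final name is used directly.
    return filename

dotted = in_flight_name("data.csv")
```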
Property Description
File Resource Service File Resource Service providing access to the local resource to be transferred
GCP Credentials Provider Service The Controller Service used to obtain Google Cloud Platform credentials.
Resource Transfer Source The source of the content to be transferred
gcp-project-id Google Cloud Project ID
gcp-retry-count How many retry attempts should be made before routing to the failure relationship.
gcs-bucket Bucket of the object.
gcs-content-disposition-type Type of RFC-6266 Content Disposition to be attached to the object
gcs-content-type Content Type for the file, i.e. text/plain
gcs-key Name of the object.
gcs-object-acl Access Control to be attached to the object uploaded. Not providing this will revert to bucket defaults.
gcs-object-crc32c CRC32C Checksum (encoded in Base64, big-Endian order) of the file for server-side validation.
gcs-overwrite-object If false, the upload to GCS will succeed only if the object does not exist.
gcs-server-side-encryption-key An AES256 Encryption Key (encoded in base64) for server-side encryption of the object.
gzip.content.enabled Signals to the GCS Blob Writer whether GZIP compression during transfer is desired. False means do not gzip and can boost performance in many cases.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
storage-api-url Overrides the default storage URL. Configuring an alternative Storage API URL also overrides the HTTP Host header on requests as described in the Google documentation for Private Service Connections.
Name Description
failure FlowFiles are routed to this relationship if the Google Cloud Storage operation fails.
success FlowFiles are routed to this relationship after a successful Google Cloud Storage operation.
Name Description
gcs.bucket Bucket of the object.
gcs.key Name of the object.
gcs.size Size of the object.
gcs.cache.control Data cache control of the object.
gcs.component.count The number of components which make up the object.
gcs.content.disposition The data content disposition of the object.
gcs.content.encoding The content encoding of the object.
gcs.content.language The content language of the object.
mime.type The MIME/Content-Type of the object
gcs.crc32c The CRC32C checksum of object's data, encoded in base64 in big-endian order.
gcs.create.time The creation time of the object (milliseconds)
gcs.update.time The last modification time of the object (milliseconds)
gcs.encryption.algorithm The algorithm used to encrypt the object.
gcs.encryption.sha256 The SHA256 hash of the key used to encrypt the object
gcs.etag The HTTP 1.1 Entity tag for the object.
gcs.generated.id The service-generated ID for the object
gcs.generation The data generation of the object.
gcs.md5 The MD5 hash of the object's data encoded in base64.
gcs.media.link The media download link to the object.
gcs.metageneration The metageneration of the object.
gcs.owner The owner (uploader) of the object.
gcs.owner.type The ACL entity type of the uploader of the object.
gcs.uri The URI of the object as a string.
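The gcs-object-crc32c property and the gcs.crc32c attribute above both use a Base64 encoding of the CRC32C checksum in big-endian byte order. The sketch below computes such a value with a plain bitwise CRC32C (Castagnoli) routine so that it has no third-party dependencies; it is meant to show the encoding, and a production flow would more likely rely on an optimized library.

```python
import base64

def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def crc32c_base64(data: bytes) -> str:
    """Encode the checksum as 4 big-endian bytes, then Base64."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")

checksum = crc32c(b"123456789")
encoded = crc32c_base64(b"123456789")
```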
Property Description
chunked-upload-size Defines the size of a chunk. Used when a FlowFile's size exceeds 'Chunked Upload Threshold' and content is uploaded in smaller chunks. Minimum allowed chunk size is 256 KB, maximum allowed chunk size is 1 GB.
chunked-upload-threshold The maximum size of the content which is uploaded at once. FlowFiles larger than this threshold are uploaded in chunks.
conflict-resolution-strategy Indicates what should happen when a file with the same name already exists in the specified Google Drive folder.
connect-timeout Maximum wait time for connection to Google Drive service.
file-name The name of the file to upload to the specified Google Drive folder.
folder-id The ID of the shared folder. Please see Additional Details to set up access to Google Drive and obtain Folder ID.
gcp-credentials-provider-service The Controller Service used to obtain Google Cloud Platform credentials.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
read-timeout Maximum wait time for response from Google Drive service.
Name Description
failure Files that could not be written to Google Drive for some reason are transferred to this relationship.
success Files that have been successfully written to Google Drive are transferred to this relationship.
Name Description
drive.id The id of the file
filename The name of the file
mime.type The MIME type of the file
drive.size The size of the file. Set to 0 when the file size is not available (e.g. externally stored files).
drive.size.available Indicates if the file size is known / available
drive.timestamp The last modified time or created time (whichever is greater) of the file. The reason for this is that the original modified date of a file is preserved when uploaded to Google Drive. 'Created time' takes the time when the upload occurs. However uploaded files can still be modified later.
drive.created.time The file's creation time
drive.modified.time The file's last modification time
error.code The error code returned by Google Drive
error.message The error message returned by Google Drive
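The chunked-upload-threshold and chunked-upload-size properties above decide whether content is sent at once or in chunks. The following sketch estimates the number of upload requests under those rules; `plan_upload` is a hypothetical helper, and treating KB and GB as powers of 1024 is an assumption.

```python
import math

KB = 1024
MB = 1024 * KB
GB = 1024 * MB
MIN_CHUNK = 256 * KB   # minimum allowed chunk size
MAX_CHUNK = 1 * GB     # maximum allowed chunk size

def plan_upload(file_size: int, threshold: int, chunk_size: int) -> int:
    """Return how many pieces the content would be uploaded in (hypothetical helper)."""
    if not MIN_CHUNK <= chunk_size <= MAX_CHUNK:
        raise ValueError("chunk size must be between 256 KB and 1 GB")
    # Content at or below the threshold is uploaded at once.
    if file_size <= threshold:
        return 1
    # Larger content is split into chunks; the last chunk may be smaller.
    return math.ceil(file_size / chunk_size)

pieces = plan_upload(250 * MB, threshold=100 * MB, chunk_size=10 * MB)
```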
Property Description
gridfs-bucket-name The GridFS bucket where the files will be stored. If left blank, it will use the default value 'fs' that the MongoDB client driver uses.
gridfs-client-service The MongoDB client service to use for database connections.
gridfs-database-name The name of the database to use
gridfs-file-name The name of the file in the bucket that is the target of this processor. GridFS file names do not include path information because GridFS does not sort files into folders within a bucket.
putgridfs-chunk-size Controls the maximum size of each chunk of a file uploaded into GridFS.
putgridfs-enforce-uniqueness When enabled, this option will ensure that uniqueness is enforced on the bucket. It will do so by creating a MongoDB index that matches your selection. It should ideally be configured once when the bucket is created for the first time because it could take a long time to build on an existing bucket with a lot of data.
putgridfs-hash-attribute If uniqueness enforcement is enabled and the file hash is part of the constraint, this must be set to an attribute that exists on all incoming flowfiles.
putgridfs-properties-prefix Attributes that have this prefix will be added to the file stored in GridFS as metadata.
Name Description
duplicate Flowfiles that fail the duplicate check are sent to this relationship.
failure When there is a failure processing the flowfile, it goes to this relationship.
success When the operation succeeds, the flowfile is sent to this relationship.
Property Description
Associated Object ID Property Target HubSpot property used to uniquely identify the object to associate to from the configured object.
Associated Object ID Value Target HubSpot property value for the 'Associated Object ID Property' to associate to from the configured object.
Associated Object Type Target HubSpot object type to associate to from the configured object.
Association Type ID The HubSpot defined association id from the 'Object ID Value' to the 'Associated Object ID Value'.
HubSpot Service HubSpot Client Service.
Inverse Association Type ID The HubSpot defined association id from the 'Associated Object ID Value' to the 'Object ID Value'.
Missing HubSpot Property Policy What action to take if HubSpot does not have a matching property.
Object ID Property HubSpot property used to uniquely identify the object.
Object ID Value Matching HubSpot property value to search for.
Object Override Properties Comma-delimited list of NiFi attributes which, if they exist, will be added as object properties. Any existing properties in HubSpot will be overridden.
Object Set Properties Comma-delimited list of NiFi attributes which, if they exist, will be added as object properties if the current object property in HubSpot is empty.
Object Type HubSpot object type
Name Description
failure HubSpot fail relationship
retry HubSpot retry relationship. FlowFiles that failed to process due to a server timeout or rate limit related error. FlowFiles routed here should be routed back into the processor.
success HubSpot success relationship
Property Description
Iceberg Catalog Provider Service for Iceberg Catalog
Iceberg Writer Provider Service for Iceberg Row Writers responsible for producing formatted Iceberg Data Files
Namespace Iceberg Namespace containing Tables
Record Reader Record Reader for incoming FlowFiles
Table Name Iceberg Table Name
Name Description
failure FlowFiles not transferred to Iceberg
success FlowFiles transferred to Iceberg
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Amazon Kinesis Firehose Delivery Stream Name The name of the Kinesis Firehose delivery stream
Batch Size Batch size for messages (1-500).
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Max message buffer size Max message buffer
Region
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
Name Description
aws.kinesis.firehose.error.message Error message on posting message to AWS Kinesis Firehose
aws.kinesis.firehose.error.code Error code for the message when posting to AWS Kinesis Firehose
aws.kinesis.firehose.record.id Record id of the message posted to Kinesis Firehose
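The Batch Size property above bounds how many messages are grouped together (1-500). How the processor groups messages internally is not described here, so the following is only a sketch of splitting a list of messages into batches within that range, using a hypothetical `make_batches` helper.

```python
def make_batches(messages, batch_size):
    """Split messages into consecutive batches of at most batch_size (hypothetical helper)."""
    # The documented range for the batch size is 1-500.
    if not 1 <= batch_size <= 500:
        raise ValueError("batch size must be between 1 and 500")
    return [messages[i:i + batch_size]
            for i in range(0, len(messages), batch_size)]

batched = make_batches(list(range(1200)), 500)
```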
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Max Message Buffer Size Max message buffer size defined with standard data size units
Message Batch Size Batch size for messages (1-500).
Region
Stream Name The name of Kinesis Stream
Stream Partition Key The partition key attribute. If it is not set, a random value is used
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
Name Description
aws.kinesis.error.message Error message on posting message to AWS Kinesis
aws.kinesis.error.code Error code for the message when posting to AWS Kinesis
aws.kinesis.sequence.number Sequence number for the message when posting to AWS Kinesis
aws.kinesis.shard.id Shard id of the message posted to AWS Kinesis
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Amazon Lambda Name The Lambda Function Name
Amazon Lambda Qualifier (version) The Lambda Function Version
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
Name Description
aws.lambda.result.function.error Function error message in result on posting message to AWS Lambda
aws.lambda.result.status.code Status code in the result for the message when posting to AWS Lambda
aws.lambda.result.payload Payload in the result from AWS Lambda
aws.lambda.result.log Log in the result of the message posted to Lambda
aws.lambda.exception.message Exception message on invoking from AWS Lambda
aws.lambda.exception.cause Exception cause on invoking from AWS Lambda
aws.lambda.exception.error.code Exception error code on invoking from AWS Lambda
aws.lambda.exception.request.id Exception request id on invoking from AWS Lambda
aws.lambda.exception.status.code Exception status code on invoking from AWS Lambda
Property Description
Character Set The Character Set in which the data is encoded
Mode Indicates whether the processor should insert or update content
Mongo Collection Name The name of the collection to use
Mongo Database Name The name of the database to use
Update Method MongoDB method for running collection update operations, such as updateOne or updateMany
Update Query Key One or more comma-separated document key names used to build the update query criteria, such as _id
Upsert When true, inserts a document if no document matches the update query criteria; this property is valid only when using update mode, otherwise it is ignored
mongo-client-service If configured, this property will use the assigned client service for connection pooling.
put-mongo-update-mode Choose an update mode. You can either supply a JSON document to use as a direct replacement or specify a document that contains update operators like $set, $unset, and $inc. When Operators mode is enabled, the flowfile content is expected to be the operator part, for example \{$set:\{"key": "value"\},$inc:\{"count":1234\}\}, and the update query will come from the configured Update Query property.
putmongo-update-query Specify a full MongoDB query to be used for the lookup query to do an update/upsert. NOTE: this field is ignored if the 'Update Query Key' value is not empty.
Name Description
failure All FlowFiles that cannot be written to MongoDB are routed to this relationship
success All FlowFiles that are written to MongoDB are routed to this relationship
Name Description
mongo.put.update.match.count The match count from result if update/upsert is performed, otherwise not set.
mongo.put.update.modify.count The modify count from result if update/upsert is performed, otherwise not set.
mongo.put.upsert.id The '_id' hex value if upsert is performed, otherwise not set.
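The Update Query Key property above names the document keys used to build the update criteria, and Operators mode expects the FlowFile content to hold only the operator part. The sketch below shows both pieces as plain Python dictionaries, with no MongoDB driver involved; `build_update_query` is a hypothetical helper, and the way it assembles the criteria is an assumption about the behavior rather than the processor's code.

```python
def build_update_query(document: dict, update_query_key: str) -> dict:
    """Build query criteria from comma-separated key names (hypothetical helper)."""
    keys = [key.strip() for key in update_query_key.split(",") if key.strip()]
    # Assumption: each named key is matched against the value in the document.
    return {key: document[key] for key in keys}

# Whole-document mode: criteria are taken from the incoming document itself.
incoming = {"_id": 7, "region": "eu", "count": 1}
query = build_update_query(incoming, "_id, region")

# Operators mode: the FlowFile content carries only update operators,
# as in the example from the property description.
operator_part = {"$set": {"key": "value"}, "$inc": {"count": 1234}}
```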
Property Description
Character Set The Character Set in which the data is encoded
Mongo Collection Name The name of the collection to use
Mongo Database Name The name of the database to use
Ordered Ordered execution of bulk-writes and break on error - otherwise arbitrary order and continue on error
mongo-client-service If configured, this property will use the assigned client service for connection pooling.
Name Description
failure All FlowFiles that cannot be written to MongoDB are routed to this relationship
success All FlowFiles that are written to MongoDB are routed to this relationship
Property Description
Mongo Collection Name The name of the collection to use
Mongo Database Name The name of the database to use
bypass-validation Enable or disable bypassing document schema validation during insert or update operations. Bypassing document validation is a Privilege Action in MongoDB. Enabling this property can result in authorization errors for users with limited privileges.
insert_count The number of records to group together for one single insert/upsert operation against MongoDB.
mongo-client-service If configured, this property will use the assigned client service for connection pooling.
ordered Perform ordered or unordered operations
record-reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema
update-key-fields Comma separated list of fields based on which to identify documents that need to be updated. If this property is set NiFi will attempt an upsert operation on all documents. If this property is not set all documents will be inserted.
update-mode Choose between updating a single document or multiple documents per incoming record.
Name Description
failure All FlowFiles that cannot be written to MongoDB are routed to this relationship
success All FlowFiles that are written to MongoDB are routed to this relationship
Property Description
put-record-include-zero-record-results If no records are read from the incoming FlowFile, this property specifies whether or not an empty record set will be transmitted. The original FlowFile will still be routed to success, but if no transmission occurs, no provenance SEND event will be generated.
put-record-reader Specifies the Controller Service to use for reading incoming data
put-record-sink Specifies the Controller Service to use for writing out the query result records to some destination.
Name Description
failure A FlowFile is routed to this relationship if the records could not be transmitted and retrying the operation will also fail
retry The original FlowFile is routed to this relationship if the records could not be transmitted but attempting the operation again may succeed
success The original FlowFile will be routed to this relationship if the records were transmitted successfully
Property Description
charset Specifies the character set to use when storing record field values as strings. All fields will be converted to strings using this character set before being stored in Redis.
data-record-path This property denotes a RecordPath that will be evaluated against each incoming Record and the Record that results from evaluating the RecordPath will be sent to Redis instead of sending the entire incoming Record. The property defaults to the root '/' which corresponds to a 'flat' record (all fields/values at the top level of the Record).
hash-value-record-path Specifies a RecordPath to evaluate against each Record in order to determine the hash value associated with all the record fields/values (see 'hset' in Redis documentation for more details). The RecordPath must point at exactly one field or an error will occur.
record-reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema
redis-connection-pool
Name Description
failure FlowFiles containing Records with processing errors will be routed to this relationship
success FlowFiles having all Records stored in Redis will be routed to this relationship
Name Description
redis.success.record.count Number of records written to Redis
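The charset and hash-value-record-path properties above describe how a record becomes a Redis hash: one field supplies the hash value that identifies the entry, and every field value is converted to a string in the configured character set. A minimal sketch of that conversion, without a Redis client, follows; `to_redis_hash` is a hypothetical helper, it takes a plain field name instead of a RecordPath, and keeping the identifying field inside the hash is an assumption.

```python
def to_redis_hash(record: dict, hash_field: str, charset: str = "UTF-8"):
    """Return (hash_key, encoded_fields) for one flat record (hypothetical helper)."""
    # The configured path must resolve to exactly one field; a missing field is an error.
    if hash_field not in record:
        raise KeyError(f"record has no field {hash_field!r}")
    hash_key = str(record[hash_field])
    # All field values are converted to strings using the character set.
    fields = {name: str(value).encode(charset) for name, value in record.items()}
    return hash_key, fields

key, fields = to_redis_hash({"id": 42, "name": "Ada"}, "id")
```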
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Bucket The S3 Bucket to interact with
Cache Control Sets the Cache-Control HTTP header indicating the caching directives of the associated object. Multiple directives are comma-separated.
Canned ACL Amazon Canned ACL for an object, one of: BucketOwnerFullControl, BucketOwnerRead, LogDeliveryWrite, AuthenticatedRead, PublicReadWrite, PublicRead, Private; will be ignored if any other ACL/permission/owner property is specified
Communications Timeout The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out.
Content Disposition Sets the Content-Disposition HTTP header indicating if the content is intended to be displayed inline or should be downloaded. Possible values are 'inline' or 'attachment'. If this property is not specified, the object's content-disposition will be set to filename. When 'attachment' is selected, '; filename=' plus the object key are automatically appended to form the final value 'attachment; filename="filename.jpg"'.
Content Type Sets the Content-Type HTTP header indicating the type of content stored in the associated object. The value of this header is a standard MIME type. AWS S3 Java client will attempt to determine the correct content type if one hasn't been set yet. Users are responsible for ensuring a suitable content type is set when uploading streams. If no content type is provided and cannot be determined by the filename, the default content type "application/octet-stream" will be used.
Custom Signer Class Name Fully qualified class name of the custom signer class. The signer must implement the com.amazonaws.auth.Signer interface.
Custom Signer Module Location Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any).
Encryption Service Specifies the Encryption Service Controller used to configure requests. PutS3Object: For backward compatibility, this value is ignored when 'Server Side Encryption' is set. FetchS3Object: Only needs to be configured in case of Server-side Customer Key, Client-side KMS and Client-side Customer Key encryptions.
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Expiration Time Rule
File Resource Service File Resource Service providing access to the local resource to be transferred
FullControl User List A comma-separated list of Amazon User IDs or E-mail addresses that specifies who should have Full Control for an object
Multipart Part Size Specifies the part size for use when the PutS3MultipartUpload API is used. Flow files will be broken into chunks of this size for the upload process, but the last part sent can be smaller since it is not padded. The valid range is 50MB to 5GB.
Multipart Threshold Specifies the file size threshold for switching from the PutS3Object API to the PutS3MultipartUpload API. Flow files bigger than this limit will be sent using the stateful multipart process. The valid range is 50MB to 5GB.
Multipart Upload AgeOff Interval Specifies the interval at which existing multipart uploads in AWS S3 will be evaluated for ageoff. When processor is triggered it will initiate the ageoff evaluation if this interval has been exceeded.
Multipart Upload Max Age Threshold Specifies the maximum age for existing multipart uploads in AWS S3. When the ageoff process occurs, any upload older than this threshold will be aborted.
Object Key The S3 Object Key to use. This is analogous to a filename for traditional file systems.
Object Tags Prefix Specifies the prefix which would be scanned against the incoming FlowFile's attributes and the matching attribute's name and value would be considered as the outgoing S3 object's Tag name and Tag value respectively. For example, if the incoming FlowFile carries the attributes tagS3country, tagS3PII, the tag prefix to be specified would be 'tagS3'.
Owner The Amazon ID to use for the object's owner
Read ACL User List A comma-separated list of Amazon User IDs or E-mail addresses that specifies who should have permissions to read the Access Control List for an object
Read Permission User List A comma-separated list of Amazon User IDs or E-mail addresses that specifies who should have Read Access for an object
Region The AWS Region to connect to.
Remove Tag Prefix If set to 'True', the value provided for 'Object Tags Prefix' will be removed from the attribute(s) and then considered as the Tag name. For example, if the incoming FlowFile carries the attributes tagS3country, tagS3PII and the prefix is set to 'tagS3', then the corresponding tag names would be 'country' and 'PII'.
Resource Transfer Source The source of the content to be transferred
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Server Side Encryption Specifies the algorithm used for server side encryption.
Signer Override The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation.
Storage Class
Temporary Directory Multipart State Directory in which, for multipart uploads, the processor will locally save the state tracking the upload ID and parts uploaded which must both be provided to complete the upload.
Use Chunked Encoding Enables / disables chunked encoding for upload requests. Set it to false only if your endpoint does not support chunked uploading.
Use Path Style Access Path-style access can be enforced by setting this property to true. Set it to true if your endpoint does not support virtual-hosted-style requests, only path-style requests.
Write ACL User List A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have permissions to change the Access Control List for an object
Write Permission User List A comma-separated list of Amazon User IDs or e-mail addresses that specifies who should have Write Access for an object
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure If the Processor is unable to process a given FlowFile, it will be routed to this Relationship.
success FlowFiles are routed to this Relationship after they have been successfully processed.
Name Description
s3.url The URL that can be used to access the S3 object
s3.bucket The S3 bucket in which the object was put
s3.key The S3 key under which the object was put
s3.contenttype The S3 content type of the object that was put in S3
s3.version The version of the S3 Object that was put to S3
s3.exception The class name of the exception thrown during processor execution
s3.additionalDetails The S3 supplied detail from the failed operation
s3.statusCode The HTTP error code (if available) from the failed operation
s3.errorCode The S3 moniker of the failed operation
s3.errorMessage The S3 exception message from the failed operation
s3.etag The ETag of the S3 Object
s3.contentdisposition The content disposition of the S3 object that was put in S3
s3.cachecontrol The cache-control header of the S3 Object
s3.uploadId The uploadId used to upload the Object to S3
s3.expiration A human-readable form of the expiration date of the S3 object, if one is set
s3.sseAlgorithm The server side encryption algorithm of the object
s3.usermetadata A human-readable form of the User Metadata of the S3 object, if any was set
s3.encryptionStrategy The name of the encryption strategy, if any was set
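The interaction between 'Object Tags Prefix' and 'Remove Tag Prefix' described above can be illustrated with a minimal Python sketch. The function name `tags_from_attributes` and the sample attribute values are hypothetical; this only mirrors the documented behavior and is not the processor's implementation.

```python
def tags_from_attributes(attributes, prefix, remove_prefix=False):
    """Map FlowFile attributes whose names start with `prefix` to S3 tags.

    Hypothetical helper: mirrors the documented behavior of
    'Object Tags Prefix' and 'Remove Tag Prefix'.
    """
    tags = {}
    for name, value in attributes.items():
        if not name.startswith(prefix):
            continue  # attribute does not match the configured prefix
        # Optionally strip the prefix so 'tagS3country' becomes 'country'
        tag_name = name[len(prefix):] if remove_prefix else name
        tags[tag_name] = value
    return tags


# Sample attributes (values are made up for illustration)
attrs = {"tagS3country": "US", "tagS3PII": "true", "filename": "a.csv"}
print(tags_from_attributes(attrs, "tagS3"))
print(tags_from_attributes(attrs, "tagS3", remove_prefix=True))
```

With 'Remove Tag Prefix' disabled, the full attribute names are kept as tag names; with it enabled, only the part after the prefix is used.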
Property Description
oauth2-access-token-provider Service providing OAuth2 Access Tokens for authenticating using the HTTP Authorization Header
read-timeout Maximum time allowed for reading a response from the Salesforce REST API
record-reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema
salesforce-api-version The version number of the Salesforce REST API appended to the URL after the services/data path. See Salesforce documentation for supported versions
salesforce-url The URL of the Salesforce instance including the domain without additional path information, such as [https://MyDomainName.my.salesforce.com](https://MyDomainName.my.salesforce.com)
Name Description
failure For FlowFiles created as a result of an execution error.
success For FlowFiles created as a result of a successful execution.
Name Description
error.message The error message returned by Salesforce.
Property Description
Algorithm Negotiation Configuration strategy for SSH algorithm negotiation
Batch Size The maximum number of FlowFiles to send in a single connection
Ciphers Allowed A comma-separated list of Ciphers allowed for SFTP connections. Leave unset to allow all. Available options are: 3des-cbc, aes128-cbc, aes128-ctr, [aes128-gcm@openssh.com](mailto:aes128-gcm@openssh.com), aes192-cbc, aes192-ctr, aes256-cbc, aes256-ctr, [aes256-gcm@openssh.com](mailto:aes256-gcm@openssh.com), arcfour128, arcfour256, blowfish-cbc, [chacha20-poly1305@openssh.com](mailto:chacha20-poly1305@openssh.com), none
Conflict Resolution Determines how to handle the problem of filename collisions
Connection Timeout Amount of time to wait before timing out while creating a connection
Create Directory Specifies whether or not the remote directory should be created if it does not exist.
Data Timeout When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems
Disable Directory Listing If set to 'true', directory listing is not performed prior to creating missing directories. By default, this processor executes a directory listing command to check whether the target directory exists before creating missing directories. However, there are situations in which you might need to disable the directory listing, such as the following. Directory listing might fail with some permission setups (e.g. chmod 100) on a directory. Also, if any other SFTP client created the directory after this processor performed a listing and before a directory creation request by this processor is finished, then an error is returned because the directory already exists.
Dot Rename If true, then the filename of the sent file is prepended with a "." and then renamed back to the original once the file is completely sent. Otherwise, there is no rename. This property is ignored if the Temporary Filename property is set.
Host Key File If supplied, the given file will be used as the Host Key. Otherwise, if the 'Strict Host Key Checking' property is set to true, the 'known_hosts' and 'known_hosts2' files from the ~/.ssh directory are used; if it is not, no host key file will be used.
Hostname The fully qualified hostname or IP address of the remote system
Key Algorithms Allowed A comma-separated list of Key Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: ecdsa-sha2-nistp256, [ecdsa-sha2-nistp256-cert-v01@openssh.com](mailto:ecdsa-sha2-nistp256-cert-v01@openssh.com), ecdsa-sha2-nistp384, [ecdsa-sha2-nistp384-cert-v01@openssh.com](mailto:ecdsa-sha2-nistp384-cert-v01@openssh.com), ecdsa-sha2-nistp521, [ecdsa-sha2-nistp521-cert-v01@openssh.com](mailto:ecdsa-sha2-nistp521-cert-v01@openssh.com), rsa-sha2-256, [rsa-sha2-256-cert-v01@openssh.com](mailto:rsa-sha2-256-cert-v01@openssh.com), rsa-sha2-512, [rsa-sha2-512-cert-v01@openssh.com](mailto:rsa-sha2-512-cert-v01@openssh.com), [sk-ecdsa-sha2-nistp256@openssh.com](mailto:sk-ecdsa-sha2-nistp256@openssh.com), [sk-ssh-ed25519@openssh.com](mailto:sk-ssh-ed25519@openssh.com), ssh-dss, [ssh-dss-cert-v01@openssh.com](mailto:ssh-dss-cert-v01@openssh.com), ssh-ed25519, [ssh-ed25519-cert-v01@openssh.com](mailto:ssh-ed25519-cert-v01@openssh.com), ssh-rsa, [ssh-rsa-cert-v01@openssh.com](mailto:ssh-rsa-cert-v01@openssh.com)
Key Exchange Algorithms Allowed A comma-separated list of Key Exchange Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: curve25519-sha256, [curve25519-sha256@libssh.org](mailto:curve25519-sha256@libssh.org), curve448-sha512, diffie-hellman-group-exchange-sha1, diffie-hellman-group-exchange-sha256, diffie-hellman-group1-sha1, diffie-hellman-group14-sha1, diffie-hellman-group14-sha256, diffie-hellman-group15-sha512, diffie-hellman-group16-sha512, diffie-hellman-group17-sha512, diffie-hellman-group18-sha512, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521, mlkem1024nistp384-sha384, mlkem768nistp256-sha256, mlkem768x25519-sha256, sntrup761x25519-sha512, [sntrup761x25519-sha512@openssh.com](mailto:sntrup761x25519-sha512@openssh.com)
Last Modified Time The lastModifiedTime to assign to the file after transferring it. If not set, the lastModifiedTime will not be changed. Format must be yyyy-MM-dd'T'HH:mm:ssZ. You may also use expression language such as $\{file.lastModifiedTime\}. If the value is invalid, the processor will not be invalid but will fail to change lastModifiedTime of the file.
Message Authentication Codes Allowed A comma-separated list of Message Authentication Codes allowed for SFTP connections. Leave unset to allow all. Available options are: hmac-md5, hmac-md5-96, hmac-sha1, hmac-sha1-96, [hmac-sha1-etm@openssh.com](mailto:hmac-sha1-etm@openssh.com), hmac-sha2-256, [hmac-sha2-256-etm@openssh.com](mailto:hmac-sha2-256-etm@openssh.com), hmac-sha2-512, [hmac-sha2-512-etm@openssh.com](mailto:hmac-sha2-512-etm@openssh.com)
Password Password for the user account
Permissions The permissions to assign to the file after transferring it. Format must be either UNIX rwxrwxrwx with a - in place of denied permissions (e.g. rw-r--r--) or an octal number (e.g. 644). If not set, the permissions will not be changed. You may also use expression language such as $\{file.permissions\}. If the value is invalid, the processor will not be invalid but will fail to change permissions of the file.
Port The port that the remote system is listening on for file transfers
Private Key Passphrase Password for the private key
Private Key Path The fully qualified path to the Private Key file
Reject Zero-Byte Files Determines whether or not Zero-byte files should be rejected without attempting to transfer
Remote Group Integer value representing the Group ID to set on the file after transferring it. If not set, the group will not be set. You may also use expression language such as $\{file.group\}. If the value is invalid, the processor will not be invalid but will fail to change the group of the file.
Remote Owner Integer value representing the User ID to set on the file after transferring it. If not set, the owner will not be set. You may also use expression language such as $\{file.owner\}. If the value is invalid, the processor will not be invalid but will fail to change the owner of the file.
Remote Path The path on the remote system from which to pull or push files
Send Keep Alive On Timeout Send a Keep Alive message every 5 seconds up to 5 times for an overall timeout of 25 seconds.
Strict Host Key Checking Indicates whether or not strict enforcement of hosts keys should be applied
Temporary Filename If set, the filename of the sent file will be equal to the value specified during the transfer and after successful completion will be renamed to the original filename. If this value is set, the Dot Rename property is ignored.
Use Compression Indicates whether or not ZLIB compression should be used when transferring files
Username Username
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles that failed to send to the remote system; failure is usually looped back to this processor
reject FlowFiles that were rejected by the destination system
success FlowFiles that are successfully sent will be routed to success
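The 'Permissions' property accepts either the symbolic UNIX form or an octal number. As a rough illustration of how the two forms relate, the following Python sketch converts a nine-character symbolic string to its octal equivalent. The function name `symbolic_to_octal` is hypothetical and is not part of the processor.

```python
def symbolic_to_octal(perms: str) -> str:
    """Convert a symbolic permission string such as 'rw-r--r--' to octal ('644')."""
    if len(perms) != 9:
        raise ValueError("expected nine characters, e.g. 'rw-r--r--'")
    expected = "rwx" * 3
    digits = []
    for start in range(0, 9, 3):  # owner, group, other
        value = 0
        for offset, weight in enumerate((4, 2, 1)):  # r=4, w=2, x=1
            ch = perms[start + offset]
            if ch == expected[start + offset]:
                value += weight
            elif ch != "-":
                raise ValueError(f"unexpected character {ch!r} at position {start + offset}")
        digits.append(str(value))
    return "".join(digits)


print(symbolic_to_octal("rw-r--r--"))  # 644
print(symbolic_to_octal("rwxr-xr-x"))  # 755
```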
Property Description
Batch Size The maximum number of files to put in each iteration
Conflict Resolution Strategy Indicates what should happen when a file with the same name already exists in the output directory
Create Missing Directories If true, then missing destination directories will be created. If false, FlowFiles are penalized and sent to failure.
Directory The network folder to which files should be written. This is the remaining relative path after the share: `\\hostname\share\[dir1\dir2]`. You may use expression language.
Domain The domain used for authentication. Optional, in most cases username and password is sufficient.
Hostname The network host to which files should be written.
Password The password used for authentication. Required if Username is set.
Share The network share to which files should be written. This is the "first folder" after the hostname: `\\hostname\[share]\dir1\dir2`
Share Access Strategy Indicates which shared access is granted on the file during the write. None is the most restrictive, but the safest setting to prevent corruption.
Temporary Suffix A temporary suffix which will be appended to the filename while it's transferring. After the transfer is complete, the suffix will be removed.
Username The username used for authentication. If no username is set then anonymous authentication is attempted.
enable-dfs Enables accessing Distributed File System (DFS) and following DFS links during SMB operations.
smb-dialect By default, the SMB dialect is negotiated between the client and the server to the highest common version supported by both ends. In some rare cases, the client-server communication may fail with the automatically negotiated dialect. In those situations, this property can be used to set the dialect explicitly (e.g. to downgrade to a lower version).
timeout Timeout for read and write operations.
use-encryption Turns on/off encrypted communication between the client and the server. The property's behavior is SMB dialect dependent: SMB 2.x does not support encryption and the property has no effect. In case of SMB 3.x, it is a hint/request to the server to turn encryption on if the server also supports it.
Name Description
failure Files that could not be written to the output network path for some reason are transferred to this relationship
success Files that have been successfully written to the output network path are transferred to this relationship
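The 'Hostname', 'Share', and 'Directory' properties together identify the target network folder. The Python sketch below shows one way the three values could be combined into a UNC-style path. It assumes the conventional `\\hostname\share\dir1\dir2` layout; the function name `unc_path` and the sample values are hypothetical.

```python
def unc_path(hostname: str, share: str, directory: str = "") -> str:
    """Join hostname, share, and an optional relative directory into a UNC path.

    Assumption: the target follows the conventional \\\\hostname\\share\\dir layout.
    """
    # Accept either slash style in the directory and drop empty segments
    parts = [p for p in directory.replace("/", "\\").split("\\") if p]
    return "\\\\" + "\\".join([hostname, share, *parts])


print(unc_path("fileserver", "exports", "dir1\\dir2"))
print(unc_path("fileserver", "exports"))
```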
Property Description
Compression Enabled Set true to compress data before uploading the file
Database The database to use by default. The same as passing 'db=DATABASE_NAME' to the connection string.
File Name Destination file name to use.
File Prefix Path prefix under which the data should be uploaded on the stage.
Internal Stage Type The type of internal stage to use
Schema The schema to use by default. The same as passing 'schema=SCHEMA' to the connection string.
Snowflake Connection Service Database Connection Service for accessing Snowflake
Stage The name of the internal stage in the Snowflake account to put files into.
Table The name of the table in the Snowflake account.
Name Description
failure For FlowFiles of failed PUT operation
success For FlowFiles of successful PUT operation
Name Description
snowflake.staged.file.path Staged file path
Property Description
Account Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name]
Authentication Strategy Strategy for authenticating Snowflake connections
Client Lag The maximum amount of time that the client will wait before flushing records to Snowflake. A larger value can increase latency while sending to Snowflake, but for tables that are not constantly updated it can result in queries that are faster and more cost efficient.
Concurrency Group Allows specifying a 'Concurrency Group' that a given FlowFile belongs to, so that the number of Concurrent Tasks that write to tables in a given group can be limited.
Connection Strategy Strategy for connecting to Snowflake Snowpipe Streaming services
Database Snowflake Database destination for processed records
Delivery Guarantee Specifies the delivery guarantee for the records being sent to Snowflake.
Iceberg Enabled Specifies whether the processor ingests data into an Iceberg table. The processor fails if this property doesn’t match the actual table type.
Max Batch Size Maximum number of records to ingest in a single call. Multiple ingest calls will be made if the number of records exceeds the max batch size. Current guidance recommends batch sizes less than 16MB. The Max Batch Size can be tuned based on the average record size such that batches are generally less than 16MB.
Max Tasks Per Group The maximum number of channels to create for a given Snowpipe Channel Prefix. This allows limiting the number of concurrent tasks that can be writing to a given Snowflake table.
Private Key Service RSA Private Key Service for authenticating connections
Record Offset The Expression Language expression to use to determine the offset of the first record in a FlowFile.
Record Offset Record Path The Record Path expression to use to determine the offset of the first record in a FlowFile.
Record Offset Strategy Specifies the strategy for determining the offset of each record.
Record Reader The Record Reader to use for reading the input
Role Snowflake Role the user will assume when authenticating connections
Schema Snowflake Schema destination for processed records
Snowpipe Channel Index The index to use for the Snowpipe channel name. The full channel name will be constructed as openflow.[prefix].[index]. This is necessary in order to provide Exactly Once delivery to Snowflake, as any retry must be tried against the same channel as was previously used.
Snowpipe Channel Prefix The prefix to use for the Snowpipe channel name. The full channel name will be constructed as openflow.[prefix].[index]. The default value is $\{hostname(false)\}, which ensures that each NiFi node in the cluster writes to a unique channel by incorporating the hostname of the NiFi instance into the channel name.
Table Snowflake Table destination for processed records
User Snowflake User for authenticating connections
Name Description
failure For FlowFiles that failed to upload to Snowflake
success For FlowFiles successfully uploaded to Snowflake
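Two of the properties above lend themselves to a small worked example: the channel name pattern openflow.[prefix].[index], and tuning 'Max Batch Size' from the average record size so that batches stay under 16MB. The Python sketch below is illustrative only. The helper names, the 80% headroom factor, and the choice to treat 16MB as 16 * 1024 * 1024 bytes are assumptions, not documented behavior.

```python
MAX_BATCH_BYTES = 16 * 1024 * 1024  # assumption: 16MB interpreted as 16 MiB


def channel_name(prefix: str, index: int) -> str:
    """Build a channel name following the documented openflow.[prefix].[index] pattern."""
    return f"openflow.{prefix}.{index}"


def max_batch_records(avg_record_bytes: int, headroom_percent: int = 80) -> int:
    """Estimate a 'Max Batch Size' (in records) that keeps batches under the limit.

    headroom_percent is a hypothetical safety margin, not a documented setting.
    """
    if avg_record_bytes <= 0:
        raise ValueError("avg_record_bytes must be positive")
    target_bytes = MAX_BATCH_BYTES * headroom_percent // 100
    return max(1, target_bytes // avg_record_bytes)


print(channel_name("nifi-node-1", 0))  # openflow.nifi-node-1.0
print(max_batch_records(2048))         # 6553
```

Because a retry must go to the same channel to preserve Exactly Once delivery, the prefix and index used for a given FlowFile should be stable across attempts.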
Property Description
Account Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name]
Authentication Strategy Strategy for authenticating Snowflake connections
Channel Group Group for managing distinct Snowpipe Streaming Channels with partitioning
Channel Insert Timeout Maximum duration to retry inserting records before failing with an upper bound of 5 minutes
Database Snowflake Database destination for processed records
File Fragment Count Maximum number of File Fragments sent to object storage for Snowpipe Streaming ingestion from input FlowFiles. Must be between 1 and 100.
File Fragment Size Maximum size in bytes for each File Fragment sent to object storage for Snowpipe Streaming ingestion. Must be between 1 KB and 256 MB
Offset Token End Expression Expression Language definition to produce the highest offset token for a FlowFile as a monotonically increasing number
Offset Token Record Pointer JSON Pointer to offset token in each record required when the last committed offset token is between start and end boundaries
Offset Token Start Expression Expression Language definition to produce the lowest offset token for a FlowFile as a monotonically increasing number
Offset Tracking Timeout Maximum duration to poll channel status for committed offset tokens
Pipe Snowflake Pipe destination for processed records
Private Key Service RSA Private Key Service for authenticating connections
Schema Snowflake Schema destination for processed records
Transfer Strategy Strategy for transferring records to Snowpipe Streaming
User Snowflake User for authenticating connections
Web Client Service Provider Web Client Service Provider supporting HTTP request and response handling
Name Description
failure FlowFiles that failed to upload to Snowflake
invalid FlowFiles that Snowflake identified as containing one or more invalid rows resulting in partial transmission
success FlowFiles successfully uploaded to Snowflake
Property Description
ARN Type The type of Amazon Resource Name that is being used.
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Amazon Resource Name (ARN) The name of the resource to which notifications should be published
Character Set The character set in which the FlowFile's content is encoded
Communications Timeout
Deduplication Message ID The token used for deduplication of sent messages
E-mail Subject The optional subject to use for any subscribers that are subscribed via E-mail
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Message Group ID If using FIFO, the message group to which the FlowFile belongs
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Use JSON Structure If true, the contents of the FlowFile must be JSON with a top-level element named 'default'. Additional elements can be used to send different messages to different protocols. See the Amazon SNS Documentation for more information.
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
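When 'Use JSON Structure' is true, the FlowFile content must be a JSON object with a top-level 'default' element, optionally accompanied by per-protocol elements. The Python sketch below builds such a payload. The helper name `sns_json_message` and the message texts are hypothetical; `email` and `sqs` are used as example protocol keys, and each per-protocol value is kept as a string.

```python
import json


def sns_json_message(default_text: str, **per_protocol: str) -> str:
    """Build FlowFile content for 'Use JSON Structure' = true.

    'default' is required; extra keyword arguments add per-protocol messages.
    """
    body = {"default": default_text}
    body.update(per_protocol)
    return json.dumps(body)


payload = sns_json_message(
    "Order shipped",
    email="Your order has shipped.",
    sqs=json.dumps({"event": "shipped"}),  # nested JSON is passed as a string
)
print(payload)
```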
Property Description
Character Set Specifies the character set of the data being sent.
Hostname Destination hostname or IP address
Idle Connection Expiration The amount of time a connection should be held open without being used before closing the connection. A value of 0 seconds will disable this feature.
Max Size of Socket Send Buffer The maximum size of the socket send buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
Message Delimiter Specifies the delimiter to use for splitting apart multiple messages within a single FlowFile. If not specified, the entire content of the FlowFile will be used as a single message. If specified, the contents of the FlowFile will be split on this delimiter and each section sent as a separate message. Note that if messages are delimited and some messages for a given FlowFile are transferred successfully while others are not, the messages will be split into individual FlowFiles, such that those messages that were successfully sent are routed to the 'success' relationship while other messages are sent to the 'failure' relationship.
Port Destination port number
Protocol The protocol for communication.
SSL Context Service Specifies the SSL Context Service to enable TLS socket communication
Timeout The timeout for connecting to and communicating with the destination. Does not apply to UDP
Name Description
failure FlowFiles that failed to send to the destination are sent out this relationship.
success FlowFiles that are sent successfully to the destination are sent out this relationship.
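The 'Message Delimiter' behavior described above (split the content on the delimiter, send each section separately, and route successes and failures independently) can be sketched in Python as follows. The helper names, the decision to skip empty sections, and the use of `OSError` to signal a failed send are all assumptions made for illustration.

```python
def split_messages(content: str, delimiter=None):
    """Return the messages contained in one FlowFile's content."""
    if not delimiter:
        return [content]  # no delimiter: the entire content is a single message
    # Assumption: empty sections (e.g. from a trailing delimiter) are skipped
    return [part for part in content.split(delimiter) if part]


def route(content: str, delimiter, send):
    """Send each message and return (succeeded, failed) lists.

    `send` is a hypothetical callable that raises OSError when delivery fails.
    """
    succeeded, failed = [], []
    for message in split_messages(content, delimiter):
        try:
            send(message)
            succeeded.append(message)
        except OSError:
            failed.append(message)
    return succeeded, failed


def flaky_send(message):
    # Stand-in transport that rejects one specific message
    if message == "b":
        raise OSError("connection reset")


print(route("a\nb\nc", "\n", flaky_send))  # (['a', 'c'], ['b'])
```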
Property Description
Hostname The IP address or hostname of the Splunk server.
Owner The owner to pass to Splunk.
Password The password to authenticate to Splunk.
Port The HTTP Event Collector HTTP Port Number.
Scheme The scheme for connecting to Splunk.
Security Protocol The security protocol to use for communicating with Splunk.
Token HTTP Event Collector token starting with the string Splunk. For example 'Splunk 1234578-abcd-1234-abcd-1234abcd'
Username The username to authenticate to Splunk.
character-set The name of the character set.
content-type The media type of the event sent to Splunk. If not set, the "mime.type" FlowFile attribute will be used. If neither is specified, this information will not be sent to the server.
host Specify with the host query string parameter. Sets a default for all events when unspecified.
index Index name. Specify with the index query string parameter. Sets a default for all events when unspecified.
request-channel Identifier of the used request channel.
source User-defined event source. Sets a default for all events when unspecified.
source-type User-defined event sourcetype. Sets a default for all events when unspecified.
Name Description
failure FlowFiles that failed to send to the destination are sent to this relationship.
success FlowFiles that are sent successfully to the destination are sent to this relationship.
Name Description
splunk.acknowledgement.id The indexing acknowledgement id provided by Splunk.
splunk.responded.at The time at which Splunk responded to the put request.
Property Description
Batch Size The preferred number of FlowFiles to put to the database in a single transaction
JDBC Connection Pool Specifies the JDBC Connection Pool to use in order to convert the JSON message to a SQL statement. The Connection Pool is necessary in order to determine the appropriate database column types.
Obtain Generated Keys If true, any key that is automatically generated by the database will be added to the FlowFile that generated it using the sql.generated.key attribute. This may result in slightly slower performance and is not supported by all databases.
Rollback On Failure Specifies how to handle errors. By default (false), if an error occurs while processing a FlowFile, the FlowFile will be routed to the 'failure' or 'retry' relationship based on the error type, and the processor can continue with the next FlowFile. Instead, you may want to roll back currently processed FlowFiles and stop further processing immediately. To do so, enable this 'Rollback On Failure' property. If enabled, failed FlowFiles will stay in the input relationship without being penalized and will be processed repeatedly until they are processed successfully or removed by other means. It is important to set an adequate 'Yield Duration' to avoid retrying too frequently.
Support Fragmented Transactions If true, when a FlowFile is consumed by this Processor, the Processor will first check the fragment.identifier and fragment.count attributes of that FlowFile. If the fragment.count value is greater than 1, the Processor will not process any FlowFile with that fragment.identifier until all are available; at that point, it will process all FlowFiles with that fragment.identifier as a single transaction, in the order specified by the FlowFiles' fragment.index attributes. This provides atomicity of those SQL statements. If any statement of this transaction throws an exception when executing, the transaction will be rolled back. When a transaction is rolled back, none of these FlowFiles will be routed to 'success'. If <Rollback On Failure> is set to true, these FlowFiles will stay in the input relationship. If <Rollback On Failure> is set to false and any of these FlowFiles would be routed to 'retry', all of them will be routed to 'retry'; otherwise, they will be routed to 'failure'. If this value is false, these attributes will be ignored and the updates will occur independent of one another.
Transaction Timeout If the <Support Fragmented Transactions> property is set to true, specifies how long to wait for all FlowFiles for a particular fragment.identifier attribute to arrive before just transferring all of the FlowFiles with that identifier to the 'failure' relationship
database-session-autocommit The autocommit mode to set on the database connection being used. If set to false, the operation(s) will be explicitly committed or rolled back (based on success or failure respectively), if set to true the driver/database handles the commit/rollback.
putsql-sql-statement The SQL statement to execute. The statement can be empty, a constant value, or built from attributes using Expression Language. If this property is specified, it will be used regardless of the content of incoming FlowFiles. If this property is empty, the content of the incoming FlowFile is expected to contain a valid SQL statement, to be issued by the processor to the database.
Name Description
failure A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, such as an invalid query or an integrity constraint violation
retry A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed
success A FlowFile is routed to this relationship after the database is successfully updated
Name Description
sql.generated.key If the database generated a key for an INSERT statement and the Obtain Generated Keys property is set to true, this attribute will be added to indicate the generated key, if possible. This feature is not supported by all database vendors.
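The grouping rule behind 'Support Fragmented Transactions' can be sketched in a few lines of Python: FlowFiles are grouped by fragment.identifier, a group is held back until fragment.count members have arrived, and a complete group is ordered by fragment.index. FlowFiles are modeled here as plain dictionaries and the function name `ready_transactions` is hypothetical; this is not the processor's implementation.

```python
from collections import defaultdict


def ready_transactions(flowfiles):
    """Return complete fragment groups, each ordered by fragment.index."""
    groups = defaultdict(list)
    for ff in flowfiles:
        groups[ff["fragment.identifier"]].append(ff)

    ready = []
    for members in groups.values():
        expected = int(members[0]["fragment.count"])
        if len(members) == expected:  # all fragments have arrived
            ready.append(sorted(members, key=lambda ff: int(ff["fragment.index"])))
        # incomplete groups keep waiting (subject to 'Transaction Timeout')
    return ready


queue = [
    {"fragment.identifier": "A", "fragment.count": "2", "fragment.index": "1", "sql": "UPDATE t SET x = 2"},
    {"fragment.identifier": "B", "fragment.count": "3", "fragment.index": "0", "sql": "INSERT INTO u VALUES (1)"},
    {"fragment.identifier": "A", "fragment.count": "2", "fragment.index": "0", "sql": "INSERT INTO t VALUES (1)"},
]
for transaction in ready_transactions(queue):
    print([ff["sql"] for ff in transaction])
```

In this example only group A is complete, so its two statements would run as one transaction in index order while group B continues to wait.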
Property Description
AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Communications Timeout
Deduplication Message ID The token used for deduplication of sent messages
Delay The amount of time to delay the message before it becomes available to consumers
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Message Group ID If using FIFO, the message group to which the FlowFile belongs
Queue URL The URL of the queue to act upon
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
Property Description
Batch Size The number of incoming FlowFiles to process in a single execution of this processor.
Character Set Specifies the character set of the Syslog messages. Note that Expression language is not evaluated per FlowFile.
Hostname The IP address or hostname of the Syslog server.
Idle Connection Expiration The amount of time a connection should be held open without being used before closing the connection.
Max Size of Socket Send Buffer The maximum size of the socket send buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
Message Body The body for the Syslog messages.
Message Hostname The hostname for the Syslog messages.
Message Priority The priority for the Syslog messages, excluding < >.
Message Timestamp The timestamp for the Syslog messages. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.S'Z'" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss".
Message Version The version for the Syslog messages.
Port The port for Syslog communication. Note that Expression language is not evaluated per FlowFile.
Protocol The protocol for Syslog communication.
SSL Context Service The Controller Service to use in order to obtain an SSL Context. If this property is set, syslog messages will be sent over a secure connection.
Timeout The timeout for connecting to and communicating with the syslog server. Does not apply to UDP. Note that Expression language is not evaluated per FlowFile.
Name Description
failure FlowFiles that failed to send to Syslog are sent out this relationship.
invalid FlowFiles that do not form a valid Syslog message are sent out this relationship.
success FlowFiles that are sent successfully to Syslog are sent out this relationship.
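To make the 'Message Priority' and 'Message Timestamp' properties more concrete, the Python sketch below computes a priority value from a facility and severity (priority = facility * 8 + severity, as defined by the Syslog RFCs) and formats a timestamp in both accepted styles. The helper names, the three-digit millisecond fraction, and the way the final message is assembled are assumptions for illustration.

```python
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def syslog_priority(facility: int, severity: int) -> int:
    """Combine facility (0-23) and severity (0-7) into a priority value."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def rfc5424_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as UTC, e.g. 2024-01-02T03:04:05.123Z."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"


def rfc3164_timestamp(dt: datetime) -> str:
    """Format following the 'MMM d HH:mm:ss' pattern, e.g. Jan 2 03:04:05."""
    return f"{MONTHS[dt.month - 1]} {dt.day} {dt:%H:%M:%S}"


def syslog_message(priority, timestamp, hostname, body, version=None) -> str:
    """Assemble a message; the priority is supplied without the < > brackets."""
    header = f"<{priority}>" + (f"{version} " if version is not None else "")
    return f"{header}{timestamp} {hostname} {body}"


when = datetime(2024, 1, 2, 3, 4, 5, 123000, tzinfo=timezone.utc)
pri = syslog_priority(facility=1, severity=6)  # user-level, informational
print(syslog_message(pri, rfc5424_timestamp(when), "host1", "hello", version=1))
print(syslog_message(pri, rfc3164_timestamp(when), "host1", "hello"))
```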
Property Description
Character Set Specifies the character set of the data being sent.
Connection Per FlowFile Specifies whether to send each FlowFile's content on an individual connection.
Hostname Destination hostname or IP address
Idle Connection Expiration The amount of time a connection should be held open without being used before closing the connection. A value of 0 seconds will disable this feature.
Max Size of Socket Send Buffer The maximum size of the socket send buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
Outgoing Message Delimiter Specifies the delimiter to use when sending messages out over the same TCP stream. The delimiter is appended to each FlowFile message that is transmitted over the stream so that the receiver can determine when one message ends and the next message begins. Users should ensure that the FlowFile content does not contain the delimiter character to avoid errors. To use a new line character, enter '\n'. For a tab character, use '\t'. For a carriage return, use '\r'.
Port Destination port number
Record Reader Specifies the Controller Service to use for reading Records from input FlowFiles
Record Writer Specifies the Controller Service to use for writing Records to the configured socket address
SSL Context Service Specifies the SSL Context Service to enable TLS socket communication
Timeout The timeout for connecting to and communicating with the destination. Does not apply to UDP
Transmission Strategy Specifies the strategy used for reading input FlowFiles and transmitting messages to the destination socket address
Name Description
failure FlowFiles that failed to send to the destination are sent out this relationship.
success FlowFiles that are sent successfully to the destination are sent out this relationship.
Name Description
record.count.transmitted Count of records transmitted to configured destination address
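The Outgoing Message Delimiter property describes delimiter-based framing: each message is followed by a delimiter so the receiver can find message boundaries in one TCP stream. The sketch below illustrates that idea only; the function names are hypothetical and it does not open a socket or reproduce the processor's behavior.

```python
def frame_messages(messages: list[bytes], delimiter: bytes = b"\n") -> bytes:
    """Append the delimiter to every message and concatenate them into one stream."""
    for message in messages:
        if delimiter in message:
            # A delimiter inside the content would split one message into two.
            raise ValueError("message content must not contain the delimiter")
    return b"".join(message + delimiter for message in messages)


def split_stream(stream: bytes, delimiter: bytes = b"\n") -> list[bytes]:
    """Recover complete messages; a trailing piece without a delimiter is dropped."""
    parts = stream.split(delimiter)
    return parts[:-1]


stream = frame_messages([b"first", b"second"])
received = split_stream(stream)
```

The check in `frame_messages` reflects the warning in the property description: content that already contains the delimiter makes the boundaries ambiguous for the receiver.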
Property Description
Hostname Destination hostname or IP address
Idle Connection Expiration The amount of time a connection should be held open without being used before closing the connection. A value of 0 seconds will disable this feature.
Max Size of Socket Send Buffer The maximum size of the socket send buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped.
Port Destination port number
Timeout The timeout for connecting to and communicating with the destination. Does not apply to UDP
Name Description
failure FlowFiles that failed to send to the destination are sent out this relationship.
success FlowFiles that are sent successfully to the destination are sent out this relationship.
Property Description
Databricks Client Databricks Client Service.
Unity Catalog File Path Unity Catalog file path e.g. /Volumes/catalog/schema/volume_name/file.txt
Name Description
failure Databricks failure relationship
success Databricks success relationship
Name Description
error.code The error code for the SQL statement if an error occurred.
error.message The error message for the SQL statement if an error occurred.
Property Description
Corpus ID Identifier of the Vectara corpus
Document Attributes A comma-delimited list of NiFi attribute names that, if present, will be included in the document metadata.
Document Author Author of the document
Document Creation Time Timestamp in epoch seconds when the document was created
Document Date Date of document creation
Document Description Description of the document
Document ID A unique identifier for the document constructed either from the source path of the document or a hash of the document's content.
Document Source URL Source URL for document
Document Title Document Title
Index Input Format Input format for indexing service. JSON Object: Load FlowFile content directly as JSON payload. JSON Lines: Create a new section for each line of JSON. JSON Array: Load FlowFile content as a JSON array and create a new section for each element in the JSON array.
Section Custom Dimensions A comma delimited list of metadata fields, which if present in the metadata path will be included as a section's custom dimension. The values for custom dimensions must be valid numbers.
Section Filter Attributes A comma delimited list of metadata fields, which if present in the metadata path will be included as a section metadata filter.
Section ID Attribute The field for setting section id, which is populated if present in the metadata path.
Section Metadata Attributes A comma-delimited list of metadata fields that, if present in the metadata path, will be included in the section metadata.
Section Metadata JSON Path A JSON Path expression to a metadata JSON Object. The JSON Object needs to contain the list of metadata fields. These fields will be included in Section metadata.
Section Text JSON Path A JSON Path expression to the text field.
Section Title Attribute The field for setting the section title, which is populated if present in the metadata path.
Vectara Client Vectara Client Service.
Name Description
failure Vectara failure relationship
original Original relationship
success Vectara success relationship
Property Description
Corpus ID Identifier of the Vectara corpus
Document Filter Attributes A comma-delimited list of metadata fields that, if present in the FlowFile attributes, will be included as a document metadata filter.
Document ID A unique identifier for the document constructed either from the source path of the document or a hash of the document's content.
Document Metadata Attributes A comma-delimited list of metadata fields that, if present in the FlowFile attributes, will be included in the document metadata.
Vectara Client Vectara Client Service.
Name Description
failure Vectara failure relationship
original Original relationship
success Vectara success relationship
Property Description
websocket-controller-service-id A NiFi Expression to retrieve the id of a WebSocket ControllerService.
websocket-endpoint-id A NiFi Expression to retrieve the endpoint id of a WebSocket ControllerService.
websocket-message-type The type of message content: TEXT or BINARY
websocket-session-id A NiFi Expression to retrieve the session id. If not specified, a message will be sent to all connected WebSocket peers for the WebSocket controller service endpoint.
Name Description
failure FlowFiles that failed to send to the destination are transferred to this relationship.
success FlowFiles that are sent successfully to the destination are transferred to this relationship.
Name Description
websocket.controller.service.id WebSocket Controller Service id.
websocket.session.id Established WebSocket session id.
websocket.endpoint.id WebSocket endpoint id.
websocket.message.type TEXT or BINARY.
websocket.local.address WebSocket server address.
websocket.remote.address WebSocket client address.
websocket.failure.detail Detail of the failure.
Property Description
web-client-service-provider Controller service for HTTP client operations.
zendesk-authentication-type-name Type of authentication to Zendesk API.
zendesk-authentication-value-name Password or authentication token for Zendesk login user.
zendesk-comment-body The content or the path to the comment body in the incoming record.
zendesk-priority The content or the path to the priority in the incoming record.
zendesk-record-reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema.
zendesk-subdomain Name of the Zendesk subdomain.
zendesk-subject The content or the path to the subject in the incoming record.
zendesk-type The content or the path to the type in the incoming record.
zendesk-user Login user to Zendesk subdomain.
Name Description
failure A FlowFile is routed to this relationship if the operation failed and retrying the operation will also fail, for example because of invalid data or an invalid schema.
success For FlowFiles created as a result of a successful HTTP request.
Name Description
record.count The number of records processed.
error.code The error code from the response.
error.message The error message from the response.
Property Description
Database Name Azure Data Explorer Database Name for querying
Kusto Query Service Azure Data Explorer Kusto Query Service
Query Query to be run against Azure Data Explorer
Name Description
failure FlowFiles containing original input associated with a failed Query
success FlowFiles containing results of a successful Query
Name Description
query.error.message Azure Data Explorer query error message on failures
query.executed Azure Data Explorer query executed
mime.type Content Type set to application/json
Property Description
Columns to Return A comma-separated list of column names to be used in the query. If your database requires special treatment of the names (quoting, e.g.), each name should include such treatment. If no column names are supplied, all columns in the specified table will be returned. NOTE: It is important to use consistent column names for a given table for incremental fetch to work properly.
Database Connection Pooling Service The Controller Service that is used to obtain a connection to the database.
Database Dialect Service Database Dialect Service for generating statements specific to a particular service or vendor.
Default Decimal Precision When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting the number of available digits is required. Generally, precision is defined by the column data type definition or the database engine's default. However, undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers.
Default Decimal Scale When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting the number of available decimal digits is required. Generally, scale is defined by the column data type definition or the database engine's default. However, when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimal digits than the specified scale, the value is rounded; for example, 1.53 becomes 2 with scale 0 and 1.5 with scale 1.
Fetch Size The number of result rows to be fetched from the result set at a time. This is a hint to the database driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored. If using PostgreSQL, then 'Set Auto Commit' must be equal to 'false' to cause 'Fetch Size' to take effect.
Max Wait Time The maximum amount of time allowed for a running SQL select query; zero means there is no limit. A max time of less than 1 second is treated as zero.
Maximum-value Columns A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly.
Normalize Table and Column Names Whether to change non-Avro-compatible characters in column names to Avro-compatible characters. For example, colons and periods will be changed to underscores in order to build a valid Avro record.
Set Auto Commit Allows enabling or disabling the auto commit functionality of the DB connection. Default value is 'No value set'. 'No value set' will leave the DB connection's auto commit mode unchanged. For some JDBC drivers such as the PostgreSQL driver, it is required to disable the auto commit functionality to get the 'Fetch Size' setting to take effect. When auto commit is enabled, the PostgreSQL driver ignores the 'Fetch Size' setting and loads all rows of the result set into memory at once. This could lead to a large amount of memory usage when executing queries that fetch large data sets. More details of this behaviour in the PostgreSQL driver can be found in [https://jdbc.postgresql.org//documentation/head/query.html](https://jdbc.postgresql.org//documentation/head/query.html).
Table Name The name of the database table to be queried. When a custom query is used, this property is used to alias the query and appears as an attribute on the FlowFile.
Use Avro Logical Types Whether to use Avro Logical Types for DECIMAL/NUMBER, DATE, TIME and TIMESTAMP columns. If disabled, written as string. If enabled, Logical types are used and written as its underlying type, specifically, DECIMAL/NUMBER as logical 'decimal': written as bytes with additional precision and scale meta data, DATE as logical 'date-millis': written as int denoting days since Unix epoch (1970-01-01), TIME as logical 'time-millis': written as int denoting milliseconds since Unix epoch, and TIMESTAMP as logical 'timestamp-millis': written as long denoting milliseconds since Unix epoch. If a reader of written Avro records also knows these logical types, then these values can be deserialized with more context depending on reader implementation.
db-fetch-db-type Database Type for generating statements specific to a particular service or vendor. The Generic Type supports most cases but selecting a specific type enables optimal processing or additional features.
db-fetch-sql-query A custom SQL query used to retrieve data. Instead of building a SQL query from other properties, this query will be wrapped as a sub-query. Query must have no ORDER BY statement.
db-fetch-where-clause A custom clause to be added in the WHERE condition when building SQL queries.
initial-load-strategy How to handle existing rows in the database table when the processor is started for the first time (or its state has been cleared). The property will be ignored, if any 'initial.maxvalue.*' dynamic property has also been configured.
qdbt-max-frags The maximum number of fragments. If the value specified is zero, then all fragments are returned. This prevents OutOfMemoryError when this processor ingests a huge table. NOTE: Setting this property can result in data loss, as the incoming results are not ordered, and fragments may end at arbitrary boundaries where rows are not included in the result set.
qdbt-max-rows The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile.
qdbt-output-batch-size The number of output FlowFiles to queue before committing the process session. When set to zero, the session will be committed when all result set rows have been processed and the output FlowFiles are ready for transfer to the downstream relationship. For large result sets, this can cause a large burst of FlowFiles to be transferred at the end of processor execution. If this property is set, then when the specified number of FlowFiles are ready for transfer, then the session will be committed, thus releasing the FlowFiles to the downstream relationship. NOTE: The maxvalue.* and fragment.count attributes will not be set on FlowFiles when this property is set.
transaction-isolation-level This setting will set the transaction isolation level for the database connection for drivers that support this setting
Scopes Description
CLUSTER After performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records that have max values greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation
Name Description
success Successfully created FlowFile from SQL query result set.
Name Description
tablename Name of the table being queried
querydbtable.row.count The number of rows selected by the query
fragment.identifier If 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results.
fragment.count If 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. If Output Batch Size is set, then this attribute will not be populated.
fragment.index If 'Max Rows Per Flow File' is set then the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set FlowFile. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order FlowFiles were produced
maxvalue.* Each attribute contains the observed maximum value of a specified 'Maximum-value Column'. The suffix of the attribute is the name of the column. If Output Batch Size is set, then this attribute will not be populated.
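The Maximum-value Columns property and the CLUSTER state entry above describe incremental fetching: the largest value seen so far is retained, and the next query returns only rows above it. The following is a minimal sketch of that pattern using an in-memory SQLite database. The `orders` table, the `fetch_new_rows` helper, and the plain dictionary standing in for processor state are all hypothetical, and the real processor generates its own SQL.

```python
import sqlite3


def fetch_new_rows(conn, table, max_value_column, state):
    """Return rows whose max-value column exceeds the value retained in `state`."""
    last_seen = state.get(max_value_column)
    # Table and column names come from trusted configuration in this sketch;
    # only the retained value is passed as a bound parameter.
    if last_seen is None:
        cursor = conn.execute(f"SELECT * FROM {table}")
    else:
        cursor = conn.execute(
            f"SELECT * FROM {table} WHERE {max_value_column} > ?", (last_seen,)
        )
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    if rows:
        # Retain the new maximum for the next execution.
        state[max_value_column] = max(row[max_value_column] for row in rows)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (id, item) VALUES (?, ?)", [(1, "a"), (2, "b")])

state = {}
first = fetch_new_rows(conn, "orders", "id", state)   # first run: all existing rows
conn.execute("INSERT INTO orders (id, item) VALUES (3, 'c')")
second = fetch_new_rows(conn, "orders", "id", state)  # later run: only the new row
```

Clearing the `state` dictionary in this sketch corresponds to clearing the processor state, after which the next run would return every row again.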
Property Description
Columns to Return A comma-separated list of column names to be used in the query. If your database requires special treatment of the names (quoting, e.g.), each name should include such treatment. If no column names are supplied, all columns in the specified table will be returned. NOTE: It is important to use consistent column names for a given table for incremental fetch to work properly.
Database Connection Pooling Service The Controller Service that is used to obtain a connection to the database.
Database Dialect Service Database Dialect Service for generating statements specific to a particular service or vendor.
Default Decimal Precision When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting the number of available digits is required. Generally, precision is defined by the column data type definition or the database engine's default. However, undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers.
Default Decimal Scale When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting the number of available decimal digits is required. Generally, scale is defined by the column data type definition or the database engine's default. However, when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimal digits than the specified scale, the value is rounded; for example, 1.53 becomes 2 with scale 0 and 1.5 with scale 1.
Fetch Size The number of result rows to be fetched from the result set at a time. This is a hint to the database driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored. If using PostgreSQL, then 'Set Auto Commit' must be equal to 'false' to cause 'Fetch Size' to take effect.
Max Wait Time The maximum amount of time allowed for a running SQL select query; zero means there is no limit. A max time of less than 1 second is treated as zero.
Maximum-value Columns A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly.
Set Auto Commit Allows enabling or disabling the auto commit functionality of the DB connection. Default value is 'No value set'. 'No value set' will leave the DB connection's auto commit mode unchanged. For some JDBC drivers such as the PostgreSQL driver, it is required to disable the auto commit functionality to get the 'Fetch Size' setting to take effect. When auto commit is enabled, the PostgreSQL driver ignores the 'Fetch Size' setting and loads all rows of the result set into memory at once. This could lead to a large amount of memory usage when executing queries that fetch large data sets. More details of this behaviour in the PostgreSQL driver can be found in [https://jdbc.postgresql.org//documentation/head/query.html](https://jdbc.postgresql.org//documentation/head/query.html).
Table Name The name of the database table to be queried. When a custom query is used, this property is used to alias the query and appears as an attribute on the FlowFile.
Use Avro Logical Types Whether to use Avro Logical Types for DECIMAL/NUMBER, DATE, TIME and TIMESTAMP columns. If disabled, written as string. If enabled, Logical types are used and written as its underlying type, specifically, DECIMAL/NUMBER as logical 'decimal': written as bytes with additional precision and scale meta data, DATE as logical 'date-millis': written as int denoting days since Unix epoch (1970-01-01), TIME as logical 'time-millis': written as int denoting milliseconds since Unix epoch, and TIMESTAMP as logical 'timestamp-millis': written as long denoting milliseconds since Unix epoch. If a reader of written Avro records also knows these logical types, then these values can be deserialized with more context depending on reader implementation.
db-fetch-db-type Database Type for generating statements specific to a particular service or vendor. The Generic Type supports most cases but selecting a specific type enables optimal processing or additional features.
db-fetch-sql-query A custom SQL query used to retrieve data. Instead of building a SQL query from other properties, this query will be wrapped as a sub-query. Query must have no ORDER BY statement.
db-fetch-where-clause A custom clause to be added in the WHERE condition when building SQL queries.
initial-load-strategy How to handle existing rows in the database table when the processor is started for the first time (or its state has been cleared). The property will be ignored, if any 'initial.maxvalue.*' dynamic property has also been configured.
qdbt-max-frags The maximum number of fragments. If the value specified is zero, then all fragments are returned. This prevents OutOfMemoryError when this processor ingests a huge table. NOTE: Setting this property can result in data loss, as the incoming results are not ordered, and fragments may end at arbitrary boundaries where rows are not included in the result set.
qdbt-max-rows The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile.
qdbt-output-batch-size The number of output FlowFiles to queue before committing the process session. When set to zero, the session will be committed when all result set rows have been processed and the output FlowFiles are ready for transfer to the downstream relationship. For large result sets, this can cause a large burst of FlowFiles to be transferred at the end of processor execution. If this property is set, then when the specified number of FlowFiles are ready for transfer, then the session will be committed, thus releasing the FlowFiles to the downstream relationship. NOTE: The maxvalue.* and fragment.count attributes will not be set on FlowFiles when this property is set.
qdbtr-normalize Whether to change characters in column names when creating the output schema. For example, colons and periods will be changed to underscores.
qdbtr-record-writer Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer may use Inherit Schema to emulate the inferred schema behavior, i.e. an explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types.
Scopes Description
CLUSTER After performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records that have max values greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation
Name Description
success Successfully created FlowFile from SQL query result set.
Name Description
tablename Name of the table being queried
querydbtable.row.count The number of rows selected by the query
fragment.identifier If 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results.
fragment.count If 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. If Output Batch Size is set, then this attribute will not be populated.
fragment.index If 'Max Rows Per Flow File' is set then the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set FlowFile. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order FlowFiles were produced
maxvalue.* Each attribute contains the observed maximum value of a specified 'Maximum-value Column'. The suffix of the attribute is the name of the column. If Output Batch Size is set, then this attribute will not be populated.
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer.
record.count The number of records output by the Record Writer.
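The fragment.identifier, fragment.index, and fragment.count attributes above describe how one result set is split across several FlowFiles when a maximum row count per FlowFile is configured. The sketch below shows how those three attributes relate to each other. The `split_into_fragments` helper and the dictionary layout are hypothetical, and the zero-based index is an assumption of this sketch rather than something the table states.

```python
import uuid


def split_into_fragments(rows, max_rows_per_flowfile):
    """Split rows into chunks and attach fragment attributes to each chunk."""
    if max_rows_per_flowfile <= 0:
        # Zero means all rows go into a single FlowFile; no fragment attributes here.
        return [{"attributes": {}, "rows": list(rows)}]

    chunks = [
        rows[start:start + max_rows_per_flowfile]
        for start in range(0, len(rows), max_rows_per_flowfile)
    ]
    identifier = str(uuid.uuid4())  # shared by every fragment of this result set
    return [
        {
            "attributes": {
                "fragment.identifier": identifier,
                "fragment.index": str(index),  # assumed to start at 0
                "fragment.count": str(len(chunks)),
            },
            "rows": chunk,
        }
        for index, chunk in enumerate(chunks)
    ]


fragments = split_into_fragments(list(range(5)), max_rows_per_flowfile=2)
```

A downstream consumer can group FlowFiles by fragment.identifier, order them by fragment.index, and use fragment.count to tell when the set is complete. As the table notes, fragment.count is not populated when Output Batch Size is set, so that completeness check is not available in that configuration.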
Property Description
Collection Name The name of the Milvus collection to use
Max Query Batch Size The number of vectors contained in a single request to Milvus during a query. Milvus is unable to support batch queries of more than 10 vectors at a time.
Maximum Results The maximum number of results to return (i.e., Top K)
Milvus Connection Service Connection Service for accessing Milvus Database
Output Search Fields Comma separated list of additional fields to return from a search against the Milvus database. Milvus will return the score and id fields by default.
Partition Partition of the vector database that you want to perform operations in. If the database has only one partition leave empty.
Record Reader The Record Reader to use for reading the FlowFile
Record Writer The Record Writer to use for writing the results
Reranking Smoothing Parameter Smoothing Parameter of the Reciprocal Rank Fusion (RRFRanker) during Hybrid Search
Results Record Path Specifies where in the record to place the results.
Sparse Vector Field Name The name of the field to use for storing the sparse vectors.
Sparse Vector Indices Path If sparse vectors are to be provided, this RecordPath points to the indices of the sparse data to use.
Sparse Vector Values Path If sparse vectors are to be provided, this RecordPath points to the values of the sparse data to use.
Vector Field Name The name of the field in Milvus to use for storing the vectors.
Vector Record Path The path to the vector field in the record
Name Description
failure FlowFiles that cannot be sent to Milvus, and for which a retry is not expected to be successful, are routed to this relationship
retry FlowFiles that fail to be sent to Milvus, but for which a retry may help, are routed to this relationship
success FlowFiles that are successfully sent to Milvus are routed to this relationship
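The Reranking Smoothing Parameter above refers to Reciprocal Rank Fusion, which merges several ranked result lists by summing 1 / (k + rank) for every list in which a document appears. The following is a minimal sketch of that general formula, not Milvus's implementation. The function name is hypothetical, ranks are assumed to start at 1, and the value k = 60 is an assumed, commonly used setting.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; a larger k flattens rank differences."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_results = ["a", "b", "c"]   # ranking from the dense vector search
sparse_results = ["b", "c", "a"]  # ranking from the sparse vector search
fused = reciprocal_rank_fusion([dense_results, sparse_results], k=60)
```

In this example "b" ranks first after fusion because it sits near the top of both lists, even though it is not first in the dense ranking.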
Property Description
ID Record Path The path to the ID field in the record
Include Metadata Specifies whether to include metadata in the results
Include Vectors Specifies whether to include vectors in the results
Number of Results The number of results to return (i.e., Top K)
Pinecone API Key The API key for the Pinecone service
Pinecone Index The name of the Pinecone index to use
Pinecone Namespace The name of the Pinecone namespace to use
Query Filter A JSON representation of the query filter to use
Query Strategy The strategy to use for querying Pinecone
Record Reader The Record Reader to use for reading the FlowFile
Record Writer The Record Writer to use for writing the results
Results Record Path Specifies where in the record to place the results.
Sparse Dense Vector Weighting Ranges from 0.0 to 1.0. Weight to apply to dense and sparse vectors when doing a hybrid search. (1 - weight) will be applied to the values of the sparse vector and (weight) will be applied to the dense vector.
Sparse Vector Indices Path If sparse vectors are to be provided, this RecordPath points to the indices of the sparse data to use.
Sparse Vector Values Path If sparse vectors are to be provided, this RecordPath points to the values of the sparse data to use.
Vector Record Path The path to the vector field in the record
Web Client Service The Web Client Service to use for communicating with Pinecone
Name Description
failure FlowFiles that cannot be sent to Pinecone, and for which a retry is not expected to be successful, are routed to this relationship
retry FlowFiles that fail to be sent to Pinecone, but for which a retry may help, are routed to this relationship
success FlowFiles that are successfully sent to Pinecone are routed to this relationship
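The Sparse Dense Vector Weighting property above describes a simple scaling rule for hybrid search: the dense vector is multiplied by the weight and the sparse values by (1 - weight). The sketch below applies that rule to plain Python lists. The function name and the returned dictionary layout are hypothetical, and the sketch makes no call to Pinecone.

```python
def weight_hybrid_query(dense, sparse_indices, sparse_values, weight):
    """Scale dense values by `weight` and sparse values by `1 - weight`."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    if len(sparse_indices) != len(sparse_values):
        raise ValueError("sparse indices and values must have the same length")
    weighted_dense = [value * weight for value in dense]
    weighted_sparse = {
        "indices": list(sparse_indices),  # indices are unchanged by weighting
        "values": [value * (1.0 - weight) for value in sparse_values],
    }
    return weighted_dense, weighted_sparse


dense_vector, sparse_vector = weight_hybrid_query(
    dense=[1.0, 2.0], sparse_indices=[3, 17], sparse_values=[4.0, 8.0], weight=0.75
)
```

A weight of 1.0 reduces the query to a purely dense search and 0.0 to a purely sparse one, which is a quick way to check how each side contributes to the results.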
Property Description
Default Decimal Precision When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting the number of available digits is required. Generally, precision is defined by the column data type definition or the database engine's default. However, undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers.
Default Decimal Scale When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting the number of available decimal digits is required. Generally, scale is defined by the column data type definition or the database engine's default. However, when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimal digits than the specified scale, the value is rounded; for example, 1.53 becomes 2 with scale 0 and 1.5 with scale 1.
include-zero-record-flowfiles When running the SQL statement against an incoming FlowFile, if the result has no data, this property specifies whether or not a FlowFile will be sent to the corresponding relationship
record-reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema
record-writer Specifies the Controller Service to use for writing results to a FlowFile
Name Description
failure If a FlowFile fails processing for any reason (for example, the SQL statement contains columns not present in input data), the original FlowFile will be routed to this relationship
original The original FlowFile is routed to this relationship
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records selected by the query
QueryRecord.Route The relation to which the FlowFile was routed
Property Description
age-delay The ending timestamp of the time window will be adjusted earlier by the amount configured in this property. For example, with a property value of 10 seconds, an ending timestamp of 12:30:45 would be changed to 12:30:35.
age-field The name of a TIMESTAMP field that will be used to filter records using a bounded time window. The processor will return only those records with a timestamp value newer than the timestamp recorded after the last processor run.
create-zero-record-files Specifies whether or not to create a FlowFile when the Salesforce REST API does not return any records
custom-soql-query Specify the SOQL query to run.
custom-where-condition A custom expression to be added in the WHERE clause of the query
field-names Comma-separated list of field names requested from the sObject to be queried. When this field is left empty, all fields are queried.
include-deleted-records If true, the processor will include deleted records (IsDeleted = true) in the query results. When enabled, the processor will use the 'queryAll' API.
initial-age-filter This property specifies the start time that the processor applies when running the first query.
oauth2-access-token-provider Service providing OAuth2 Access Tokens for authenticating using the HTTP Authorization Header
query-type Choose to provide the query by parameters or a full custom query.
read-timeout Maximum time allowed for reading a response from the Salesforce REST API
record-writer Service used for writing records returned from the Salesforce REST API
salesforce-api-version The version number of the Salesforce REST API appended to the URL after the services/data path. See Salesforce documentation for supported versions
salesforce-url The URL of the Salesforce instance including the domain without additional path information, such as [https://MyDomainName.my.salesforce.com](https://MyDomainName.my.salesforce.com)
sobject-name The Salesforce sObject to be queried
Scopes Description
CLUSTER When 'Age Field' is set, after performing a query the time of execution is stored. Subsequent queries will be augmented with an additional condition so that only records that are newer than the stored execution time (adjusted with the optional value of 'Age Delay') will be retrieved. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.
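As a rough illustration of how the stored execution time and 'Age Delay' might combine into a bounded time window, consider the following Python sketch. The function name `next_window` and the choice to store the adjusted ending timestamp as the next starting point are assumptions made for illustration; the processor's exact bookkeeping may differ.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def next_window(
    stored_end: Optional[datetime],
    now: datetime,
    age_delay: timedelta = timedelta(0),
) -> Tuple[Optional[datetime], datetime]:
    """Return the (start, end) bounds for the next query.

    start is the end of the previous window (None on the first run);
    end is the current time moved earlier by the configured delay.
    """
    end = now - age_delay
    return stored_end, end


if __name__ == "__main__":
    delay = timedelta(seconds=10)
    # First run: no stored state yet.
    start, end = next_window(None, datetime(2024, 1, 1, 12, 30, 45), delay)
    print(start, end)
    # Second run: the previous end becomes the new start.
    start, end = next_window(end, datetime(2024, 1, 1, 12, 31, 45), delay)
    print(start, end)
```

With a 10-second delay, an ending timestamp of 12:30:45 becomes 12:30:35, matching the example given for the age-delay property.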
Name Description
failure The input flowfile gets sent to this relationship when the query fails.
original The input flowfile gets sent to this relationship when the query succeeds.
success For FlowFiles created as a result of a successful query.
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer.
record.count Sets the number of records in the FlowFile.
total.record.count Sets the total number of records in the FlowFile.
Property Description
Hostname The IP address or hostname of the Splunk server.
Owner The owner to pass to Splunk.
Password The password to authenticate to Splunk.
Port The HTTP Event Collector HTTP Port Number.
Scheme The scheme for connecting to Splunk.
Security Protocol The security protocol to use for communicating with Splunk.
Token HTTP Event Collector token starting with the string Splunk. For example 'Splunk 1234578-abcd-1234-abcd-1234abcd'
Username The username to authenticate to Splunk.
max-query-size The maximum number of acknowledgement identifiers the outgoing query contains in one batch. It is recommended not to set it too low in order to reduce network communication.
request-channel Identifier of the used request channel.
ttl The maximum time the processor tries to acquire acknowledgement confirmation for an index, from the point of registration. After the given amount of time, the processor considers the index as not acknowledged and transfers the FlowFile to the "unacknowledged" relationship.
Name Description
failure A FlowFile is transferred to this relationship when the acknowledgement was not successful due to errors during the communication. FlowFiles that time out or are unknown to the Splunk server are transferred to the "undetermined" relationship.
success A FlowFile is transferred to this relationship when the acknowledgement was successful.
unacknowledged A FlowFile is transferred to this relationship when the acknowledgement was not successful. This can happen when the acknowledgement did not happen within the time period set for Maximum Waiting Time. FlowFiles with an acknowledgement id unknown to the Splunk server will be transferred to this relationship after the Maximum Waiting Time is reached.
undetermined A FlowFile is transferred to this relationship when the acknowledgement state is not determined. FlowFiles transferred to this relationship might be penalized. This happens when Splunk returns with HTTP 200 but with false response for the acknowledgement id in the flow file attribute.
Display Name API Name Default Value Allowable Values Description
Service to Use * Service to Use $\{recordreader.name\} Specifies the name of the user-defined property whose associated Controller Service should be used.
Display Name API Name Default Value Allowable Values Description
Service to Use * Service to Use $\{recordsetwriter.name\} Specifies the name of the user-defined property whose associated Controller Service should be used.
Display Name API Name Default Value Allowable Values Description
Cluster Max Redirects * Cluster Max Redirects 5 The maximum number of redirects that can be performed when clustered.
Communication Timeout * Communication Timeout 10 seconds The timeout to use when attempting to communicate with Redis.
Connection String * Connection String The connection string for Redis. In a standalone instance this value will be of the form hostname:port. In a sentinel instance this value will be the comma-separated list of sentinels, such as host1:port1,host2:port2,host3:port3. In a clustered instance this value will be the comma-separated list of cluster masters, such as host1:port,host2:port,host3:port.
Database Index * Database Index 0 The database index to be used by connections created from this connection pool. See the databases property in redis.conf, by default databases 0-15 will be available.
Password Password The password used to authenticate to the Redis server. See the 'requirepass' property in redis.conf.
Pool - Block When Exhausted * Pool - Block When Exhausted true - true - false Whether or not clients should block and wait when trying to obtain a connection from the pool when the pool has no available connections. Setting this to false means an error will occur immediately when a client requests a connection and none are available.
Pool - Max Idle * Pool - Max Idle 8 The maximum number of idle connections that can be held in the pool, or a negative value if there is no limit.
Pool - Max Total * Pool - Max Total 8 The maximum number of connections that can be allocated by the pool (checked out to clients, or idle awaiting checkout). A negative value indicates that there is no limit.
Pool - Max Wait Time * Pool - Max Wait Time 10 seconds The amount of time to wait for an available connection when Block When Exhausted is set to true.
Pool - Min Evictable Idle Time * Pool - Min Evictable Idle Time 60 seconds The minimum amount of time an object may sit idle in the pool before it is eligible for eviction.
Pool - Min Idle * Pool - Min Idle 0 The target for the minimum number of idle connections to maintain in the pool. If the configured value of Min Idle is greater than the configured value for Max Idle, then the value of Max Idle will be used instead.
Pool - Num Tests Per Eviction Run * Pool - Num Tests Per Eviction Run -1 The number of connections to test per eviction attempt. A negative value indicates to test all connections.
Pool - Test On Borrow * Pool - Test On Borrow false - true - false Whether or not connections should be tested upon borrowing from the pool.
Pool - Test On Create * Pool - Test On Create false - true - false Whether or not connections should be tested upon creation.
Pool - Test On Return * Pool - Test On Return false - true - false Whether or not connections should be tested upon returning to the pool.
Pool - Test While Idle * Pool - Test While Idle true - true - false Whether or not connections should be tested while idle.
Pool - Time Between Eviction Runs * Pool - Time Between Eviction Runs 30 seconds The amount of time between attempting to evict idle connections from the pool.
Redis Mode * Redis Mode Standalone - Standalone - Sentinel - Cluster The type of Redis being communicated with - standalone, sentinel, or clustered.
SSL Context Service SSL Context Service If specified, this service will be used to create an SSL Context that will be used to secure communications; if not specified, communications will not be secure
Sentinel Master Sentinel Master The name of the sentinel master, required when Redis Mode is set to Sentinel
Sentinel Password Sentinel Password The password used to authenticate to the Redis Sentinel server. See the 'requirepass' and 'sentinel sentinel-pass' properties in sentinel.conf.
Sentinel Username Sentinel Username The username used to authenticate to the Redis sentinel server.
Username Username The username used to authenticate to the Redis server.
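The three connection string shapes described for the Connection String property can be checked with a small Python sketch. The function name `parse_connection_string` and the validation rules (exactly one entry for Standalone, numeric ports) are assumptions for illustration only, not the service's actual parsing code.

```python
from typing import List, Tuple


def parse_connection_string(mode: str, value: str) -> List[Tuple[str, int]]:
    """Split a Redis connection string into (host, port) pairs.

    mode is one of "Standalone", "Sentinel", or "Cluster".
    """
    entries = [entry.strip() for entry in value.split(",") if entry.strip()]
    if not entries:
        raise ValueError("connection string is empty")
    if mode == "Standalone" and len(entries) != 1:
        raise ValueError("Standalone mode expects a single hostname:port")

    nodes = []
    for entry in entries:
        # rpartition splits on the last colon, so the port is always last.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected hostname:port, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


if __name__ == "__main__":
    print(parse_connection_string("Standalone", "localhost:6379"))
    print(parse_connection_string("Sentinel", "host1:26379,host2:26379,host3:26379"))
```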
Display Name API Name Default Value Allowable Values Description
TTL * redis-cache-ttl 0 secs Indicates how long the data should exist in Redis. Setting '0 secs' would mean the data would exist forever
Redis Connection Pool * redis-connection-pool
Display Name API Name Default Value Allowable Values Description
Record Reader * Record Reader The underlying RecordReaderFactory service that will be used to read records before filtering is applied.
Property Description
Record Reader Specifies the Controller Service to use for reading incoming data
Record Writer Specifies the Controller Service to use for writing out the records
Name Description
failure If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship
success FlowFiles that are successfully transformed will be routed to this relationship
Name Description
record.error.message This attribute provides on failure the error message encountered by the Reader or Writer.
Property Description
Record Reader Specifies the Controller Service to use for reading incoming data
Record Writer Specifies the Controller Service to use for writing out the records
Name Description
failure If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship
success FlowFiles that are successfully transformed will be routed to this relationship
Name Description
record.index This attribute provides the current row index and is only available inside the literal value expression.
Property Description
Character Set The Character Set in which the file is encoded
Evaluation Mode Run the 'Replacement Strategy' against each line separately (Line-by-Line) or buffer the entire file into memory (Entire Text) and run against that.
Line-by-Line Evaluation Mode Run the 'Replacement Strategy' against each line separately (Line-by-Line) for all lines in the FlowFile, First Line (Header) alone, Last Line (Footer) alone, Except the First Line (Header) or Except the Last Line (Footer).
Maximum Buffer Size Specifies the maximum amount of data to buffer (per file or per line, depending on the Evaluation Mode) in order to apply the replacement. If 'Entire Text' (in Evaluation Mode) is selected and the FlowFile is larger than this value, the FlowFile will be routed to 'failure'. In 'Line-by-Line' Mode, if a single line is larger than this value, the FlowFile will be routed to 'failure'. A default value of 1 MB is provided, primarily for 'Entire Text' mode. In 'Line-by-Line' Mode, a value such as 8 KB or 16 KB is suggested. This value is ignored if the <Replacement Strategy> property is set to one of: Append, Prepend, Always Replace
Regular Expression The Search Value to search for in the FlowFile content. Only used for 'Literal Replace' and 'Regex Replace' matching strategies
Replacement Strategy The strategy for how and what to replace within the FlowFile's text content.
Replacement Value The value to insert using the 'Replacement Strategy'. With "Regex Replace", back-references to Regular Expression capturing groups are supported, but back-references to capturing groups that do not exist in the regular expression are treated as literal values. Back-references may also be referenced using the Expression Language, as '$1', '$2', etc. The single-tick marks MUST be included, as these variables are not "Standard" attribute names (attribute names must be quoted unless they contain only numbers, letters, and _).
Text to Append The text to append to the end of the FlowFile, or each line, depending on the configured value of the Evaluation Mode property
Text to Prepend The text to prepend to the start of the FlowFile, or each line, depending on the configured value of the Evaluation Mode property
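To make the difference between the 'Entire Text' and 'Line-by-Line' evaluation modes concrete, here is a minimal Python sketch of a regex replacement with back-references. It is an analogy rather than the processor's implementation: Python writes back-references as `\1` and `\2` where the property description uses `$1` and `$2`, and the function name `regex_replace` is hypothetical.

```python
import re


def regex_replace(text: str, pattern: str, replacement: str,
                  line_by_line: bool = False) -> str:
    """Apply a regex replacement to the whole text or to each line."""
    regex = re.compile(pattern)
    if not line_by_line:
        # Entire Text: the pattern sees the full content at once.
        return regex.sub(replacement, text)
    # Line-by-Line: each line (with its line ending kept) is evaluated alone.
    return "".join(
        regex.sub(replacement, line) for line in text.splitlines(keepends=True)
    )


if __name__ == "__main__":
    content = "a=1\nb=2\n"
    # Anchored pattern: swaps key and value on every line in line-by-line mode.
    print(regex_replace(content, r"^(\w+)=(\d+)$", r"\2=\1", line_by_line=True))
    # In entire-text mode the same anchored pattern does not match each line.
    print(regex_replace(content, r"^(\w+)=(\d+)$", r"\2=\1"))
```

An anchored pattern such as `^...$` is a common reason to prefer line-by-line evaluation, since the anchors then apply to each line rather than to the content as a whole.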
Name Description
failure FlowFiles that could not be updated are routed to this relationship
success FlowFiles that have been successfully processed are routed to this relationship. This includes both FlowFiles that had text replaced and those that did not.
Property Description
Character Set The Character Set in which the file is encoded
Mapping File The name of the file (including the full path) containing the Mappings.
Mapping File Refresh Interval The polling interval to check for updates to the mapping file. The default is 60s.
Matching Group The number of the matching group of the provided regex to replace with the corresponding value from the mapping file (if it exists).
Maximum Buffer Size Specifies the maximum amount of data to buffer (per file) in order to apply the regular expressions. If a FlowFile is larger than this value, the FlowFile will be routed to 'failure'
Regular Expression The Regular Expression to search for in the FlowFile content
Name Description
failure FlowFiles that could not be updated are routed to this relationship
success FlowFiles that have been successfully updated are routed to this relationship, as well as FlowFiles whose content does not match the given Regular Expression
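The interaction between the Regular Expression, the Matching Group, and the mapping can be sketched in Python as follows. The mapping file format is not described here, so the sketch takes an already-loaded dictionary; the function name `replace_with_mapping` is hypothetical, and leaving unmapped values untouched follows the "(if it exists)" wording above.

```python
import re
from typing import Dict


def replace_with_mapping(text: str, pattern: str, mapping: Dict[str, str],
                         group: int = 0) -> str:
    """Replace the chosen capture group with its mapped value, if any."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(group)
        if key is None or key not in mapping:
            return match.group(0)  # no mapping: leave the match unchanged
        whole = match.group(0)
        # Convert the group's absolute span into offsets within the match.
        start = match.start(group) - match.start(0)
        end = match.end(group) - match.start(0)
        return whole[:start] + mapping[key] + whole[end:]

    return re.sub(pattern, substitute, text)


if __name__ == "__main__":
    colors = {"red": "crimson"}
    print(replace_with_mapping("color=red size=xl", r"color=(\w+)", colors, group=1))
    print(replace_with_mapping("color=blue size=xl", r"color=(\w+)", colors, group=1))
```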
Display Name API Name Default Value Allowable Values Description
Proxy Configuration Service proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor.
Authentication Strategy * rest-lookup-authentication-strategy NONE - None - Basic - OAuth2 Authentication strategy to use with REST service.
Basic Authentication Password rest-lookup-basic-auth-password The password to be used by the client to authenticate against the Remote URL.
Basic Authentication Username rest-lookup-basic-auth-username The username to be used by the client to authenticate against the Remote URL. Cannot include control characters (0-31), ':', or DEL (127).
Connection Timeout * rest-lookup-connection-timeout 5 secs Max wait time for connection to remote service.
Use Digest Authentication rest-lookup-digest-auth false - true - false Whether to communicate with the website using Digest Authentication. 'Basic Authentication Username' and 'Basic Authentication Password' are used for authentication.
OAuth2 Access Token Provider * rest-lookup-oauth2-access-token-provider Enables managed retrieval of OAuth2 Bearer Token applied to HTTP requests using the Authorization Header.
Read Timeout * rest-lookup-read-timeout 15 secs Max wait time for response from remote service.
Record Path rest-lookup-record-path An optional record path that can be used to define where in a record to get the real data to merge into the record set to be enriched. See documentation for examples of when this might be useful.
Record Reader * rest-lookup-record-reader The record reader to use for loading the payload and handling it as a record set.
Response Handling Strategy * rest-lookup-response-handling-strategy RETURNED - Returned - Evaluated Whether to return all responses or throw errors for unsuccessful HTTP status codes.
SSL Context Service rest-lookup-ssl-context-service The SSL Context Service used to provide client certificate information for TLS/SSL connections.
URL * rest-lookup-url The URL for the REST endpoint. Expression language is evaluated against the lookup key/value pairs, not flowfile attributes.
Property Description
Fail on Non-numerical Overwrite If the FlowFile already has the attribute defined in 'Retry Attribute' that is *not* a number, fail the FlowFile instead of resetting that value to '1'
maximum-retries The maximum number of times a FlowFile can be retried before being passed to the 'retries_exceeded' relationship
penalize-retries If set to 'true', this Processor will penalize input FlowFiles before passing them to the 'retry' relationship. This does not apply to the 'retries_exceeded' relationship.
retry-attribute The name of the attribute that contains the current retry count for the FlowFile. WARNING: If the name matches an attribute already on the FlowFile that does not contain a numerical value, the processor will either overwrite that attribute with '1' or fail based on configuration.
reuse-mode Defines how the Processor behaves if the retry FlowFile has a different retry UUID than the instance that received the FlowFile. This generally means that the attribute was not reset after being successfully retried by a previous instance of this processor.
Name Description
failure The processor is configured such that a non-numerical value on 'Retry Attribute' results in a failure instead of resetting that value to '1'. This will immediately terminate the limited feedback loop. FlowFiles may also be routed here when 'Maximum Retries' contains Expression Language that does not resolve to an integer.
retries_exceeded Input FlowFile has exceeded the configured maximum retry count. Do not route this relationship back to the input Processor, so that the limited feedback loop terminates.
retry Input FlowFile has not exceeded the configured maximum retry count. Route this relationship back to the input Processor to create a limited feedback loop.
Name Description
Retry Attribute User defined retry attribute is updated with the current retry count
Retry Attribute .uuid User defined retry attribute with .uuid that determines what processor retried the FlowFile last
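The retry counting described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the attribute name `retry.count` is hypothetical, the counter is incremented before being compared with the maximum, and the `.uuid` attribute and reuse-mode handling are omitted entirely.

```python
from typing import Dict, Tuple


def route_retry(attributes: Dict[str, str],
                retry_attribute: str = "retry.count",  # hypothetical name
                maximum_retries: int = 3,
                fail_on_non_numerical: bool = False) -> Tuple[str, Dict[str, str]]:
    """Return (relationship, updated attributes) for one pass."""
    updated = dict(attributes)
    raw = updated.get(retry_attribute)
    if raw is None:
        count = 1  # first pass through the loop
    else:
        try:
            count = int(raw.strip()) + 1
        except ValueError:
            if fail_on_non_numerical:
                return "failure", updated
            count = 1  # overwrite the non-numerical value with 1
    # Assumption: the incremented count is compared against the maximum.
    if count > maximum_retries:
        return "retries_exceeded", updated
    updated[retry_attribute] = str(count)
    return "retry", updated


if __name__ == "__main__":
    print(route_retry({}))
    print(route_retry({"retry.count": "3"}))
    print(route_retry({"retry.count": "abc"}, fail_on_non_numerical=True))
```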
Property Description
Routing Strategy Specifies how to determine which relationship to use when evaluating the Expression Language
Name Description
unmatched FlowFiles that do not match any user-defined expression will be routed here
Name Description
RouteOnAttribute.Route The relation to which the FlowFile was routed
Property Description
Character Set The Character Set in which the file is encoded
Content Buffer Size Specifies the maximum amount of data to buffer in order to apply the regular expressions. If the size of the FlowFile exceeds this value, any content beyond this amount will be ignored
Match Requirement Specifies whether the entire content of the file must match the regular expression exactly, or if any part of the file (up to Content Buffer Size) can contain the regular expression in order to be considered a match
Name Description
unmatched FlowFiles that do not match any of the user-supplied regular expressions will be routed to this relationship
Property Description
Character Set The Character Set in which the incoming text is encoded
Grouping Regular Expression Specifies a Regular Expression to evaluate against each line to determine which Group the line should be placed in. The Regular Expression must have at least one Capturing Group that defines the line's Group. If multiple Capturing Groups exist in the Regular Expression, the values from all Capturing Groups will be concatenated together. Two lines will not be placed into the same FlowFile unless they both have the same value for the Group (or neither line matches the Regular Expression). For example, to group together all lines in a CSV File by the first column, we can set this value to "(.*?),.*". Two lines that have the same Group but different Relationships will never be placed into the same FlowFile.
Ignore Case If true, capitalization will not be taken into account when comparing values. E.g., matching against 'HELLO' or 'hello' will have the same result. This property is ignored if the 'Matching Strategy' is set to 'Satisfies Expression'.
Ignore Leading/Trailing Whitespace Indicates whether or not the whitespace at the beginning and end of the lines should be ignored when evaluating the line.
Matching Strategy Specifies how to evaluate each line of incoming text against the user-defined properties.
Routing Strategy Specifies how to determine which Relationship(s) to use when evaluating the lines of incoming text against the 'Matching Strategy' and user-defined properties.
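The grouping behavior described for 'Grouping Regular Expression' can be approximated with the following Python sketch, which groups CSV lines by their first column using the `(.*?),.*` expression from the description. The function name `group_lines` is hypothetical, and requiring the expression to match the whole line is an assumption about how the processor evaluates it.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional


def group_lines(lines: List[str], grouping_regex: str) -> Dict[Optional[str], List[str]]:
    """Group lines by the concatenated values of all capturing groups.

    Lines that do not match the expression share the key None.
    """
    regex = re.compile(grouping_regex)
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for line in lines:
        match = regex.fullmatch(line)  # assumption: whole-line match
        key = "".join(g or "" for g in match.groups()) if match else None
        groups[key].append(line)
    return dict(groups)


if __name__ == "__main__":
    rows = ["a,1", "b,2", "a,3", "no comma here"]
    for key, members in group_lines(rows, r"(.*?),.*").items():
        print(key, members)
```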
Name Description
original The original input file will be routed to this destination when the lines have been successfully routed to 1 or more relationships
unmatched Data that does not satisfy the required user-defined rules will be routed to this Relationship
Name Description
RouteText.Route The name of the relationship to which the FlowFile was routed.
RouteText.Group The value captured by all capturing groups in the 'Grouping Regular Expression' property. If this property is not set or contains no capturing groups, this attribute will not be added.
Property Description
Databricks Client Databricks Client Service.
Job ID Databricks Job ID
Job Name Databricks Job Name
Wait for Job Completion Wait for the Databricks job to complete before transferring the FlowFile to success
Name Description
failure Databricks failure relationship
success Databricks success relationship
Name Description
job.run.id The run id assigned to the invoked job
job.result.state The result state for the invoked job
error.code The error code for the SQL statement if an error occurred.
error.message The error message for the SQL statement if an error occurred.
Property Description
Batch Size The number of elements returned from the server in one batch.
Mongo Collection Name The name of the collection to use
Mongo Database Name The name of the database to use
allow-disk-use Set this to true to enable writing data to temporary files to prevent exceeding the maximum memory use limit during aggregation pipeline stages when handling large datasets.
json-type By default, MongoDB's Java driver returns "extended JSON". Some of the features of this variant of JSON may cause problems for other JSON parsers that expect only standard JSON types and conventions. This configuration setting controls whether to use extended JSON or provide a clean view that conforms to standard JSON.
mongo-agg-query The aggregation query to be executed.
mongo-charset Specifies the character set of the document data.
mongo-client-service If configured, this property will use the assigned client service for connection pooling.
mongo-date-format The date format string to use for formatting Date fields that are returned from Mongo. It is only applied when the JSON output format is set to Standard JSON.
mongo-query-attribute If set, the query will be written to a specified attribute on the output flowfiles.
results-per-flowfile How many results to put into a flowfile at once. The whole body will be treated as a JSON array of results.
Name Description
failure The input flowfile gets sent to this relationship when the query fails.
original The input flowfile gets sent to this relationship when the query succeeds.
results The result set of the aggregation will be sent to this relationship.
Display Name API Name Default Value Allowable Values Description
AWS Credentials Provider service * AWS Credentials Provider service The Controller Service that is used to obtain AWS credentials provider
Bucket * Bucket $\{s3.bucket\} The S3 Bucket to interact with
Object Key * Object Key $\{filename\} The S3 Object Key to use. This is analogous to a filename for traditional file systems.
Region * Region us-west-2 - AWS GovCloud (US) - AWS GovCloud (US-East) - US East (N. Virginia) - US East (Ohio) - US West (N. California) - US West (Oregon) - EU (Ireland) - EU (London) - EU (Paris) - EU (Frankfurt) - EU (Zurich) - EU (Stockholm) - EU (Milan) - EU (Spain) - Asia Pacific (Hong Kong) - Asia Pacific (Taipei) - Asia Pacific (Mumbai) - Asia Pacific (Hyderabad) - Asia Pacific (Singapore) - Asia Pacific (Sydney) - Asia Pacific (Jakarta) - Asia Pacific (Melbourne) - Asia Pacific (Malaysia) - Asia Pacific (Thailand) - Asia Pacific (Tokyo) - Asia Pacific (Seoul) - Asia Pacific (Osaka) - South America (Sao Paulo) - China (Beijing) - China (Ningxia) - Canada (Central) - Canada West (Calgary) - Middle East (UAE) - Middle East (Bahrain) - Africa (Cape Town) - US ISO East - US ISOB East (Ohio) - US ISO West - US ISOF East1 (California) - US ISOF South1 (Alpine) - Israel (Tel Aviv) - Mexico (Central) - EU ISOE West - Use 's3.region' Attribute The AWS Region to connect to.
Display Name API Name Default Value Allowable Values Description
OAuth2 Access Token Provider * OAuth2 Access Token Provider JWT Token Provider to use in order to retrieve an access token from Salesforce that will be exchanged for a Data Cloud API token.
Refresh Window * Refresh Window 0 s The service will attempt to refresh tokens expiring within the refresh window, subtracting the configured duration from the token expiration.
Salesforce Instance * Salesforce Instance The hostname of the Salesforce instance including the domain such as MyDomainName.my.salesforce.com
Web Client Service * Web Client Service The Web Client Service to use for communicating with Salesforce
Property Description
record-reader Specifies the Controller Service to use for parsing incoming data and determining the data's schema
record-writer Specifies the Controller Service to use for writing results to a FlowFile
sample-record-interval Specifies the number of records to skip before writing a record to the outgoing FlowFile. This property is only used if Sampling Strategy is set to Interval Sampling. A value of zero (0) will cause no records to be included in the outgoing FlowFile, a value of one (1) will cause all records to be included, and a value of two (2) will cause half the records to be included, and so on.
sample-record-probability Specifies the probability (as a percent from 0-100) of a record being included in the outgoing FlowFile. This property is only used if Sampling Strategy is set to Probabilistic Sampling. A value of zero (0) will cause no records to be included in the outgoing FlowFile, and a value of 100 will cause all records to be included in the outgoing FlowFile.
sample-record-random-seed Specifies a particular number to use as the seed for the random number generator (used by probabilistic strategies). Setting this property will ensure the same records are selected even when using probabilistic strategies.
sample-record-range Specifies the range of records to include in the sample, from 1 to the total number of records. An example is '3,6-8,20-' which includes the third record, the sixth, seventh and eighth records, and all records from the twentieth record on. Commas separate intervals that don't overlap, and an interval can be between two numbers (e.g. 6-8), up to a given number (e.g. -5), or from a number to the number of the last record (e.g. 20-). If this property is unset, all records will be included.
sample-record-reservoir Specifies the number of records to write to the outgoing FlowFile. This property is only used if Sampling Strategy is set to reservoir-based strategies such as Reservoir Sampling.
sample-record-sampling-strategy Specifies which method to use for sampling records from the incoming FlowFile
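The range syntax accepted by sample-record-range can be illustrated with a small Python parser. This is a sketch of the syntax as described above, not the processor's own parser; the function name `parse_record_ranges` is hypothetical, and clamping out-of-range values to the record count is an assumption.

```python
from typing import List


def parse_record_ranges(spec: str, total: int) -> List[int]:
    """Expand a spec such as '3,6-8,20-' into 1-based record numbers."""
    if not spec.strip():
        # Unset property: all records are included.
        return list(range(1, total + 1))

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            start = int(low) if low else 1       # '-5' means records 1..5
            end = int(high) if high else total   # '20-' means 20..last
        else:
            start = end = int(part)
        # Assumption: values outside 1..total are clamped.
        selected.update(range(max(start, 1), min(end, total) + 1))
    return sorted(selected)


if __name__ == "__main__":
    print(parse_record_ranges("3,6-8,20-", total=22))
    print(parse_record_ranges("-5", total=10))
```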
Name Description
failure If a FlowFile fails processing for any reason (for example, any record is not valid), the original FlowFile will be routed to this relationship
original The original FlowFile is routed to this relationship if sampling is successful
success The FlowFile is routed to this relationship if the sampling completed successfully
Name Description
mime.type The MIME type indicated by the record writer
record.count The number of records in the resulting flow file
| Privilege | Object | Notes |
| --- | --- | --- |
| `CREATE ZEROCOPY CONNECTOR` | Schema | Required to create a Zerocopy Connector. By default, the schema owner has this privilege. |
| `OPERATE` | Zerocopy Connector | Required to connect or disconnect (`ALTER ... CONNECT` / `ALTER ... DISCONNECT`) and to publish a data product (`SYSTEM$SAP_PUBLISH_DATA_PRODUCT`). |
| `USAGE` | Zerocopy Connector | Required to create a catalog-linked database from the connector (also requires `CREATE DATABASE` on the account) and to add or remove a share from the connector (also requires `OWNERSHIP` on the share). |
| `MODIFY` | Zerocopy Connector | Required to set or unset properties (comment, share_back, etc.). |
| `MONITOR` | Zerocopy Connector | Any privilege on the connector (e.g. `MONITOR`) is sufficient to describe the connector, show connectors, or list shares. |
| `OWNERSHIP` | Zerocopy Connector | Required to rename or drop the connector. |
| `CREATE DATABASE` | Account | Required to create a catalog-linked database from a Zerocopy Connector (also requires `USAGE` on the connector). |
| State | Description |
| --- | --- |
| `NEW` | Initial state after the connector is created. No connection has been attempted yet. |
| `CONNECTING` | A connection attempt is in progress. The connector enters this state immediately after `ALTER ... CONNECT` is issued. |
| `CONNECTED` | The connection is established. Catalog-linked databases can only be created when the connector is in this state. Sharing data between Snowflake and SAP® BDC is only allowed when the connector is in this state. |
| `CONNECT_ERROR` | The connection attempt failed. The error message is persisted on the connector. You can retry the connection from this state. |
| `DISCONNECTING` | A disconnection is in progress. The connector enters this state immediately after `ALTER ... DISCONNECT` is issued. |
| `DISCONNECTED` | The connection has been dropped. You can reconnect from this state. |
| `DISCONNECT_ERROR` | The disconnection attempt failed. The error message is persisted on the connector. |
| `DELETED` | The connector has been dropped. This state is permanent; Zerocopy Connectors do not support `UNDROP`. |
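Client-side tooling that polls a connector could encode these states as an enumeration and guard operations accordingly. The Python sketch below is illustrative only: the class and function names are hypothetical, and treating `NEW` as a state from which a connection can be attempted is an assumption (the descriptions explicitly mention retrying only from `CONNECT_ERROR` and reconnecting from `DISCONNECTED`).

```python
from enum import Enum


class ConnectorState(Enum):
    NEW = "NEW"
    CONNECTING = "CONNECTING"
    CONNECTED = "CONNECTED"
    CONNECT_ERROR = "CONNECT_ERROR"
    DISCONNECTING = "DISCONNECTING"
    DISCONNECTED = "DISCONNECTED"
    DISCONNECT_ERROR = "DISCONNECT_ERROR"
    DELETED = "DELETED"


# CONNECT_ERROR and DISCONNECTED are stated above; NEW is an assumption.
CONNECTABLE_STATES = {
    ConnectorState.NEW,
    ConnectorState.CONNECT_ERROR,
    ConnectorState.DISCONNECTED,
}


def can_attempt_connect(state: ConnectorState) -> bool:
    """True if a connection attempt is expected to be accepted."""
    return state in CONNECTABLE_STATES


def can_create_catalog_linked_database(state: ConnectorState) -> bool:
    """Catalog-linked databases require an established connection."""
    return state is ConnectorState.CONNECTED


if __name__ == "__main__":
    state = ConnectorState("CONNECT_ERROR")
    print(can_attempt_connect(state))
    print(can_create_catalog_linked_database(state))
```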
Property Description
Attribute Pattern Regular Expression that specifies the names of attributes whose values will be matched against the terms in the dictionary
Dictionary File A new-line-delimited text file that includes the terms that should trigger a match. Empty lines are ignored. The contents of the text file are loaded into memory when the processor is scheduled and reloaded when the contents are modified.
Dictionary Filter Pattern A Regular Expression that will be applied to each line in the dictionary file. If the regular expression does not match the line, the line will not be included in the list of terms to search for. If a Matching Group is specified, only the portion of the term that matches that Matching Group will be used instead of the entire term. If not specified, all terms in the dictionary will be used and each term will consist of the text of the entire line in the file
Match Criteria If set to All Must Match, then FlowFiles will be routed to 'matched' only if all specified attributes' values are found in the dictionary. If set to At Least 1 Must Match, FlowFiles will be routed to 'matched' if any attribute specified is found in the dictionary
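The two Match Criteria settings amount to an "all" versus "any" test over the selected attribute values, as this Python sketch suggests. The function name `route_on_dictionary` is hypothetical, and two details are assumptions: the Attribute Pattern must match the whole attribute name, and a FlowFile with no selected attributes is routed to 'unmatched'.

```python
import re
from typing import Dict, Set


def route_on_dictionary(attributes: Dict[str, str], attribute_pattern: str,
                        dictionary: Set[str], all_must_match: bool) -> str:
    """Return 'matched' or 'unmatched' for one set of attributes."""
    name_regex = re.compile(attribute_pattern)
    values = [
        value for name, value in attributes.items()
        if name_regex.fullmatch(name)  # assumption: whole-name match
    ]
    if not values:
        return "unmatched"  # assumption: nothing to compare
    hits = [value in dictionary for value in values]
    matched = all(hits) if all_must_match else any(hits)
    return "matched" if matched else "unmatched"


if __name__ == "__main__":
    terms = {"alpha", "beta"}
    attrs = {"tag.1": "alpha", "tag.2": "gamma", "filename": "data.csv"}
    print(route_on_dictionary(attrs, r"tag\..*", terms, all_must_match=True))
    print(route_on_dictionary(attrs, r"tag\..*", terms, all_must_match=False))
```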
Name Description
matched FlowFiles whose attributes are found in the dictionary will be routed to this relationship
unmatched FlowFiles whose attributes are not found in the dictionary will be routed to this relationship
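The interaction between the properties above can be illustrated with a short sketch. It is not the processor's implementation: the function names are hypothetical, full-string regular expression matching is assumed for both patterns, and a FlowFile with no selected attributes is assumed to be unmatched.

```python
import re


def load_terms(lines, filter_pattern=None):
    """Build the set of terms from dictionary lines, applying the optional filter."""
    terms = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # empty lines are ignored
        if filter_pattern is None:
            terms.add(line)  # no filter: the whole line is the term
            continue
        match = re.fullmatch(filter_pattern, line)  # assumption: full-line match
        if match is None:
            continue  # lines that do not match the filter are excluded
        if match.re.groups and match.group(1) is not None:
            terms.add(match.group(1))  # only the matching group is used as the term
        else:
            terms.add(line)
    return terms


def route(attributes, terms, attribute_pattern=".*", all_must_match=False):
    """Return 'matched' or 'unmatched' for a dict of FlowFile attributes."""
    selected = [v for k, v in attributes.items() if re.fullmatch(attribute_pattern, k)]
    hits = [value in terms for value in selected]
    if all_must_match:
        matched = bool(hits) and all(hits)  # assumption: no selected attributes -> unmatched
    else:
        matched = any(hits)
    return "matched" if matched else "unmatched"
```

For example, with a dictionary containing only `apple`, a FlowFile with attributes `a=apple` and `b=pear` is routed to `matched` under At Least 1 Must Match, and to `unmatched` under All Must Match unless the attribute pattern selects only `a`.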
Property Description
Dictionary Encoding Indicates how the dictionary is encoded. If 'text', dictionary terms are new-line delimited and UTF-8 encoded; if 'binary', dictionary terms are denoted by a 4-byte integer indicating the term length followed by the term itself
Dictionary File The filename of the terms dictionary
Name Description
matched FlowFiles that match at least one term in the dictionary are routed to this relationship
unmatched FlowFiles that do not match any term in the dictionary are routed to this relationship
Name Description
matching.term The term that caused the Processor to route the FlowFile to the 'matched' relationship; if FlowFile is routed to the 'unmatched' relationship, this attribute is not added
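The 'binary' dictionary encoding described above (a 4-byte length followed by the term bytes) can be sketched as follows. The byte order is not stated in this section; big-endian is assumed here, and the function names are hypothetical.

```python
import struct


def encode_terms(terms):
    """Serialize byte-string terms as <4-byte length><term bytes> records."""
    out = bytearray()
    for term in terms:
        out += struct.pack(">I", len(term))  # assumption: big-endian unsigned length
        out += term
    return bytes(out)


def decode_terms(data):
    """Parse a binary dictionary back into a list of byte-string terms."""
    terms = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        term = data[offset:offset + length]
        if len(term) != length:
            raise ValueError("dictionary is truncated")
        terms.append(term)
        offset += length
    return terms


def first_matching_term(content, terms):
    """Return the first term found in the content, or None if nothing matches."""
    for term in terms:
        if term in content:
            return term
    return None
```

In this sketch, a non-`None` result from `first_matching_term` plays the role of the `matching.term` attribute, and `None` corresponds to routing to 'unmatched' without adding the attribute.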
Property Description
Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Record Reader The Record Reader to use for parsing the incoming FlowFile into Records
Record Writer The Record Writer to use for serializing Records after they have been transformed
Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine The Language to use for the script
Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Name Description
failure In case of any issue during processing the incoming FlowFile, the incoming FlowFile will be routed to this relationship.
original After successful processing, the incoming FlowFile will be transferred to this relationship. This happens regardless of the number of filtered or remaining records.
success Matching records of the original FlowFile will be routed to this relationship. If there are no matching records, no FlowFile will be routed here.
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records within the flow file.
record.error.message This attribute provides on failure the error message encountered by the Reader or Writer.
Display Name API Name Default Value Allowable Values Description
Module Directory Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Script Body Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine * Script Engine Groovy - Groovy Language Engine for executing scripts
Script File Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Property Description
Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Record Reader The Record Reader to use for parsing the incoming FlowFile into Records
Record Writer The Record Writer to use for serializing Records after they have been transformed
Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine The Language to use for the script
Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Name Description
failure If a FlowFile cannot be partitioned from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship
original Once all records in an incoming FlowFile have been partitioned, the original FlowFile is routed to this relationship.
success FlowFiles that are successfully partitioned will be routed to this relationship
Name Description
partition The partition of the outgoing flow file. If the script indicates that the partition has a null value, the attribute will be set to the literal string "<null partition>" (without quotes). Otherwise, the attribute is set to the String representation of whatever value is returned by the script.
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records within the flow file.
record.error.message This attribute provides on failure the error message encountered by the Reader or Writer.
fragment.index A one-up number that indicates the ordering of the partitioned FlowFiles that were created from a single parent FlowFile
fragment.count The number of partitioned FlowFiles generated from the parent FlowFile
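The attributes above can be illustrated with a sketch of the partitioning step. This is not the processor's code: `partition_of` stands in for the user script, the function name is hypothetical, attribute values are kept as integers for readability, and `fragment.index` is assumed to start at 0.

```python
def partition_records(records, partition_of):
    """Group records by the value returned by partition_of, preserving first-seen order."""
    groups = {}
    for record in records:
        key = partition_of(record)
        # A null partition is written as the literal string "<null partition>".
        label = "<null partition>" if key is None else str(key)
        groups.setdefault(label, []).append(record)

    fragment_count = len(groups)
    return [
        {
            "partition": label,
            "record.count": len(group),
            "fragment.index": index,  # assumption: numbering starts at 0
            "fragment.count": fragment_count,
            "records": group,
        }
        for index, (label, group) in enumerate(groups.items())
    ]
```

Each dictionary in the result corresponds to one outgoing FlowFile on the `success` relationship; the incoming FlowFile itself would go to `original`.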
Display Name API Name Default Value Allowable Values Description
Module Directory Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Script Body Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine * Script Engine Groovy - Groovy Language Engine for executing scripts
Script File Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Display Name API Name Default Value Allowable Values Description
Module Directory Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Script Body Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine * Script Engine Groovy - Groovy Language Engine for executing scripts
Script File Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Display Name API Name Default Value Allowable Values Description
Module Directory Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Script Body Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine * Script Engine Groovy - Groovy Language Engine for executing scripts
Script File Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Property Description
Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Record Reader The Record Reader to use for parsing the incoming FlowFile into Records
Record Writer The Record Writer to use for serializing Records after they have been transformed
Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine The Language to use for the script
Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Name Description
failure Any FlowFile that cannot be transformed will be routed to this Relationship
success Each FlowFile that was successfully transformed will be routed to this Relationship
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records in the FlowFile
record.error.message This attribute provides on failure the error message encountered by the Reader or Writer.
Property Description
Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Record Reader The Record Reader to use for parsing the incoming FlowFile into Records
Record Writer The Record Writer to use for serializing Records after they have been transformed
Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine The Language to use for the script
Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Name Description
failure In case of any issue during processing the incoming flow file, the incoming FlowFile will be routed to this relationship.
invalid FlowFile containing the invalid records from the incoming FlowFile will be routed to this relationship. If there are no invalid records, no FlowFile will be routed to this Relationship.
original After successful processing, the incoming FlowFile will be transferred to this relationship. This happens regardless of whether FlowFiles are routed to the "valid" and "invalid" relationships.
valid FlowFile containing the valid records from the incoming FlowFile will be routed to this relationship. If there are no valid records, no FlowFile will be routed to this Relationship.
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records within the flow file.
record.error.message This attribute provides on failure the error message encountered by the Reader or Writer.
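The routing rules for the `valid`, `invalid`, and `original` relationships can be sketched as follows. The `is_valid` callable stands in for the user script, and the function name is hypothetical.

```python
def route_validated(records, is_valid):
    """Split records into valid and invalid groups; empty groups produce no output."""
    valid, invalid = [], []
    for record in records:
        (valid if is_valid(record) else invalid).append(record)

    routed = {"original": list(records)}  # the incoming FlowFile always goes to 'original'
    if valid:
        routed["valid"] = valid
    if invalid:
        routed["invalid"] = invalid
    return routed
```

The lengths of the `valid` and `invalid` lists correspond to the `record.count` attribute on each outgoing FlowFile.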
Property Description
Aggregation Results Format Format of Aggregation output.
Aggregation Results Split Output a flowfile containing all aggregations or one flowfile for each individual aggregation.
Aggregations One or more query aggregations (or "aggs"), in JSON syntax. Ex: `{"items": {"terms": {"field": "product", "size": 10}}}`
Client Service An Elasticsearch client service to use for running queries.
Fields Fields of indexed documents to be retrieved, in JSON syntax. Ex: `["user.id", "http.response.*", {"field": "@timestamp", "format": "epoch_millis"}]`
Index The name of the index to use.
Max JSON Field String Length The maximum allowed length of a string value when parsing a JSON document or attribute.
Output No Hits Output a "hits" flowfile even if no hits found for query. If true, an empty "hits" flowfile will be output even if "aggregations" are output.
Pagination Keep Alive Pagination "keep_alive" period. Period Elasticsearch will keep the scroll/pit cursor alive in between requests (this is not the time expected for all pages to be returned, but the maximum allowed time for requests between page retrievals).
Pagination Type Pagination method to use. Not all types are available for all Elasticsearch versions; check the Elasticsearch docs to confirm which are applicable and recommended for your service.
Query A query in JSON syntax, not Lucene syntax. Ex: `{"query":{"match":{"somefield":"somevalue"}}}`. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Query Attribute If set, the executed query will be set on each result flowfile in the specified attribute.
Query Clause A "query" clause in JSON syntax, not Lucene syntax. Ex: `{"match":{"somefield":"somevalue"}}`. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Query Definition Style How the JSON Query will be defined for use by the processor.
Restart On Finish Whether the processor should start another search with the same query once a paginated search has completed.
Script Fields Fields to create using script evaluation at query runtime, in JSON syntax. Ex: `{"test1": {"script": {"lang": "painless", "source": "doc['price'].value * 2"}}, "test2": {"script": {"lang": "painless", "source": "doc['price'].value * params.factor", "params": {"factor": 2.0}}}}`
Search Results Format Format of Hits output.
Search Results Split Output a flowfile containing all hits or one flowfile for each individual hit or one flowfile containing all hits from all paged responses.
Size The maximum number of documents to retrieve in the query. If the query is paginated, this "size" applies to each page of the query, not the "size" of the entire result set.
Sort Sort results by one or more fields, in JSON syntax. Ex: `[{"price" : {"order" : "asc", "mode" : "avg"}}, {"post_date" : {"format": "strict_date_optional_time_nanos"}}]`
Type The type of this document (used by Elasticsearch for indexing and searching).
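To show how the JSON-valued properties above fit together, the following sketch assembles them into a single Elasticsearch search request body. It is an illustration only, not the processor's implementation: the function name is hypothetical, and an explicit `match_all` query is used when no query clause is given.

```python
import json


def build_search_body(query_clause=None, size=None, sort=None,
                      aggregations=None, fields=None, script_fields=None):
    """Assemble a search request body from JSON-string property values."""
    body = {}
    # An empty Query Clause falls back to a match_all query.
    body["query"] = json.loads(query_clause) if query_clause else {"match_all": {}}
    if size is not None:
        body["size"] = int(size)  # applies to each page when the query is paginated

    optional = {
        "sort": sort,
        "aggs": aggregations,
        "fields": fields,
        "script_fields": script_fields,
    }
    for key, raw in optional.items():
        if raw:
            body[key] = json.loads(raw)  # each property holds a JSON document
    return body


if __name__ == "__main__":
    body = build_search_body(
        query_clause='{"match": {"somefield": "somevalue"}}',
        size=10,
        aggregations='{"items": {"terms": {"field": "product", "size": 10}}}',
    )
    print(json.dumps(body, indent=2))
```

Parsing each property with `json.loads` before sending it surfaces malformed JSON early, which is easier to diagnose than an error returned by the cluster.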
Scopes Description
LOCAL The pagination state (scrollId, searchAfter, pitId, hitCount, pageCount, pageExpirationTimestamp) is retained in between invocations of this processor until the Scroll/PiT has expired (when the current time is later than the last query execution plus the Pagination Keep Alive interval).
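The expiry rule for the retained pagination state can be written as a one-line comparison. This is a minimal sketch with hypothetical names; it assumes the keep-alive interval has already been parsed into a `timedelta`.

```python
from datetime import datetime, timedelta


def pagination_state_expired(now, last_query_at, keep_alive):
    """State expires once the current time is later than last query + keep-alive."""
    return now > last_query_at + keep_alive


if __name__ == "__main__":
    last_query = datetime(2025, 1, 1, 12, 0, 0)
    keep_alive = timedelta(minutes=10)
    print(pagination_state_expired(datetime(2025, 1, 1, 12, 5, 0), last_query, keep_alive))
    print(pagination_state_expired(datetime(2025, 1, 1, 12, 11, 0), last_query, keep_alive))
```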
Name Description
aggregations Aggregations are routed to this relationship.
failure All flowfiles that fail for reasons unrelated to server availability go to this relationship.
hits Search hits are routed to this relationship.
retry All flowfiles that fail due to server/cluster availability go to this relationship.
Name Description
mime.type application/json
aggregation.name The name of the aggregation whose results are in the output flowfile
aggregation.number The number of the aggregation whose results are in the output flowfile
page.number The number of the page (request), starting from 1, in which the results were returned that are in the output flowfile
hit.count The number of hits that are in the output flowfile
elasticsearch.query.error The error message provided by Elasticsearch if there is an error querying the index.
Property Description
Segment Size The maximum data size in bytes for each segment
Name Description
original The original FlowFile will be sent to this relationship
segments All segments will be sent to this relationship. If the file was small enough that it was not segmented, a copy of the original is sent to this relationship as well as to the original relationship
Name Description
fragment.identifier All segments produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the segments that were created from a single parent FlowFile
fragment.count The number of segments generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
segment.original.filename The filename will be updated to include the parent's filename, the segment index, and the segment count
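The segmentation behavior and the fragment attributes can be sketched as follows. This is an illustration under assumptions, not the processor's code: the function name is hypothetical, `fragment.index` is assumed to start at 1, and the renamed segment filename is omitted because its exact format is not given here.

```python
import uuid


def segment_content(content, segment_size, filename):
    """Split bytes into segments of at most segment_size bytes, with fragment attributes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")

    chunks = [content[i:i + segment_size] for i in range(0, len(content), segment_size)]
    if not chunks:
        chunks = [content]  # assumption: empty content still yields one segment

    fragment_id = str(uuid.uuid4())  # shared by all segments of the same parent
    return [
        {
            "content": chunk,
            "attributes": {
                "fragment.identifier": fragment_id,
                "fragment.index": index,  # assumption: numbering starts at 1
                "fragment.count": len(chunks),
                "segment.original.filename": filename,
            },
        }
        for index, chunk in enumerate(chunks, start=1)
    ]
```

Concatenating the `content` values in `fragment.index` order reproduces the original bytes, which is what a downstream merge step would rely on.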
Order Task Description Persona
1 [Set up core Snowflake](/user-guide/data-integration/openflow/setup-openflow-spcs-sf) Before creating a deployment, you must configure core Snowflake, which includes an Openflow admin role, required privileges, and network configuration. Snowflake administrator
2 Optionally [Set up PrivateLink UI access](/user-guide/data-integration/openflow/setup-openflow-spcs-configure-pr-ui) Configure PrivateLink to access the Snowflake Openflow Runtime UI using private connectivity. Snowflake administrator
3 [Create deployment](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) After configuring core Snowflake, create an Openflow deployment. Optionally, configure an Openflow-specific event table to store Openflow logs and metrics. Deployment engineer, Snowflake administrator for event table configuration
4 [Create Snowflake role](/user-guide/data-integration/openflow/setup-openflow-spcs-create-rr) After creating an %ofsfspcs%, you must create a Snowflake role and associated external access integrations. Data engineer
5 [Create runtime](/user-guide/data-integration/openflow/setup-openflow-spcs-create-runtime) Create a runtime associated with the previously created Snowflake role. Data engineer
6 [Configure allowed domains for Openflow connectors](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) Configure access to external domains for Openflow connectors. Data engineer
7 [Connect your data sources using Openflow connectors](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) Configure one or more connectors in the %ofsfspcs%. Data engineer
Field Description
**Runtime Name** Enter a name for your runtime.
**Deployment** drop down Choose the deployment previously created in [](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment)
**Node Type** Choose a node type from the **Node type** drop-down list. This specifies the size of your nodes.
**Min/Max node** In the **Min/Max node** range selector, select a range. The minimum value specifies the number of nodes that the runtime starts with when idle and the maximum value specifies the number of nodes that the runtime can scale up to, in the event of high data volume or CPU load.
**Snowflake Role** Choose the Snowflake role previously created in [](/user-guide/data-integration/openflow/setup-openflow-spcs-create-rr).
**Usage Roles** Optionally, select the roles created to grant the runtime usage of the required databases, schemas, and tables.
**External Access Integrations** Optionally, select the previously created external access integrations to grant access to external resources.
| Service | Actions | Resources (ARNs) | Purpose |
| --- | --- | --- | --- |
| Amazon Kinesis Data Streams | `kinesis:DescribeStream`, `kinesis:DescribeStreamConsumer`, `kinesis:GetRecords`, `kinesis:GetShardIterator`, `kinesis:ListShards`, `kinesis:RegisterStreamConsumer` | `arn:aws:kinesis:${REGION}:${ACCOUNT_ID}:stream/${STREAM_NAME}` | Discovers shards, reads records through shared-throughput polling, resolves the stream ARN, registers an Enhanced Fan-Out consumer, and polls consumer status during registration. |
| Amazon Kinesis Data Streams | `kinesis:DeregisterStreamConsumer`, `kinesis:DescribeStreamConsumer`, `kinesis:SubscribeToShard` | `arn:aws:kinesis:${REGION}:${ACCOUNT_ID}:stream/${STREAM_NAME}/consumer/*` | Describes, subscribes to, and deregisters Enhanced Fan-Out consumers by consumer ARN. |
| Amazon DynamoDB | `dynamodb:CreateTable`, `dynamodb:DeleteTable`, `dynamodb:DescribeTable`, `dynamodb:GetItem`, `dynamodb:PutItem`, `dynamodb:Query`, `dynamodb:Scan`, `dynamodb:UpdateItem` | `arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/${APPLICATION_NAME}`, `arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/${APPLICATION_NAME}_migration` | Creates and manages the checkpoint/lease table (shard leases, node heartbeats, checkpoints) and a temporary migration table used during one-time migration from legacy checkpoint tables. |
| Placeholder | Description |
| --- | --- |
| `${REGION}` | Your AWS region (for example, `us-east-1`) |
| `${ACCOUNT_ID}` | Your AWS account ID (for example, `123456789012`) |
| `${STREAM_NAME}` | The value of the **AWS Kinesis Stream Name** connector parameter |
| `${APPLICATION_NAME}` | The value of the **AWS Kinesis Application Name** connector parameter. Used as the DynamoDB checkpoint table name and as the Enhanced Fan-Out registered consumer name. |
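The actions, resources, and placeholders listed above can be combined into an IAM policy document. The following sketch generates one as JSON; the function name and the sample values in the usage section are hypothetical, and the statement layout is one reasonable arrangement rather than a policy published by Snowflake.

```python
import json

KINESIS_STREAM_ACTIONS = [
    "kinesis:DescribeStream",
    "kinesis:DescribeStreamConsumer",
    "kinesis:GetRecords",
    "kinesis:GetShardIterator",
    "kinesis:ListShards",
    "kinesis:RegisterStreamConsumer",
]
KINESIS_CONSUMER_ACTIONS = [
    "kinesis:DeregisterStreamConsumer",
    "kinesis:DescribeStreamConsumer",
    "kinesis:SubscribeToShard",
]
DYNAMODB_ACTIONS = [
    "dynamodb:CreateTable",
    "dynamodb:DeleteTable",
    "dynamodb:DescribeTable",
    "dynamodb:GetItem",
    "dynamodb:PutItem",
    "dynamodb:Query",
    "dynamodb:Scan",
    "dynamodb:UpdateItem",
]


def build_policy(region, account_id, stream_name, application_name):
    """Fill the ARN placeholders and return an IAM policy as a dict."""
    stream_arn = f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{application_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": KINESIS_STREAM_ACTIONS, "Resource": stream_arn},
            {"Effect": "Allow", "Action": KINESIS_CONSUMER_ACTIONS,
             "Resource": f"{stream_arn}/consumer/*"},
            {"Effect": "Allow", "Action": DYNAMODB_ACTIONS,
             "Resource": [table_arn, f"{table_arn}_migration"]},
        ],
    }


if __name__ == "__main__":
    # Sample values only; substitute your own region, account, stream, and application name.
    print(json.dumps(build_policy("us-east-1", "123456789012", "my-stream", "my-app"), indent=2))
```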
Object Privilege Notes
Database USAGE
Schema USAGE
Table OWNERSHIP Required for the connector to ingest data into a table.
Service Purpose PrivateLink support
Amazon Kinesis Data Streams Reads stream records. Supported by this connector.
Amazon DynamoDB Stores checkpoint metadata for processed records. Not supported. Use the public endpoint.
Parameter Description Required
AWS Access Key ID The AWS Access Key ID to connect to your Kinesis Stream and DynamoDB. Yes
AWS Kinesis Region The AWS Region to connect to. Use regular AWS region format, for example: `us-west-2`, `ap-southeast-1`, `eu-west-1`. See the [AWS Regions](https://docs.aws.amazon.com/general/latest/gr/rande.html#kinesis_region) page. Yes
AWS Secret Access Key The AWS Secret Access Key to connect to your Kinesis Stream and DynamoDB. Yes
AWS Kinesis Application Name The name that is used as the DynamoDB table name for tracking the application's progress on Kinesis Stream consumption. Yes
AWS Kinesis Consumer Type The strategy used to read records from a Kinesis Stream. Must be one of the following values: **SHARED_THROUGHPUT**, **ENHANCED_FAN_OUT**. For more information, see [Differences between shared throughput consumer and enhanced fan-out consumer](https://docs.aws.amazon.com/streams/latest/dev/enhanced-consumers.html). Yes
AWS Kinesis Initial Stream Position The initial stream position from which the data starts replication. This takes effect only during the initial start for a given AWS Kinesis Application Name. Possible values are: **LATEST**: Latest stored record, **TRIM_HORIZON**: Earliest stored record. Yes
AWS Kinesis Stream Name The AWS Kinesis Stream Name to consume data from. Yes
Snowflake Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Snowflake Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples: `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`. `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively. Yes
Snowflake Destination Table The table where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
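The case rule for the destination database, schema, and table parameters can be sketched as a small helper that turns the identifier as written in the `CREATE` statement into the value to enter. The function name is hypothetical.

```python
def parameter_value_for(identifier):
    """Return the connector parameter value for an identifier as written in SQL."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        # Quoted identifiers keep their exact case; a doubled quote is one literal quote.
        return identifier[1:-1].replace('""', '"')
    # Unquoted identifiers are stored in uppercase.
    return identifier.upper()


if __name__ == "__main__":
    for written in ("SCHEMA_NAME", "schema_name", '"schema_name"', '"SCHEMA_NAME"'):
        print(written, "->", parameter_value_for(written))
```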
Field Name Field Type Example Value Description
stream String `stream-name` The name of the Kinesis stream the record came from.
shardId String `shardId-000000000001` The identifier of the shard in the stream the record came from.
approximateArrival String `2025-11-05T09:12:15.300` The approximate time that the record was inserted into the stream (ISO 8601 format).
partitionKey String `key-1234` The partition key specified by the data producer for the record.
sequenceNumber String `123456789` The unique sequence number assigned by Kinesis Data Streams to the record in the shard.
subSequenceNumber Number `2` The subsequence number for the record (used for aggregated records with the same sequence number).
shardedSequenceNumber String `12345678900002` A combination of the sequence number and the subsequence number for the record.
Column Description
`name` Name of the Zerocopy Connector.
`partner` The data partner (e.g., `SAP_BDC`).
`config` The configuration of the data partner. For SAP® BDC, this contains the SAP® BDC Connector Endpoint.
`status` Current connector state. See [](#connector-states).
`connection_error` Error message if the connector is in `CONNECT_ERROR` or `DISCONNECT_ERROR` state; otherwise empty.
`catalog_linked_databases` Mounted catalog-linked databases that are visible to the current role.
`share_back` Whether sharing data from Snowflake to SAP® BDC is enabled for this connector.
`shares` Snowflake data shares that are associated with this connector.
`database_name` Database in which the connector resides.
`schema_name` Schema in which the connector resides.
`owner` Role that owns the connector.
`owner_role_type` Type of the owner role.
`comment` Optional comment set on the connector.
`created_on` Timestamp when the connector was created.
`updated_on` Timestamp when the connector was last updated.
Order Task Description Persona
1 Review [](#label-oracle-of-connector-prerequisites) Review and confirm all required prerequisites. **Snowflake account administrator**
2 [Enable the connector](#label-oracle-enable-service) Accept the Oracle XStream terms to make the connector visible in the list of available connectors. **Organization administrator (ORGADMIN)**
3 [Configure the Oracle database](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb) Configure the Oracle database for %oracleofc% including replication settings and credentials. **Oracle database administrator**
4 [Set up Snowflake](/user-guide/data-integration/openflow/connectors/oracle/setup-snowflake) Create the destination database, service user, role, warehouse, and key pair authentication for the %oracleofc%. **Snowflake account administrator**
5 [Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector) Install, configure, and run the %oracleofc% connector. **Snowflake account administrator**
6 [Set up licensing](#label-oracle-license-setup) Configure your licensing model after the connector detects your source database inventory. **Organization administrator (ORGADMIN)**
Table (Enabled Tables value) Additional Jira API scope Notes
`SPRINT` (populates `SPRINT` and `BOARD_SPRINT`) `read:sprint:jira-software` No additional permission required.
`BOARD_PROJECT` None. No additional permission required.
`BOARD_ISSUE` `read:jira-work` Issues that fail per-issue permission checks (for example, issue-level security) are skipped silently.
Parameter Description
Jira Email Email address for the Atlassian account used for authentication.
Jira API Token API access token for your Atlassian Jira account. See [Required API scopes](#label-jira-agile-api-scopes) for the scopes to configure.
Environment URL URL to the Atlassian Jira environment. For example, `https://your-domain.atlassian.net`.
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples: `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`. `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively. Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC**: Alternatively, BYOC deployments can use KEY_PAIR as the authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your Runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are: - **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value. No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
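The per-strategy rules in the parameter table above can be checked before the values are entered. The following sketch is hypothetical and is not part of the connector; it assumes that the "Session Token Authentication Strategy" wording in the table refers to the `SNOWFLAKE_MANAGED_TOKEN` value, and it takes the parameters as a plain dictionary keyed by parameter name.

```python
# Parameters that must be blank when the managed token strategy is used.
TOKEN_BLANK_FIELDS = (
    "Snowflake Account Identifier",
    "Snowflake Private Key",
    "Snowflake Private Key File",
    "Snowflake Private Key Password",
    "Snowflake Username",
)
# Parameters that must be set when key pair authentication is used.
KEY_PAIR_REQUIRED_FIELDS = (
    "Snowflake Account Identifier",
    "Snowflake Username",
    "Snowflake Role",
)


def validate_auth(params):
    """Return a list of problems found in the authentication parameters."""
    errors = []
    strategy = params.get("Snowflake Authentication Strategy")
    if strategy == "SNOWFLAKE_MANAGED_TOKEN":  # assumption: same as "Session Token"
        errors += [f"{name} must be blank" for name in TOKEN_BLANK_FIELDS if params.get(name)]
    elif strategy == "KEY_PAIR":
        errors += [f"{name} is required" for name in KEY_PAIR_REQUIRED_FIELDS
                   if not params.get(name)]
        # Either the key itself or a key file must be defined.
        if not (params.get("Snowflake Private Key") or params.get("Snowflake Private Key File")):
            errors.append("Snowflake Private Key or Snowflake Private Key File is required")
    else:
        errors.append("unknown authentication strategy")
    return errors
```

An empty list means the combination of values is consistent with the table; it does not verify that the credentials themselves are valid.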
Parameter Description
Enabled Tables Comma-separated list of optional tables to populate. Ingestion of `BOARD` is always enabled and can't be disabled. Available values: - `BOARD_ISSUE` (issues associated with the board) - `BOARD_PROJECT` (projects associated with the boards) - `SPRINT` (sprints and board-sprint associations, populates both `SPRINT` and `BOARD_SPRINT`) Default value: `BOARD_ISSUE, BOARD_PROJECT, SPRINT`.
Merge Interval Time interval between journal-to-destination merge operations. When a merge runs, the Snowflake warehouse resumes. The merge is skipped if no new data has been loaded since the previous merge. Default value: `1 min`.
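The effect of the **Enabled Tables** value on the set of populated tables can be sketched as follows. The function name is hypothetical, and rejecting unknown values with an error is an assumption about how invalid input might be handled.

```python
OPTIONAL_TABLES = ("BOARD_ISSUE", "BOARD_PROJECT", "SPRINT")
DEFAULT_ENABLED_TABLES = "BOARD_ISSUE, BOARD_PROJECT, SPRINT"


def tables_to_populate(enabled_tables=DEFAULT_ENABLED_TABLES):
    """Return the destination tables populated for a comma-separated Enabled Tables value."""
    requested = [name.strip() for name in enabled_tables.split(",") if name.strip()]
    unknown = [name for name in requested if name not in OPTIONAL_TABLES]
    if unknown:
        raise ValueError(f"unknown table(s): {', '.join(unknown)}")

    tables = ["BOARD"]  # BOARD is always ingested and cannot be disabled
    for name in OPTIONAL_TABLES:
        if name in requested:
            tables.append(name)
            if name == "SPRINT":
                tables.append("BOARD_SPRINT")  # SPRINT populates both tables
    return tables
```

For example, an empty value still yields `BOARD`, and a value of `SPRINT` yields `BOARD`, `SPRINT`, and `BOARD_SPRINT`.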
Table (Enabled Tables value) Additional Jira API scope Additional Jira permission
`ISSUE_VOTE` None. **View voters and watchers** on the relevant projects.
`ISSUE_WATCHER` None. **View voters and watchers** on the relevant projects.
`WORKLOG` None. **View worklogs** on the relevant projects.
`ISSUE_SECURITY_SCHEME` `manage:jira-configuration` **Administer Jira** (global).
`DELETED_ISSUE` (`Deletes Fetch Strategy = AUDIT`) `manage:jira-configuration` **Administer Jira** (global).
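The table above can be turned into a lookup that reports which additional API scopes and Jira permissions a given configuration needs. This is a minimal sketch with hypothetical names; it covers only the additional requirements listed in this table, not the connector's base scopes.

```python
# (additional API scope or None, additional Jira permission) per optional table.
TABLE_REQUIREMENTS = {
    "ISSUE_VOTE": (None, "View voters and watchers"),
    "ISSUE_WATCHER": (None, "View voters and watchers"),
    "WORKLOG": (None, "View worklogs"),
    "ISSUE_SECURITY_SCHEME": ("manage:jira-configuration", "Administer Jira"),
}
# DELETED_ISSUE is populated only when Deletes Fetch Strategy is AUDIT.
AUDIT_REQUIREMENT = ("manage:jira-configuration", "Administer Jira")


def additional_requirements(enabled_tables, deletes_fetch_strategy="NONE"):
    """Return (scopes, permissions) needed beyond the base configuration."""
    needed = [TABLE_REQUIREMENTS[name] for name in enabled_tables if name in TABLE_REQUIREMENTS]
    if deletes_fetch_strategy == "AUDIT":
        needed.append(AUDIT_REQUIREMENT)

    scopes, permissions = set(), set()
    for scope, permission in needed:
        if scope:
            scopes.add(scope)
        permissions.add(permission)
    return scopes, permissions
```

Tables that are not in the lookup, such as `CHANGELOG` or `COMMENT`, contribute no additional requirements in this sketch.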
Parameter Description
Jira Email Email address for the Atlassian account used for authentication.
Jira API Token API access token for your Atlassian Jira account. See [Required API scopes](#label-jira-core-api-scopes) for the scopes to configure.
Environment URL URL to the Atlassian Jira environment. For example, `https://your-domain.atlassian.net`.
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples: `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`. `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively. Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC**: Alternatively, BYOC deployments can use KEY_PAIR as the authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
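The identifier rule described for Destination Database and Destination Schema can be sketched in Python. This is a minimal illustration, not part of the connector: the helper name `destination_name` is hypothetical, and it only models how unquoted identifiers are stored in uppercase while quoted identifiers keep their case.

```python
def destination_name(identifier_in_ddl: str) -> str:
    """Return the value to enter for an identifier as it was written in a CREATE statement.

    Hypothetical helper for illustration only.
    """
    ident = identifier_in_ddl.strip()
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Quoted identifier: case is preserved; a doubled quote is an escaped quote.
        return ident[1:-1].replace('""', '"')
    # Unquoted identifier: stored and resolved in uppercase.
    return ident.upper()


if __name__ == "__main__":
    for written in ("SCHEMA_NAME", "schema_name", '"schema_name"', '"SCHEMA_NAME"'):
        print(f"CREATE SCHEMA {written}: use {destination_name(written)}")
```

The same rule applies to the database name: a database created as `my_db` without quotes is entered as `MY_DB`.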
Parameter Description
Enabled Tables Comma-separated list of optional tables to populate. See [](#label-jira-core-enabled-tables) for the full list of values and guidance on which tables to enable. Default value: `CHANGELOG, COMMENT, WORKLOG`.
Issue Fields A list of fields to return for each issue, used to retrieve a subset of fields. See [](#label-jira-core-issue-fields) for available values and custom field handling. Default value: `*standard`.
Project Keys Filter Optional comma-separated list of Jira project keys to limit ingestion to specific projects. If empty, all projects accessible by the API token owner are fetched. For example, `PROJ1, PROJ2`.
Deletes Fetch Strategy Strategy for fetching deleted issues. Set to `NONE` to skip delete tracking, or `AUDIT` to fetch deleted issues from the Jira audit log endpoint. The `AUDIT` strategy requires the API token owner to have the **Administer Jira** global permission and the `manage:jira-configuration` scope. Default value: `NONE`.
Merge Interval Time interval between journal-to-destination merge operations. When a merge runs, the Snowflake warehouse resumes. The merge is skipped if no new data has been loaded since the previous merge. Default value: `1 min`.
| Jira field type | Snowflake column type |
| --- | --- |
| `number` | NUMBER |
| `array` | ARRAY |
| `progress`, `votes`, `watches`, `timetracking` | VARIANT |
| All other types | VARCHAR |
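The type mapping above can be expressed as a small lookup. The following Python sketch only restates the table; the function name `snowflake_column_type` is hypothetical and does not correspond to anything in the connector itself.

```python
# Jira field types that the table above maps to VARIANT.
VARIANT_FIELD_TYPES = {"progress", "votes", "watches", "timetracking"}


def snowflake_column_type(jira_field_type: str) -> str:
    """Return the Snowflake column type for a Jira field type, per the table above."""
    if jira_field_type == "number":
        return "NUMBER"
    if jira_field_type == "array":
        return "ARRAY"
    if jira_field_type in VARIANT_FIELD_TYPES:
        return "VARIANT"
    # Every other Jira field type is stored as text.
    return "VARCHAR"


if __name__ == "__main__":
    for field_type in ("number", "array", "votes", "string", "date"):
        print(field_type, "->", snowflake_column_type(field_type))
```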
Parameter Description
Client ID Client ID of the Amazon Advertising account
Client Secret Client secret of the Amazon Advertising account
OAuth Base URL The URL of the authorization server that issues the access token
Possible values:
- [https://api.amazon.com/auth/o2/token](https://api.amazon.com/auth/o2/token) - [https://api.amazon.co.uk/auth/o2/token](https://api.amazon.co.uk/auth/o2/token) - [https://api.amazon.co.jp/auth/o2/token](https://api.amazon.co.jp/auth/o2/token)
Refresh Token Refresh Token for Amazon Ads API
Region Environment from which the advertising data is downloaded
Possible values:
- NA - EU - FE
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
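The authentication rows above describe which parameters must be blank or set for each strategy. The Python sketch below checks a parameter dictionary against those rules. It is an illustration under stated assumptions: the function `validate_auth_parameters` is hypothetical, it assumes that the rows labeled "Session Token Authentication Strategy" apply when the strategy is `SNOWFLAKE_MANAGED_TOKEN`, and it reads "either a Snowflake Private Key File or a Snowflake Private Key must be defined" as "at least one of the two".

```python
# Parameters that the table says must be blank for the session token strategy.
MUST_BE_BLANK_FOR_TOKEN = (
    "Snowflake Account Identifier",
    "Snowflake Private Key",
    "Snowflake Private Key File",
    "Snowflake Private Key Password",
    "Snowflake Username",
)


def validate_auth_parameters(params: dict) -> list:
    """Return a list of problems found in the authentication parameters (empty if none)."""

    def is_set(name: str) -> bool:
        return bool(str(params.get(name, "")).strip())

    problems = []
    strategy = params.get("Snowflake Authentication Strategy", "")

    if strategy == "SNOWFLAKE_MANAGED_TOKEN":
        # Assumption: this is the strategy the table calls "Session Token".
        for name in MUST_BE_BLANK_FOR_TOKEN:
            if is_set(name):
                problems.append(f"{name} must be blank")
    elif strategy == "KEY_PAIR":
        for name in ("Snowflake Account Identifier", "Snowflake Username"):
            if not is_set(name):
                problems.append(f"{name} is required")
        if not (is_set("Snowflake Private Key") or is_set("Snowflake Private Key File")):
            problems.append("Set Snowflake Private Key or Snowflake Private Key File")
    else:
        problems.append(f"Unknown authentication strategy: {strategy!r}")

    # The role is marked as required for both strategies.
    if not is_set("Snowflake Role"):
        problems.append("Snowflake Role is required")
    return problems


if __name__ == "__main__":
    print(validate_auth_parameters({
        "Snowflake Authentication Strategy": "KEY_PAIR",
        "Snowflake Account Identifier": "myorg-myaccount",
        "Snowflake Username": "SVC_USER",
        "Snowflake Role": "OPENFLOW_ROLE",
    }))
```

The account, user, and role values in the usage example are placeholders.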
Parameter Description
Report Name Name of the report to be used as a destination table name. The name must be unique within the destination schema.
Report Ad Product Type of advertising product being reported
Possible values:
- SPONSORED_PRODUCTS - SPONSORED_BRANDS - SPONSORED_DISPLAY - SPONSORED_TELEVISION - DEMAND_SIDE_PLATFORM
Report Columns Set of columns which will be present in the end report. The list of available columns depends on the report type and can be found in the [Amazon Ads API documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview). For example, for the `spCampaigns` report type, the list of available columns can be found in the [Sponsored Products documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/campaign#sponsored-products).
Report Filters Set of filters used to trim the data returned. The list of available filters depends on the report type and can be found in the [Amazon Ads API documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview). For example, for the `spCampaigns` report type, the list of available filters can be found in the [Sponsored Products documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/campaign#sponsored-products). Filters must be in the format `columnName=filterValue`, and values must be separated by a comma (`,`). For example, `campaignStatus=ENABLED,PAUSED`.
Report Group By Determines the level of granularity and how the data within the report will be aggregated and presented. The list of available group by columns depends on the report type and can be found in the [Amazon Ads API documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview). For example, for the `spCampaigns` report type, the list of available group by columns can be found in the [Sponsored Products documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/campaign#sponsored-products).
Report Ingestion Strategy Mode in which data is fetched, either snapshot or incremental
Possible values:
- `SNAPSHOT` - `INCREMENTAL`
Report Ingestion Window Specifies how many days of data to download during incremental ingestion. For example, with a 30-day report ingestion window, an incremental load starts 30 days before the last successful ingestion date, unless that calculated date falls before the overall start date, in which case ingestion begins from the overall start date. If the `SNAPSHOT` ingestion strategy is used, all available data from the start date to the present is downloaded, so a report ingestion window is not needed.
Report Profile ID The [profile ID](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-profiles) associated with an advertising account in a specific marketplace
Report Time Unit Date aggregation
Possible values:
- `DAILY`: Each day is represented by one row - `SUMMARY`: The whole ingested date period is represented as one row
Report Type The Amazon Ads API supports a number of [report types](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview). For example: [sbAds](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/ad) and [spCampaigns](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/campaign). Copy the value of `reportTypeId` from the documentation and paste it into the parameter value.
Report Start Date Start date from which the ingestion should happen. The date format is YYYY-MM-DD.
Report Schedule Schedule on which the processor creates reports. For example: `8 h` or `1 d`, where `h` represents hours and `d` represents days.
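The Report Ingestion Window rule in the table above amounts to a small date calculation. The Python sketch below illustrates it; the function name `incremental_start` and its parameter names are hypothetical, and the connector's actual implementation is not shown in this document.

```python
from datetime import date, timedelta


def incremental_start(last_success: date, window_days: int, report_start: date) -> date:
    """Return the first date an incremental load would ingest.

    The load reaches back `window_days` from the last successful ingestion date,
    but never earlier than the configured Report Start Date.
    """
    candidate = last_success - timedelta(days=window_days)
    return max(candidate, report_start)


if __name__ == "__main__":
    # 30-day window, last successful ingestion on 2024-06-30, start date 2024-01-01.
    print(incremental_start(date(2024, 6, 30), 30, date(2024, 1, 1)))
    # The window would reach before the start date, so the start date is used.
    print(incremental_start(date(2024, 1, 10), 30, date(2024, 1, 1)))
```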
Parameter Description
Box App Config JSON An application JSON configuration that was downloaded during the app creation.
Box App Config File An application JSON file that was downloaded during the app creation. Either "Box App Config File" or "Box App Config JSON" has to be set. Select the **Reference asset** checkbox to upload the config file.
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
Box Folder ID The ID of the folder to read the files from. Set this to `0` to synchronize all folders the Box app has access to. It can be retrieved from the URL, for example [https://app.box.com/folder/FOLDER_ID](https://app.box.com/folder/FOLDER_ID).
File Extensions To Ingest A comma-separated list that specifies file extensions to ingest. The connector tries to convert the files to PDF format first, if possible. Nonetheless, the extension check is performed on the original file extension. If some of the specified file extensions are not supported by Cortex Parse Document, then the connector ignores those files, logs a warning message in an event log, and continues processing other files.
Snowflake File Hash Table Name Name of the table to store file hashes to determine if the content has changed. This parameter should generally not be changed.
Parameter Description
Box App Config JSON An application JSON configuration that was downloaded during the app creation.
Box App Config File An application JSON file that was downloaded during the app creation. Either "Box App Config File" or "Box App Config JSON" has to be set. Select the **Reference asset** checkbox to upload the config file.
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
Box Folder ID The ID of the folder to read the files from. Set this to `0` to synchronize all folders the Box app has access to. It can be retrieved from the URL, for example [https://app.box.com/folder/FOLDER_ID](https://app.box.com/folder/FOLDER_ID).
File Extensions To Ingest A comma-separated list that specifies file extensions to ingest. The connector tries to convert the files to PDF format first, if possible. Nonetheless, the extension check is performed on the original file extension. If some of the specified file extensions are not supported by Cortex Parse Document, then the connector ignores those files, logs a warning message in an event log, and continues processing other files.
Snowflake File Hash Table Name Name of the table to store file hashes to determine if the content has changed. This parameter should generally not be changed.
OCR Mode The OCR mode to use when parsing files with [](/user-guide/snowflake-cortex/parse-document) function. The value can be `OCR` or `LAYOUT`.
Snowflake Cortex Search Service User Role An identifier of a role that is assigned usage permissions on the Cortex Search service.
| Column name | Type | Description |
| --- | --- | --- |
| `full_name` | String | A full path to the file from the Box site documents root. Example: `folder_1/folder_2/file_name.pdf`. |
| `web_url` | String | A URL that displays an original Box file in a browser. |
| `last_modified_date_time` | String | Date and time when the item was most recently modified. |
| `chunk` | String | A piece of text from the document that matched the Cortex Search query. |
| `user_ids` | Array | An array of user IDs that have access to the document. |
| `user_emails` | Array | An array of user email IDs that have access to the document. It also includes user email IDs from all the Microsoft 365 groups that are assigned to the document. |
Parameter Description
Box App Config JSON An application JSON configuration that was downloaded during the app creation.
Box App Config File An application JSON file that was downloaded during the app creation. Either "Box App Config File" or "Box App Config JSON" has to be set. Select the **Reference asset** checkbox to upload the config file.
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
Box Folder ID The ID of the folder to read the files from. Set this to `0` to synchronize all folders the Box app has access to. The ID can be retrieved from the URL, for example [https://app.box.com/folder/FOLDER_ID](https://app.box.com/folder/FOLDER_ID).
Box File Identifier Column The column of the metadata table that will store the Box file ID to associate the given metadata with a file. This column must be of type VARCHAR and be part of the table created in [Create a Snowflake table for storing the Box metadata](#create-a-snowflake-table-for-storing-the-box-metadata).
Destination Metadata Table The Snowflake table you created in [Create a Snowflake table for storing the Box metadata](#create-a-snowflake-table-for-storing-the-box-metadata), which has the columns of the metadata you want to collect.
Parameter Description
Source Database Snowflake database that contains the schema with the Snowflake stream that ingests the changes
Source Schema Schema that contains the Snowflake Stream that ingests the changes
Snowflake Account Identifier Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide your Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
Snowflake Private Key Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either Snowflake Private Key File or Snowflake Private Key must be defined.
Snowflake Private Key File Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, upload the file that contains the RSA Private Key used for authentication to Snowflake, formatted according to PKCS8 standards and having standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. Select the **Reference asset** checkbox to upload the private key file.
Snowflake Private Key Password Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the password associated with the Snowflake Private Key File.
Snowflake Role When using Session Token for your Authentication Strategy, use your Snowflake Role. You can find your Snowflake Role in the Openflow UI, by going to View Details for your Runtime. When using Key Pair for your Authentication Strategy, use a valid role configured for your service user.
Snowflake Username Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the user name used to connect to the Snowflake instance.
Snowflake Warehouse Snowflake warehouse used to run queries
Snowflake Stream Name Name of the Snowflake stream used to ingest changes from the source Snowflake table. You must create the stream and link it to the source table before starting the connector.
Parameter Description
Box App Config JSON An application JSON configuration that was downloaded during the app creation.
Box App Config File An application JSON file that was downloaded during the app creation. Either "Box App Config File" or "Box App Config JSON" has to be set. Select the **Reference asset** checkbox to upload the config file.
Parameter Description
Box File Identifier Column The column of the metadata table that will store the Box file ID to associate the given metadata with a file. This column must be of type VARCHAR and be part of the table created in [Create a Snowflake table for storing the Box metadata](#create-a-snowflake-table-for-storing-the-box-metadata).
Box Metadata Template Name Template name of the Box metadata template that will be added to the Box files. You don't need to manually create a template before starting the connector. If you enter a value in this parameter, a template is automatically created with this template name. The name provided should not overlap with any template that you have already created in your Box environment.
Box Metadata Template Key The Box template key of the Box metadata template that will be added to the Box files. This is the key that will be used to reference the template in the Box API. You don't need to manually create a template before starting the connector. If you enter a value in this parameter, a template is automatically created with this template key. The key provided should not overlap with any template that you have already created in your Box environment.
| Parameter | Description | Required |
| --- | --- | --- |
| Client Account ID | ID of the Google Ads account for which the report should be ingested | true |
| Login Customer ID | Customer ID of the Google Ads manager account (MCC) for which the report should be ingested | false |
| Google Ads Resource Name | Name of the Google Ads resource that is the source for the report | true |
| Report Attributes | Attributes of the selected resource | true |
| Report Metrics | Metrics collected in the context of a given resource | false |
| Report Segments | Buckets in which metrics should be grouped | false |
| Report Start Date | Start date from which the ingestion should happen. The date format is YYYY-MM-DD. | false |
| Schedule | Get Google Ads Report processor schedule | true |
| Parameter | Description | Required |
| --- | --- | --- |
| Google Developer Token | Developer token required to query Google Ads API | true |
| Google Service Account JSON | Service Account JSON required for Google Ads authentication | true |
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your Runtime. - **KEY_PAIR**: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limit (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
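The interplay between the authentication strategy and the other destination parameters can be checked before a connector is started. The following is a minimal sketch, not part of the connector: the `SnowflakeDestinationParams` class and the `validate` function are hypothetical names, and the sketch assumes that "Session Token Authentication Strategy" in the table refers to the `SNOWFLAKE_MANAGED_TOKEN` value.

```python
from dataclasses import dataclass


@dataclass
class SnowflakeDestinationParams:
    # Hypothetical container mirroring the parameter names in the table above.
    authentication_strategy: str
    role: str
    warehouse: str
    account_identifier: str = ""
    username: str = ""
    private_key: str = ""
    private_key_file: str = ""
    private_key_password: str = ""


def validate(params):
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    key_pair_only = {
        "Snowflake Account Identifier": params.account_identifier,
        "Snowflake Private Key": params.private_key,
        "Snowflake Private Key File": params.private_key_file,
        "Snowflake Private Key Password": params.private_key_password,
        "Snowflake Username": params.username,
    }
    if params.authentication_strategy == "SNOWFLAKE_MANAGED_TOKEN":
        # Assumption: this is the session token strategy, so these must be blank.
        for name, value in key_pair_only.items():
            if value:
                problems.append(f"{name} must be blank")
    elif params.authentication_strategy == "KEY_PAIR":
        if not params.account_identifier:
            problems.append("Snowflake Account Identifier is required")
        if not params.username:
            problems.append("Snowflake Username is required")
        if not (params.private_key or params.private_key_file):
            problems.append(
                "Either Snowflake Private Key or Snowflake Private Key File must be defined"
            )
    else:
        problems.append("Unknown Snowflake Authentication Strategy")
    if not params.role:
        problems.append("Snowflake Role is required")
    if not params.warehouse:
        problems.append("Snowflake Warehouse is required")
    return problems
```

For example, `validate(SnowflakeDestinationParams("SNOWFLAKE_MANAGED_TOKEN", role="MY_ROLE", warehouse="MY_WH"))` returns an empty list, while supplying a user name with the same strategy reports that the user name must be blank.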
Parameter Description
Google Delegation User The user that is used by the service account
GCP Service Account JSON The service account JSON downloaded from Google Cloud Console to allow access to Google APIs in the connector
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your Runtime. - **KEY_PAIR**: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limit (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
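The case rule for Destination Database and Destination Schema can be expressed as a small helper. This is a sketch only: `parameter_value_for` is a hypothetical name, and it does not handle quoted identifiers that themselves contain double quotes.

```python
def parameter_value_for(identifier_in_ddl: str) -> str:
    """Map an identifier as written in CREATE SCHEMA to the value to enter."""
    ident = identifier_in_ddl.strip()
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Quoted identifier: the name keeps the exact case inside the quotes.
        return ident[1:-1]
    # Unquoted identifier: provide the name in uppercase.
    return ident.upper()


# Mirrors the examples in the Destination Schema row.
print(parameter_value_for("SCHEMA_NAME"))
print(parameter_value_for("schema_name"))
print(parameter_value_for('"schema_name"'))
print(parameter_value_for('"SCHEMA_NAME"'))
```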
Parameter Description
Google Drive ID The Google Shared Drive to watch for content and updates
Google Folder Name Optionally, set the Google Drive folder identifier (human-readable folder name) to filter incoming files. If all file types are desired then select "Set Empty String". When set, only files that are in the provided folder or subfolder will be retrieved. When blank or unset, no folder filtering is applied and all files under the drive are retrieved.
Google Domain The Google Workspace Domain that the Google Groups and Drive reside in.
File Extensions To Ingest A comma-separated list that specifies file extensions to ingest. The connector tries to convert the files to PDF format first, if possible. Nonetheless, the extension check is performed on the original file extension. If some of the specified file extensions are not supported by Cortex Parse Document, then the connector ignores those files, logs a warning message in an event log, and continues processing other files.
Snowflake File Hash Table Name Internal table used to store file content hashes to prevent updates to content when it has not changed.
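The File Extensions To Ingest check can be pictured as follows. This is a minimal sketch rather than the connector's implementation: the function names are hypothetical, and case-insensitive matching and tolerance for a leading dot are assumptions. As the description states, the check runs against the original file extension, before any conversion to PDF.

```python
from pathlib import PurePosixPath


def parse_extensions(value: str) -> set:
    """Split the comma-separated parameter value into a set of extensions."""
    return {
        part.strip().lower().lstrip(".")
        for part in value.split(",")
        if part.strip()
    }


def should_ingest(file_name: str, extensions: set) -> bool:
    """Check the original file extension against the configured list."""
    original_extension = PurePosixPath(file_name).suffix.lower().lstrip(".")
    return original_extension in extensions


allowed = parse_extensions("pdf, docx,.PPTX")
print(should_ingest("folder_1/folder_2/file_name.pdf", allowed))
print(should_ingest("folder_1/notes.txt", allowed))
```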
Parameter Description
Google Delegation User The user that is used by the service account
GCP Service Account JSON The service account JSON downloaded from Google Cloud Console to allow access to Google APIs in the connector
Parameter Description
Destination Database The database where data will be persisted. It must already exist in Snowflake
Destination Schema The schema where data will be persisted. It must already exist in Snowflake
Snowflake Account Identifier Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide your Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
Snowflake Private Key Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either Snowflake Private Key File or Snowflake Private Key must be defined.
Snowflake Private Key File Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, upload the file that contains the RSA Private Key used for authentication to Snowflake, formatted according to PKCS8 standards and having standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. Select the **Reference asset** checkbox to upload the private key file.
Snowflake Private Key Password Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the password associated with the Snowflake Private Key File.
Snowflake Role When using Session Token for your Authentication Strategy, use your Snowflake Role. You can find your Snowflake Role in the Openflow UI by going to **View Details** for your Runtime. When using KEY_PAIR for your Authentication Strategy, use a valid role configured for your service user.
Snowflake Username Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the user name used to connect to the Snowflake instance.
Snowflake Warehouse Snowflake warehouse used to run queries
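Before pasting or uploading a private key, it can help to confirm that the text has the PEM framing described for the Snowflake Private Key and Snowflake Private Key File parameters. The sketch below only inspects the header and footer lines; it does not parse or verify the key. The function name is hypothetical, and the check follows the statement above that the header line begins with `-----BEGIN PRIVATE`, so keys carrying a different header are reported as not matching.

```python
def looks_like_pkcs8_pem(text: str) -> bool:
    """Check only the PEM header and footer lines, not the key material."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) < 3:
        # A header, at least one body line, and a footer are expected.
        return False
    return (
        lines[0].startswith("-----BEGIN PRIVATE")
        and lines[-1].startswith("-----END PRIVATE")
    )


# Placeholder body text, not a real key.
sample = "-----BEGIN PRIVATE KEY-----\nPLACEHOLDERBODY\n-----END PRIVATE KEY-----\n"
print(looks_like_pkcs8_pem(sample))
```

A key whose header reads `-----BEGIN RSA PRIVATE KEY-----` fails this check, which is a hint that it is not in PKCS8 format.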
Parameter Description
Google Drive ID The Google Shared Drive to watch for content and updates
Google Folder Name Optionally, set the Google Drive folder identifier (human-readable folder name) to filter incoming files. If all file types are desired then select "Set Empty String". When set, only files that are in the provided folder or subfolder will be retrieved. When blank or unset, no folder filtering is applied and all files under the drive are retrieved.
Google Domain The Google Workspace Domain that the Google Groups and Drive reside in.
OCR Mode The OCR mode to use when parsing files with the [Cortex Parse Document](/user-guide/snowflake-cortex/parse-document) function. The value can be `OCR` or `LAYOUT`.
File Extensions To Ingest A comma-separated list that specifies file extensions to ingest. The connector tries to convert the files to PDF format first, if possible. Nonetheless, the extension check is performed on the original file extension. If some of the specified file extensions are not supported by Cortex Parse Document, then the connector ignores those files, logs a warning message in an event log, and continues processing other files.
Snowflake File Hash Table Name Internal table used to store file content hashes to prevent updates to content when it has not changed.
Snowflake Cortex Search Service User Role An identifier of a role that is assigned usage permissions on the Cortex Search service.
| Column name | Type | Description |
| --- | --- | --- |
| `full_name` | String | A full path to the file from the Google Drive documents root. Example: `folder_1/folder_2/file_name.pdf`. |
| `web_url` | String | A URL that displays an original Google Drive file in a browser. |
| `last_modified_date_time` | String | Date and time when the item was most recently modified. |
| `chunk` | String | A piece of text from the document that matched the Cortex Search query. |
| `user_ids` | Array | An array of Microsoft 365 user IDs that have access to the document. It also includes user IDs from all the Microsoft 365 groups that are assigned to the document. To find a specific user ID, see [Get a user](https://learn.microsoft.com/en-us/graph/api/user-get?view=graph-rest-1.0&tabs=http). |
| `user_emails` | Array | An array of Microsoft 365 user email IDs that have access to the document. It also includes user email IDs from all the Microsoft 365 groups that are assigned to the document. |
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your Runtime. - **KEY_PAIR**: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limit (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
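The two Oversized Value Strategy options can be modeled as follows. This is an illustrative sketch of the documented behavior, not the connector's code: the `TableFailed` exception and `handle_value` function are hypothetical, and interpreting 16 MB as 16 × 1024 × 1024 bytes is an assumption.

```python
# Assumption: the 16 MB limit is interpreted as 16 * 1024 * 1024 bytes.
LIMIT_BYTES = 16 * 1024 * 1024


class TableFailed(Exception):
    """Hypothetical signal that replication stops for the table."""


def handle_value(value, strategy="Fail Table", limit=LIMIT_BYTES):
    """Return the value to write, applying the configured strategy."""
    if value is None or len(value) <= limit:
        return value
    if strategy == "Set Null":
        # The oversized value is lost; the rest of the row is kept.
        return None
    if strategy == "Fail Table":
        raise TableFailed("value exceeds the size limit; table marked as failed")
    raise ValueError(f"unknown strategy: {strategy}")
```

With **Set Null**, an oversized value becomes `None` (written as `NULL`) and replication continues; with **Fail Table**, the sketch raises `TableFailed` to represent replication stopping for that table.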
Parameter Description
Service Account JSON Contents of the file containing Service Account credentials, such as client_id, client_email, and private_key. Copy the entire contents of the file.
Parameter Description
Date Time Render Option Determines how dates should be rendered in the output. You can select one of these options: `SERIAL_NUMBER` and `FORMATTED_STRING`. Select `SERIAL_NUMBER` only when the Value Render Option parameter is set to `UNFORMATTED_VALUE`. For more information, see [DateTimeRenderOption](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption).
Destination Database The destination database in which the destination table is created.
Destination Schema The destination schema in which the destination table is created.
Destination Table Prefix The prefix for the names of the destination tables in which data pulled from Google Sheets is stored. The connector creates one destination table for each range. If no ranges are provided, sheet names are used as table identifiers. The first row in a sheet represents the column names in the destination table.
Ranges The list of ranges to retrieve from the spreadsheet. If no range is specified, all sheets in the specified spreadsheet will be downloaded. Provide each range in either [A1 or R1C1 notation](https://developers.google.com/sheets/api/guides/concepts#cell), separated by a comma. For example: `Sheet1!A1:B2,Sheet2!D4:E5,Sheet3`.
Run Schedule The schedule on which data is retrieved from Google Sheets and saved in Snowflake. By default, the timer-driven scheduling strategy is used, for which you specify an interval, for example, `8h`.
Spreadsheet ID The [unique identifier](https://developers.google.com/sheets/api/guides/concepts) for a spreadsheet. You can find it in the URL of the spreadsheet.
Value Render Option Determines how values should be rendered in the output. You can select one of these options: `FORMATTED_VALUE` and `UNFORMATTED_VALUE`. If you select `FORMATTED_VALUE`, then all the columns in the destination table are of VARCHAR type. For more information, see [ValueRenderOption](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption).
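Two of the rules in this table lend themselves to a quick check: the Ranges value is a comma-separated list, and `SERIAL_NUMBER` is valid only together with `UNFORMATTED_VALUE`. The sketch below assumes hypothetical helper names and does not handle sheet names that contain commas or exclamation marks.

```python
def split_ranges(value: str) -> list:
    """Split the Ranges value into (sheet, cells) pairs.

    cells is None when an entry names a whole sheet, such as `Sheet3`.
    """
    result = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        sheet, separator, cells = item.partition("!")
        result.append((sheet, cells if separator else None))
    return result


def render_options_are_consistent(value_render_option: str,
                                  date_time_render_option: str) -> bool:
    """SERIAL_NUMBER is allowed only with UNFORMATTED_VALUE."""
    if date_time_render_option == "SERIAL_NUMBER":
        return value_render_option == "UNFORMATTED_VALUE"
    return True


print(split_ranges("Sheet1!A1:B2,Sheet2!D4:E5,Sheet3"))
print(render_options_are_consistent("FORMATTED_VALUE", "SERIAL_NUMBER"))
```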
Parameter Description
HubSpot Access Token HubSpot Private Application access token.
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your Runtime. - **KEY_PAIR**: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limit (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
Object Types List of comma-separated HubSpot object types to ingest. Supported object type values are:
- Appointments - Calls - Campaigns - Carts - Commerce Payments - Communications - Companies - Contacts - Courses - Deals - Discounts - Emails - Fees - Feedback Submissions - Goals - Invoices - Leads - Line Items - Listings - Meetings - Notes - Orders - Postal Mail - Products - Quotes - Quote Templates - Services - Subscriptions - Tasks - Taxes - Tickets - Users
Updated After Filters objects updated after the specified date or time. This parameter is optional.
Data Ingestion Schedule Time interval between consecutive ingestion runs. Provide a valid time duration, such as 30 minutes or 1 hour.
Parameter Description
Jira Email Email address for the Atlassian account.
Jira API Token API access token for your Atlassian Jira account with the necessary scopes (`read:jira-work` and `read:jira-user`).
Environment URL URL to the Atlassian Jira environment. For example, `https://your-domain.atlassian.net`.
Connection Method Must be set to `DIRECT` unless otherwise instructed by Snowflake.
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your Runtime. - **KEY_PAIR**: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limit (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
Search Type Type of search to perform. Possible values: `SIMPLE` and `JQL`. Default value: `SIMPLE`.
Destination Table The Snowflake table where data is stored. It is created if it doesn't exist. The table name must be unquoted and provided in uppercase. In addition to the destination table, a flattened view based on the destination table is created. The view name is the table name concatenated with the suffix `_VIEW`.
JQL Query A JQL query used to search for Jira issues to fetch. It should be used only when Search Type is `JQL`.
Project Names List of projects from which the issues should be fetched. You can search for issues belonging to a particular project by project name, project key, or project ID. It should be used only when Search Type is `SIMPLE`. Provide a list of items, separated by commas. For example: `Project1, Project2`.
Status Category Status category filter for simple search. It should be used only when Search Type is `SIMPLE`. Example values are: `Done`, `In Progress`, `To Do`.
Updated After Filter issues updated after a specified date and time. It should be used only when Search Type is `SIMPLE`. It should be in the yyyy-MM-dd format, such as 2023-10-01.
Created After Filter issues created after a specified date and time. It should be used only when Search Type is `SIMPLE`. It should be in the yyyy-MM-dd format, such as 2023-10-01.
Issue Fields A list of fields to return for each issue, which is used to retrieve a subset of fields. IDs of custom fields can be obtained by following [this guide](https://confluence.atlassian.com/jirakb/get-custom-field-ids-for-jira-and-jira-service-management-744522503.html). This parameter accepts a comma-separated list. You can use these special values: `*all` to fetch all fields, `*navigable` to fetch navigable fields, and a field name prefixed with a minus sign (`-`) to exclude that field. For example, `*all,-description` returns all fields except description. Default value: `*all`.
Fetch All Worklogs Determines whether to fetch all worklogs for each issue. Default value: `false`. - When set to `true`, the connector enriches issues with all associated worklogs, beyond the default 20 worklogs per issue returned by the Jira Cloud REST API. - When set to `false`, only the first 20 worklogs per issue are fetched. Setting this parameter to `true` can impact performance due to the increased number of API calls required to fetch all worklogs for issues with more than 20 worklogs.
Maximum Page Size Maximum number of issues to return per request, with a default and maximum value of `1000`. Note that the Jira API may return fewer results depending on the total response size.
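To make the Issue Fields syntax concrete, the following sketch resolves a field specification against known field lists. It is a local model of the documented semantics only; the actual resolution is performed by Jira, and the function name and the sample field lists are hypothetical.

```python
def resolve_issue_fields(spec: str, all_fields: list, navigable_fields: list) -> list:
    """Expand `*all`, `*navigable`, and `-field` tokens into a field list."""
    selected = []
    excluded = set()
    for token in (part.strip() for part in spec.split(",")):
        if not token:
            continue
        if token == "*all":
            candidates = all_fields
        elif token == "*navigable":
            candidates = navigable_fields
        elif token.startswith("-"):
            # A leading minus sign excludes the named field.
            excluded.add(token[1:])
            continue
        else:
            candidates = [token]
        for field in candidates:
            if field not in selected:
                selected.append(field)
    return [field for field in selected if field not in excluded]


# Hypothetical field lists for illustration.
fields = ["summary", "description", "status"]
navigable = ["summary", "status"]
print(resolve_issue_fields("*all,-description", fields, navigable))
```

With the sample lists above, `*all,-description` resolves to every field except `description`, matching the example in the Issue Fields row.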
| Object | Privilege | Notes |
| --- | --- | --- |
| Database | USAGE | |
| Schema | USAGE | |
| Table | OWNERSHIP | Required for the connector to ingest data into a table. |
Parameter Description Required
Kafka Auto Offset Reset Automatic offset configuration applied when no previous consumer offset is found. Corresponds to the Kafka `auto.offset.reset` property. Possible values: - **earliest**: Automatically reset the offset to the earliest offset. - **latest**: Automatically reset the offset to the latest offset. - **none**: Throw an exception to the consumer if no previous offset is found for the consumer group. Default: latest. Yes
Kafka Bootstrap Servers A comma-separated list of Kafka bootstrap servers. Each entry should include a port, for example `kafka-broker:9092`. Yes
Kafka Consumer Group ID The ID of a consumer group used by the connector. Can be arbitrary but must be unique. Yes
Kafka SASL Password Password provided with the configured username when using the SASL512 SCRAM mechanism
Kafka SASL Username Username provided with the configured password when using the SASL512 SCRAM mechanism
Kafka Topic Format One of: names / pattern. Specifies whether the "Kafka Topics" provided are a comma separated list of names or a single regular expression. Yes
Kafka Topics A comma-separated list of Kafka topics or a regular expression. Yes
Snowflake Destination Database The database where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Snowflake Destination Schema The schema where data is persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples: `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`. `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively. Yes
Snowflake Destination Table The table where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
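The relationship between Kafka Topic Format and Kafka Topics can be sketched as a matcher that interprets the same string in two ways. This is an illustration, not the connector's logic: `topic_matcher` is a hypothetical name, the sketch assumes a pattern must match the whole topic name, and it uses Python's `re` module, whose syntax can differ from the regular expression dialect the connector applies.

```python
import re


def topic_matcher(topic_format: str, topics: str):
    """Return a predicate that tells whether a topic name is selected."""
    if topic_format == "names":
        names = {name.strip() for name in topics.split(",") if name.strip()}
        return lambda topic: topic in names
    if topic_format == "pattern":
        compiled = re.compile(topics)
        # Assumption: the pattern must match the entire topic name.
        return lambda topic: compiled.fullmatch(topic) is not None
    raise ValueError("Kafka Topic Format must be 'names' or 'pattern'")


by_name = topic_matcher("names", "orders, payments")
by_pattern = topic_matcher("pattern", r"orders\..*")
print(by_name("payments"), by_pattern("orders.eu"), by_pattern("payments"))
```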
| Field | Data Type | Description |
| --- | --- | --- |
| topic | String | The name of the Kafka topic that the record came from. |
| partition | number | The number of the partition within the topic. (Note that this is the Kafka partition, not the Snowflake micro-partition.) |
| offset | number | The offset in that partition. |
| timestamp | number | Timestamp when the record was added to Kafka. |
| key | String | If the message is a Kafka KeyedMessage, this is the key for that message. In order for the connector to store the key in the RECORD_METADATA, the `key.converter` parameter in the Kafka configuration properties must be set to `org.apache.kafka.connect.storage.StringConverter`; otherwise, the connector ignores keys. |
| headers | Object | A header is a user-defined key-value pair associated with the record. Each record can have 0, 1, or multiple headers. |
Parameter Description
Client ID The client ID of an application registered on LinkedIn
Client Secret The client secret related to the client ID
Refresh Token A user obtains the refresh token after the app registration process. They use it together with the client ID and the client secret to get an access token.
Token Endpoint The token endpoint is obtained by a user during the app registration process
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your Runtime. - **KEY_PAIR**: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limit (16 MB) during replication. Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable.
No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
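The authentication rules in the preceding table can be checked before the connector is started. The following is a minimal sketch, not part of the connector: the dictionary keys and the `validate_auth` helper are hypothetical names chosen for illustration, and it assumes that the "Session Token Authentication Strategy" rows refer to the `SNOWFLAKE_MANAGED_TOKEN` value.

```python
# Hypothetical pre-flight check for the destination parameters described above.
# The keys and the function name are illustrative; they are not connector APIs.

MUST_BE_BLANK_FOR_TOKEN = (
    "account_identifier",
    "private_key",
    "private_key_file",
    "private_key_password",
    "username",
)


def validate_auth(params):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    strategy = params.get("authentication_strategy")

    if strategy == "SNOWFLAKE_MANAGED_TOKEN":
        # Assumption: this value corresponds to the session token strategy.
        for key in MUST_BE_BLANK_FOR_TOKEN:
            if params.get(key):
                problems.append(f"{key} must be blank for SNOWFLAKE_MANAGED_TOKEN")
    elif strategy == "KEY_PAIR":
        for key in ("account_identifier", "username"):
            if not params.get(key):
                problems.append(f"{key} is required for KEY_PAIR")
        # Either the inline key or the key file must be defined.
        if not (params.get("private_key") or params.get("private_key_file")):
            problems.append("set either private_key or private_key_file")
    else:
        problems.append(f"unknown authentication strategy: {strategy!r}")

    # Role and warehouse are marked as required for both strategies.
    for key in ("role", "warehouse"):
        if not params.get(key):
            problems.append(f"{key} is required")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every misconfigured parameter at once.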
Parameter Description
Report Name The unique name of the report. It is uppercased and used as the destination table name.
Start Date Start date from which ingestion should begin. Must be in the yyyy-MM-dd format.
Time Granularity Time granularity of results. Possible values: - `ALL`: Results grouped into a single result across the entire time range of the report. - `DAILY`: Results grouped by day. - `MONTHLY`: Results grouped by month. - `YEARLY`: Results grouped by year.
Conversion Window The timeframe for which data is refreshed during incremental load when `DAILY` time granularity is chosen. For example, if the [conversion window](https://www.linkedin.com/help/lms/answer/a426359) is equal to 30 days, then during the INCREMENTAL load, the ingestion starts from the date of the last successful ingestion minus 30 days. Required when `DAILY` time granularity is specified. For other possible time granularities, such as `ALL`, `MONTHLY`, and `YEARLY`, the SNAPSHOT ingestion strategy is used. Data from the start date to the present is always downloaded, so there is no need to use a conversion window. The conversion window can be any number from 1 to 365.
Metrics Comma-separated list of metrics. Metrics are case-sensitive. For more information, see [Reporting](https://learn.microsoft.com/en-us/linkedin/marketing/integrations/ads-reporting/ads-reporting?view=li-lms-2025-03&tabs=http#metrics-available). The `pivotValues` and `dateRange` metrics are mandatory and are automatically included by the connector. Up to 20 metrics can be specified, including the mandatory metrics.
Pivots Comma-separated list of pivots. The available pivots are as follows: - [Analytics Finder](https://learn.microsoft.com/en-us/linkedin/marketing/integrations/ads-reporting/ads-reporting?view=li-lms-2025-03&tabs=http#analytics-finder) - [Statistics Finder](https://learn.microsoft.com/en-us/linkedin/marketing/integrations/ads-reporting/ads-reporting?view=li-lms-2025-03&tabs=http#statistics-finder) The connector uses the Analytics Finder when zero or one pivot is specified, and switches to the Statistics Finder when two or three pivots are selected. You can use a maximum of three pivots.
Shares Comma-separated list of share IDs. This parameter can be used to filter results by share ID.
Campaigns Comma-separated list of campaign IDs. This parameter can be used to filter results by campaign ID.
Campaign Groups Comma-separated list of campaign group IDs. This parameter can be used to filter results by campaign group ID.
Accounts Comma-separated list of account IDs. This parameter can be used to filter results by account ID.
Companies Comma-separated list of company IDs. This parameter can be used to filter results by company ID.
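The conversion window and pivot rules described above can be illustrated with a short sketch. This is not the connector's code: the function names are hypothetical, and the sketch only mirrors the rules stated in the table (a window of 1 to 365 days subtracted from the last successful ingestion date, and the finder chosen by the number of pivots).

```python
from datetime import date, timedelta


def incremental_start(last_successful_ingestion, conversion_window_days):
    """Date from which a DAILY incremental load re-fetches data."""
    if not 1 <= conversion_window_days <= 365:
        raise ValueError("conversion window must be between 1 and 365 days")
    return last_successful_ingestion - timedelta(days=conversion_window_days)


def select_finder(pivots):
    """Pick the finder by pivot count: 0-1 pivots -> analytics, 2-3 -> statistics."""
    if len(pivots) > 3:
        raise ValueError("a maximum of three pivots is supported")
    return "analytics" if len(pivots) <= 1 else "statistics"


# With a 30-day window, data is refreshed from 30 days before the last run.
start = incremental_start(date(2025, 3, 31), 30)
```

The pivot names passed to `select_finder` are placeholders; the sketch only counts them.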
Parameter Description
Access Token Token required to send requests to the Meta Ads Insights API.
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples: - `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session Token Authentication Strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are: - **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable. No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
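The case rule for **Destination Database** and **Destination Schema** follows Snowflake's identifier resolution: unquoted identifiers are stored in uppercase, and double-quoted identifiers keep their exact case. The following sketch shows how the value to enter in the parameter could be derived from the name as it was written in the `CREATE` statement. The helper name `parameter_value` is hypothetical.

```python
def parameter_value(identifier_as_written):
    """Return the name to enter in the connector parameter.

    identifier_as_written is the name exactly as typed in the CREATE statement,
    including surrounding double quotes if it was a quoted identifier.
    """
    ident = identifier_as_written.strip()
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Quoted identifier: case is preserved; "" inside quotes is an escaped quote.
        return ident[1:-1].replace('""', '"')
    # Unquoted identifier: Snowflake stores it in uppercase.
    return ident.upper()
```

For example, a schema created with `CREATE SCHEMA schema_name` is entered as `SCHEMA_NAME`, while one created with `CREATE SCHEMA "schema_name"` is entered as `schema_name`.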
Parameter Description
Report Name Name of the report to be used as a destination table name. The name must be unique within the destination schema.
Report Object Id Identifier of the object to download from Meta Ads. See the API reference for each type of object ID: - [Ad Accounts](https://developers.facebook.com/docs/graph-api/reference/user/adaccounts) - [Ad Sets](https://developers.facebook.com/docs/marketing-api/reference/ad-account/adsets/) - [Ads](https://developers.facebook.com/docs/marketing-api/reference/ad-account/ads/) - [Campaigns](https://developers.facebook.com/docs/marketing-api/reference/ad-account/campaigns/)
Report Ingestion Strategy Mode in which data is fetched: either snapshot or incremental.
Meta Ads Version Version of the Meta Ads API used for downloading reports. Allowed value: `v22.0`.
Report Level Aggregation level of the result. Possible values: - `account` - `campaign` - `ad` - `adset`
Report Fields Comma-separated list of report fields.
Report Breakdowns Comma-separated list of report breakdowns. For the full list of available breakdowns, see [Breakdowns](https://developers.facebook.com/docs/marketing-api/insights/breakdowns).
Report Time Increment Level of aggregation based on the day count. Possible values: - `1`: Daily - `3`: Every 3 days - `7`: Weekly - `monthly`: Monthly - `90`: Quarterly - `all_days`: All days; do not slice the result
Report Action Time Time of action stats. Possible values: - `conversion`: Reports actions based on conversion date - `impression`: Reports actions based on impression date - `mixed`: Mixed approach between conversion and impression
Report Click Attribution Window Attribution window for the click action. Possible values: - `1d_click` - `7d_click` - `28d_click`
Report View Attribution Window Attribution window for the view action. Possible values: - `1d_view` - `7d_view` - `28d_view`
Report Schedule Schedule on which the processor that creates reports runs.
Report Start Date Start date from which the ingestion should happen. The date format is YYYY-MM-DD.
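Because several of these parameters accept only a fixed set of values, a configuration can be checked against the table before it is applied. The following is a minimal sketch under stated assumptions: the dictionary keys and the `invalid_settings` helper are hypothetical names, and the allowed values are copied from the table above rather than taken from any API.

```python
from datetime import datetime

# Allowed values as listed in the table above; the keys are illustrative names.
ALLOWED = {
    "meta_ads_version": {"v22.0"},
    "report_level": {"account", "campaign", "ad", "adset"},
    "report_time_increment": {"1", "3", "7", "monthly", "90", "all_days"},
    "report_action_time": {"conversion", "impression", "mixed"},
    "report_click_attribution_window": {"1d_click", "7d_click", "28d_click"},
    "report_view_attribution_window": {"1d_view", "7d_view", "28d_view"},
}


def invalid_settings(settings):
    """Return the sorted names of settings whose values are not accepted."""
    bad = [
        key
        for key, allowed in ALLOWED.items()
        if key in settings and settings[key] not in allowed
    ]
    start = settings.get("report_start_date")
    if start is not None:
        try:
            # The start date must be parseable as year-month-day.
            datetime.strptime(start, "%Y-%m-%d")
        except ValueError:
            bad.append("report_start_date")
    return sorted(bad)
```

Settings that are absent from the dictionary are skipped, so the same check works for partial configurations.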
Parameter Description
Source Dataverse Environment URL The main identifier of the source system from which data is fetched. The URL indicates the namespace where the Dataverse tables exist. It is also used to create the scope parameter for OAuth.
Source Tenant ID Microsoft Azure Tenant ID. It's used to create OAuth URLs. The Microsoft Dataverse environment must belong to this tenant.
Source OAuth Client ID Microsoft Azure Client ID used to access Microsoft Dataverse API. [Microsoft Dataverse Web API](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/webapi/overview) uses OAuth authentication to secure access, and the connector uses the client credentials flow. To learn about client ID and how to find it in Microsoft Entra, see [Application ID (client ID)](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#application-id-client-id).
Source OAuth Client Secret Microsoft Azure Client Secret used to access Microsoft Dataverse API. [Microsoft Dataverse Web API](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/webapi/overview) uses OAuth authentication to secure access, and the connector uses the client credentials flow. To learn about client secret and how to find it in Microsoft Entra, see [Certificates & secrets](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#certificates--secrets).
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples: - `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session Token Authentication Strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to **View Details** for your runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are: - **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable. No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
Scheduling Interval Interval at which the processor that fetches the list of tables and initializes ingestion is triggered.
Source Tables Filter Strategy Strategy for filtering the tables to be ingested. Either REGEXP or LIST.
Source Tables Filter Value Value of the tables filter. When Source Tables Filter Strategy is set to REGEXP, this is the regular expression that the selected tables must match. When it is set to LIST, this is a comma-separated list of table names.
Column Filter JSON Optional. A JSON array specifying per-table column filters. Columns can be included or excluded by name (`included`, `excluded`) or by regular expression pattern (`includedPattern`, `excludedPattern`). The `table` value must be the **singular logical entity name** (e.g., `annotation`), not the plural entity set name used in `Source Tables Filter Value` (e.g., `annotations`). For example: `[ {"table": "mytable", "excluded": ["binarycolumn", "binarycolumn_binary"]} ]` excludes large binary columns from `mytable`. See [Replicate a subset of columns in a table](#replicate-a-subset-of-columns-in-a-table) for full details.
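The two table filter strategies can be illustrated with a short sketch. It is not the connector's implementation: the `select_tables` helper is a hypothetical name, and the sketch assumes that a REGEXP value must match the whole table name and that whitespace around LIST entries is ignored. Neither detail is stated in the table above.

```python
import re


def select_tables(available, strategy, value):
    """Return the tables from `available` selected by the filter, in order."""
    if strategy == "LIST":
        # Assumption: entries are trimmed and empty entries are ignored.
        wanted = {name.strip() for name in value.split(",") if name.strip()}
        return [table for table in available if table in wanted]
    if strategy == "REGEXP":
        # Assumption: the expression must match the entire table name.
        pattern = re.compile(value)
        return [table for table in available if pattern.fullmatch(table)]
    raise ValueError(f"unsupported filter strategy: {strategy!r}")


tables = ["accounts", "annotations", "contacts"]
by_list = select_tables(tables, "LIST", "accounts, contacts")
by_regex = select_tables(tables, "REGEXP", "a.*")
```

The table names here are the plural entity set names used in `Source Tables Filter Value`; the `table` field in `Column Filter JSON` uses the singular logical entity name instead.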
`log_bin` Set to `on`. This enables the binary log that records structural and data changes.
`binlog_format` Set to `row`. The connector supports only row-based replication. MySQL 8.x versions may be the last ones to support this setting, and future versions will only support row-based replication. Not applicable in GCP Cloud SQL, where it is fixed at the right value.
`binlog_row_metadata` Set to `full`. The connector requires all row metadata to operate, most importantly column names and primary key information. In Microsoft Azure Database for MySQL, the `binlog_row_metadata` field isn't user-modifiable. Raise a Microsoft support ticket to change this value.
`binlog_row_image` Set to `full`. The connector requires that all columns be written into the binary log. Not applicable in Amazon Aurora, where it is fixed at the right value.
`binlog_row_value_options` Leave empty. This option only affects JSON columns, where it can be set to include only the modified parts of JSON documents for `UPDATE` statements. The connector requires that full documents are written into the binary log.
`binlog_expire_logs_seconds` Set to at least a few hours, or longer, so that the connector works reliably and the database agent can continue incremental replication after extended pauses or downtime. After the [binary log expiration period (binlog_expire_logs_seconds)](https://dev.mysql.com/doc/refman/8.4/en/replication-options-binary-log.html#sysvar_binlog_expire_logs_seconds) ends, binary log files might be automatically removed. If the integration is paused for a long period, for example due to maintenance work, and the expired binary log files are deleted during this time, Openflow can't replicate the data from these files. If you're using scheduled replication, the value must be longer than the configured schedule.
`binlog_legacy_event_pos` Set to `ON`. Required only when the source is MariaDB. The connector requires this flag to track binary log positions correctly during replication. Not applicable to MySQL.
TO ''@'%' ``` For more information on replication security, see [Binary log](https://dev.mysql.com/doc/refman/8.4/en/binary-log.html). 7. As a Snowflake account administrator, perform the following tasks: 1. Create a Snowflake user of type [SERVICE](#label-user-type-property). Create a database to store the replicated data, and set up privileges for the Snowflake user to create objects in that database by granting the [USAGE and CREATE SCHEMA privileges](#label-database-privileges). ```sql CREATE DATABASE ; CREATE USER TYPE=SERVICE COMMENT='Service user for automated access of Openflow'; CREATE ROLE ; GRANT ROLE TO USER ; GRANT USAGE ON DATABASE TO ROLE ; GRANT CREATE SCHEMA ON DATABASE TO ROLE ; CREATE WAREHOUSE WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE; GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE ; ``` 2. Create a pair of secure keys (public and private). Store the private key for the user in a file to supply to the connector's configuration. Assign the public key to the Snowflake service user: ```sql ALTER USER SET RSA_PUBLIC_KEY = 'thekey'; ``` For more information, see [pair of keys](/user-guide/key-pair-auth). 3. Designate a warehouse for the connector to use. Start with the `XSMALL` warehouse size, then experiment with the size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster) than with a larger warehouse size. ## Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. 
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ## Runtime sizing The runtime size determines the CPU and memory available to the connector. The available sizes are Small, Medium, and Large. The connector requires Medium or Large. Choose the size when you create the runtime: you can't change the size of an existing runtime in place. Choose Large if you expect high replication throughput or if source tables contain wide rows. ## Resize a runtime Runtime size is fixed at creation, so to change size you run the connector on a different runtime. You have two options depending on whether you want to preserve the current replication progress. If you don't need to keep the progress of the current connector, the simplest path is to create a new runtime at the size you need and install a new connector instance on it. The new connector starts from scratch: it snapshots all configured tables and then captures ongoing changes from that point. The replication progress of the existing connector is discarded. To keep the progress of the current connector, for example to avoid re-snapshotting tables that took a long time to snapshot initially, migrate the connector to the new runtime. This reuses the existing destination tables and resumes incremental replication from where it left off. For migration instructions, see [Reinstall the connector](#label-mysql-reinstall-connector). ## Configure the connector To configure the connector, do the following as a data engineer: 1. 
Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values. For more information on the required parameter values, see the following sections: - [MySQL Source Parameters](#label-of-mysql-source-parameters): Used to establish a connection with MySQL. - [MySQL Destination Parameters](#label-of-mysql-destination-parameters): Used to establish a connection with Snowflake. - [MySQL Ingestion Parameters](#label-of-mysql-ingestion-parameters): Used to specify the tables to replicate. Start by setting the parameters of the MySQL Source Parameters context, then the MySQL Destination Parameters context. After this is done, you can enable the connector. The connector should connect to both MySQL and Snowflake and start running. However, the connector doesn't replicate any data until the tables to be replicated are explicitly added to its configuration. To configure specific tables for replication, edit the MySQL Ingestion Parameters context. After you apply the changes, the configuration is picked up by the connector, and the replication lifecycle starts for every table. ### MySQL Source Parameters
### MySQL Destination Parameters
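The **Destination Schema Pattern** parameter in this context accepts the optional variables `${source.schema.name}` and `${source.table.name}`. As a rough illustration of how such a pattern evaluates for one source table, the following sketch substitutes both variables with plain string replacement. The `resolve_schema` helper is a hypothetical name, and the connector's own substitution logic may differ.

```python
def resolve_schema(pattern, source_schema, source_table):
    """Evaluate a destination schema pattern for a single source table."""
    return (
        pattern
        .replace("${source.schema.name}", source_schema)
        .replace("${source.table.name}", source_table)
    )


# For the table my_database.users, the pattern below yields prefix_my_database.
per_database = resolve_schema("prefix_${source.schema.name}", "my_database", "users")

# A pattern without variables sends every table to the same schema.
single_schema = resolve_schema("destination_schema", "my_database", "users")
```

Because a MySQL database maps to a Snowflake schema, `${source.schema.name}` receives the MySQL database name.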
### MySQL Ingestion Parameters
## Restart table replication A table in FAILED state — for example, due to a missing primary key or unsupported schema change — does not restart automatically. If a table enters a FAILED state or you need to restart replication from scratch, use the following procedure to remove and re-add the table to replication. If the failure was caused by an issue in the source table such as a missing primary key, resolve that issue in the source database before continuing. 1. Remove the table from replication, using one of the following methods: - Add the table to the **Re-snapshot Table Exclusions** parameter to temporarily exclude it from replication. This is convenient when the table is matched by an **Included Table Regex** that you don't want to change. - In the Ingestion Parameters context, either remove the table from **Included Table Names** or modify the **Included Table Regex** so the table is no longer matched. 2. Verify the table has been removed: 1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. 2. In the table listing controller services, locate the **Table State Store** row, click the three vertical dots on the right side of the row, then choose **View State**. You must wait until the table's state is fully removed from this list before proceeding. Do not continue until this configuration change has completed. 3. Clean up the destination: Once the table's state shows as fully removed, manually [DROP](/sql-reference/sql/drop-table) the destination table in Snowflake. Note that the connector will not overwrite an existing destination table during the snapshot phase; if the table still exists, replication will fail again. Optionally, the journal table and stream can also be removed if they are no longer needed. 4. Re-add the table by reversing the change you made in the first step: either remove the table from **Re-snapshot Table Exclusions**, or add it back to **Included Table Names** or **Included Table Regex**. 
The connector then re-snapshots the table. 5. Verify the restart: Check the **Table State Store** using the instructions given previously. The state of the table should appear with the status NEW, then transition to SNAPSHOT_REPLICATION, and finally INCREMENTAL_REPLICATION. ## Replicate a subset of columns in a table The connector can filter the data replicated per table to a subset of configured columns. Primary key columns are always included regardless of exclusions. To apply column filters, set the **Column Filter JSON** parameter in the Ingestion Parameters context to a JSON array of filter objects, one per table you want to filter. Columns can be included or excluded by name or by regular expression pattern. You can apply a single condition per table, or combine multiple conditions, with exclusions always taking precedence over inclusions. ## Syntax Each object in the array identifies a table and specifies which columns to include or exclude. ```javascript [ { "schema": "" | "schemaPattern": "", "table": "
Parameter Description
MySQL Connection URL The full JDBC URL to the source database. The connector uses the MariaDB driver, which is compatible with MySQL and requires the `jdbc:mariadb` prefix in the URL. If SSL is disabled, the connection URL should have the `allowPublicKeyRetrieval` parameter set to `true`. Examples: - With SSL enabled: `jdbc:mariadb://example.com:3306` - With SSL disabled: `jdbc:mariadb://example.com:3306?allowPublicKeyRetrieval=true`
MySQL JDBC Driver The absolute path to the [MariaDB JDBC driver jar](https://mariadb.com/downloads/connectors/connectors-data-access/java8-connector/). The connector uses the MariaDB driver, which is compatible with MySQL. Select the **Reference asset** checkbox to upload the MariaDB JDBC driver. Example: `/opt/resources/drivers/mariadb-java-client-3.5.2.jar`
MySQL Username The username for the connector.
MySQL Password The password for the connector.
Parameter Description Required
Destination Database The database where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema Pattern A pattern for the names of destination schemas where data is persisted. The connector creates the schemas if they don't exist. You can customize the pattern per ingested table using these optional variables: - `${source.schema.name}`: the source database (a database in MySQL maps to a schema in Snowflake). - `${source.table.name}`: a source table's name. For example, for a table `my_database.users`, the pattern `prefix_${source.schema.name}` evaluates to `prefix_my_database`. To ingest all tables into a single schema, provide a schema name without any variables, like `destination_schema`. Don't change this setting after the connector has begun ingesting data. Changing this setting after ingestion has begun breaks the existing ingestion. If you must change this setting, create a new connector instance. Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data is persisted. Yes
Snowflake Connection Strategy When using KEY_PAIR, specify the strategy for connecting to Snowflake: - **STANDARD** (default): Connect using standard public routing to Snowflake services. - **PRIVATE_CONNECTIVITY**: Connect using private addresses associated with the supporting cloud platform such as AWS PrivateLink. Required for BYOC with KEY_PAIR only, otherwise ignored.
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session Token Authentication Strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake Private Key File. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use the Snowflake role assigned to the runtime, or a child role granted to that role. You can find your runtime Snowflake role in the Openflow UI by expanding the **More Options [⋮]** button for your runtime and selecting **Set Snowflake role**. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are: - **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when losing the oversized values is acceptable. No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
Included Table Names A comma-separated list of table paths, including their schemas. Example: `public.my_table, other_schema.other_table`
Included Table Regex A regular expression to match against table paths. Every path matching the expression will be replicated, and new tables matching the pattern that get created later will also be included automatically. Example: `public\.auto_.*`
Column Filter JSON Optional. A JSON array of filter objects specifying which columns to include or exclude per table. For syntax details and examples, see [Replicate a subset of columns in a table](#replicate-a-subset-of-columns-in-a-table).
Merge Task Schedule CRON CRON expression that defines the periods when merge operations from the journal table to the destination table are triggered. Set it to `* * * * * ?` for continuous merging, or to a time schedule to limit warehouse run time. For example, the string `* 0 * * * ?` schedules merges for one minute at the start of every hour. The string `* 20 14 ? * MON-FRI` schedules merges at 2:20 PM every Monday through Friday. For more information and examples, see the [CronTrigger tutorial](https://www.quartz-scheduler.org/documentation/quartz-2.2.2/tutorials/tutorial-lesson-06.html).
Object Identifier Resolution Specifies how source object identifiers such as the names of schemas, tables, and columns are stored and queried in Snowflake. This setting determines whether you must use double quotes in SQL queries. Option 1: Default, case-sensitive. For backwards compatibility. - **Transformation**: Case is preserved. For example, `My_Table` remains `My_Table`. - **Queries**: SQL queries must use double quotes to match the exact case for database objects. For example, `SELECT * FROM "My_Table";`. Snowflake recommends using this option if you must preserve source casing for legacy or compatibility reasons. For example, if the source database includes table names that differ in case only, such as `MY_TABLE` and `my_table`, those names would collide when using case-insensitive comparisons. Option 2: Recommended, case-insensitive. - **Transformation**: All identifiers are converted to uppercase. For example, `My_Table` becomes `MY_TABLE`. - **Queries**: SQL queries are case-insensitive and don't require double quotes. For example, `SELECT * FROM my_table;` returns the same results as `SELECT * FROM MY_TABLE;`. Snowflake recommends using this option if database objects are not expected to have mixed-case names. Do not change this setting after the connector has begun ingesting data. Changing this setting after ingestion has begun breaks the existing ingestion. If you must change this setting, create a new connector instance.
Concurrent Snapshot Queries Maximum number of concurrent queries to the source database to run in the Snapshot flow. Increasing this can speed up snapshotting large numbers of tables, but will also increase the load on the source database.
" | "tablePattern": "", "included": ["", ""], "excluded": ["", ""], "includedPattern": "", "excludedPattern": "" } ] ``` The following rules apply: - Use `schema` and `table` for exact name matching, or `schemaPattern` and `tablePattern` for regex matching. You can't use both a field and its pattern variant in the same object (for example, `schema` and `schemaPattern` can't both appear). - At least one of `included`, `excluded`, `includedPattern`, or `excludedPattern` must be provided. - When both included and excluded filters are specified, exclusions take precedence. - When multiple filters match the same table, the last matching filter is used, with exact matches taking precedence over pattern-based filters. - The value can be an array of objects to apply different filters to different tables. ## Examples Include specific columns by name: ```javascript [ { "schema": "public", "table": "orders", "included": ["account_id", "status", "created_at"] } ] ``` Exclude specific columns by name: ```javascript [ { "schema": "public", "table": "orders", "excluded": ["internal_note", "debug_flag"] } ] ``` Combine an include pattern with a specific exclusion (for example, include all email columns except `admin_email`): ```javascript [ { "schema": "public", "table": "contacts", "includedPattern": ".*_email", "excluded": ["admin_email"] } ] ``` Mix a schema pattern with an exact table name to apply a filter across schemas: ```javascript [ { "schemaPattern": "data_.*", "table": "customers", "excluded": ["internal_note"] } ] ``` Pass multiple filter objects to apply different rules to different tables: ```javascript [ {"schema": "public", "table": "orders", "included": ["account_id", "status"]}, {"schema": "public", "table": "customers", "excludedPattern": ".*_internal"} ] ``` ### Including and excluding the same column Removing a column from a table's replicated set (by excluding it or by removing it from the included list) has the same effect on the destination as dropping the 
column at the source: the connector soft-deletes the column on the destination by renaming it with a suffix (by default, `__SNOWFLAKE_DELETED`). If you then add the column back to the replicated set and later remove it a second time, replication for the affected table fails because the soft-deleted column name is already taken. To recover, restart replication for the affected table.

## Track data changes in tables

The connector replicates not only the current state of data from the source tables, but also every state of every row from every changeset. This data is stored in journal tables created in the same schema as the destination table.

Journal table names consist of the source table name, then `_JOURNAL_`, then two underscore-separated values: the epoch seconds when the source table was added to replication, and an integer that increases with every schema change on the source table. As a result, source tables that undergo schema changes have multiple journal tables. When a table is removed from replication and then added back, the epoch-seconds value changes and the integer starts again from `1`.

Snowflake recommends that you don't alter the structure of journal tables in any way. They are used by the connector to update the destination table as part of the replication process.

The connector never drops journal tables, but it only uses the latest journal for every replicated source table, reading append-only streams on top of journals. To reclaim the storage, you can:

- Truncate all journal tables at any time.
- Drop the journal tables related to source tables that were removed from replication.
- Drop all but the latest generation journal tables for actively replicated tables.

For example, if your connector is set to actively replicate source table `orders`, and you have earlier removed table `customers` from replication, you may have the following journal tables. In this case you can drop all of them *except* `orders_5678_2`.
```text customers_1234_1 customers_1234_2 orders_5678_1 orders_5678_2 ``` ## Configure scheduling of merge tasks The connector uses a warehouse to merge change data capture (CDC) data into destination tables. This operation is triggered by the MergeSnowflakeJournalTable processor. If there are no new changes or if no new flow files are waiting in the MergeSnowflakeJournalTable queue, no merge is triggered and the warehouse auto-suspends. To limit the warehouse cost and limit merges to only scheduled time, use the CRON expression in the Merge task Schedule CRON parameter. It throttles the flow files coming to the MergeSnowflakeJournalTable processor and merges are triggered only in a dedicated period of time. For more information about scheduling, see [Scheduling strategy](https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-strategy). ## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. --- title: Set up the Openflow Connector for PostgreSQL source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/postgres/setup.md section: Loading & Unloading Data --- # Set up the %postgresql% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). 
- [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/postgres/about) - [](/user-guide/data-integration/openflow/connectors/postgres/data-mapping) - [](/user-guide/data-integration/openflow/connectors/postgres/maintenance) This topic describes the steps to set up the %postgresql%. This connector can be configured to immediately start replicating incremental changes for newly added tables, bypassing the snapshot load phase. This option is often useful when reinstalling the connector in an account where previously replicated data exists and you want to continue replication without having to re-snapshot tables. For details on the incremental load process, see [Incremental replication](/user-guide/data-integration/openflow/connectors/postgres/incremental-replication). For information about restarting table replication for failed tables, see [Restart table replication](#label-of-postgres-restart-table-replication). ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/postgres/about). 2. Ensure that you have reviewed the [supported PostgreSQL versions](#label-supported-pg-versions). 3. Recommended: Ensure that you add only one connector instance per runtime. 4. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 5. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-postgresql) connector. 6. As a database administrator, perform the following tasks: 1. 
[Configure wal_level](#configure-wal-level) 2. [Create a publication](#create-a-publication) 3. Ensure that there is enough disk space on your PostgreSQL server for the WAL. This is because once created, a replication slot causes PostgreSQL to retain the WAL data from the position held by the replication slot, until the connector confirms and advances that position. 4. Allow at least **1** logical replication slot and **2** WAL senders per %postgresql% connector instance on the server. Set `max_replication_slots` and `max_wal_senders` high enough to cover that and all other replication traffic on the instance. 5. Ensure that every table enabled for replication has one of the following identity key configurations: - **Primary key**: The connector uses primary key columns as the identity key and requires the table's REPLICA IDENTITY to be set to `DEFAULT`. - **Unique index (for tables without a primary key)**: Use a unique index that meets the requirements in [](#label-postgres-configure-replica-identity-using-index) as the identity key. You must run `ALTER TABLE REPLICA IDENTITY USING INDEX ` before enabling replication. 6. For tables without a primary key, see [](#label-postgres-configure-replica-identity-using-index). 7. Create a user for the connector. The connector requires a user with the `REPLICATION` attribute and permissions to SELECT from every replicated table. Create that user with a password to enter into the connector's configuration. For more information on replication security, see [Security](https://www.postgresql.org/docs/current/logical-replication-security.html). 7. As a Snowflake account administrator, perform the following tasks: 1. Create a Snowflake user with the type as [SERVICE](#label-user-type-property). Create a database to store the replicated data, and set up privileges for the Snowflake user to create objects in that database by granting the [USAGE and CREATE SCHEMA privileges](#label-database-privileges). 
```sql CREATE DATABASE ; CREATE USER TYPE=SERVICE COMMENT='Service user for automated access of Openflow'; CREATE ROLE ; GRANT ROLE TO USER ; GRANT USAGE ON DATABASE TO ROLE ; GRANT CREATE SCHEMA ON DATABASE TO ROLE ; CREATE WAREHOUSE WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE; GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE ; ``` 2. Create a pair of secure keys (public and private). Store the private key for the user in a file to use while configuring the connector. Assign the public key to the Snowflake service user: ```sql ALTER USER SET RSA_PUBLIC_KEY = 'thekey'; ``` For more information, see [key-pair authentication](/user-guide/key-pair-auth). 3. Designate a warehouse for the connector to use. Start with the `XSMALL` warehouse size, then experiment with size depending on the amount of tables being replicated, and the amount of data transferred. Large numbers of tables typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than the warehouse size. ### Configure wal_level %postgresql% requires [wal_level](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL) to be set to `logical`. Depending on where your PostgreSQL server is hosted, you can configure the wal_level as follows:
- **On premise**: Execute the following query as a superuser or as a user with the `ALTER SYSTEM` privilege:

  ```sql
  ALTER SYSTEM SET wal_level = logical;
  ```

- **RDS**: The user used by the agent needs to have the `rds_superuser` or `rds_replication` role assigned. You also need to set:
  - The `rds.logical_replication` static parameter to 1.
  - The `max_replication_slots`, `max_connections`, and `max_wal_senders` parameters according to your database and replication setup.
- **AWS Aurora**: Set the `rds.logical_replication` static parameter to 1.
- **GCP**: Set the following flags:
  - `cloudsql.logical_decoding=on`.
  - `cloudsql.enable_pglogical=on`.

  For more information, see [Google Cloud documentation](https://cloud.google.com/sql/docs/postgres/replication/configure-logical-replication#set-up-logical-replication-with-pglogical).
- **Azure**: Set the replication support to `Logical`. For more information, see [Azure documentation](https://learn.microsoft.com/en-us/azure/postgresql/single-server/concepts-logical#set-up-your-server).

### Create a publication

%postgresql% requires a [publication](https://www.postgresql.org/docs/current/logical-replication-publication.html#LOGICAL-REPLICATION-PUBLICATION) to be created and configured in PostgreSQL before replication starts. You can create it for all tables, for a subset of tables, or for specific tables with specified columns only. Make sure that every table and column that you plan to have replicated is included in the publication. You can also modify the publication later, while the connector is running. To create and configure a publication, do the following:

1. Log in as a user with the CREATE privilege on the database and run the following query:

   - For PostgreSQL 13 and later:

     ```sqlsyntax
     CREATE PUBLICATION WITH (publish_via_partition_root = true);
     ```

     The additional `publish_via_partition_root` option is needed for correct replication of partitioned tables. To learn more about ingestion of partitioned tables, see [Replicate a partitioned table](#label-postgres-connector-replicate-partitioned-table).

   - For PostgreSQL versions earlier than 13:

     ```sqlsyntax
     CREATE PUBLICATION ;
     ```

2. Define tables that the database agent will be able to see using:

   ```sqlsyntax
   ALTER PUBLICATION ADD TABLE ;
   ```

For partitioned tables, it's enough to add only the root partition table to the publication. See [Replicate a partitioned table](#label-postgres-connector-replicate-partitioned-table) for more details.

**PostgreSQL 15 and later** support configuring publications for a specified subset of table columns. For the connector to support this correctly, you must use the [column filtering settings](#label-postgres-connector-replication-subset-of-columns) to include the same columns as set on the publication. Without this setting, the connector will behave as follows:

- In the destination table, columns that aren't included in the filter will be suffixed with `__DELETED`. All data replicated during the snapshot phase will be retained.
- After you add new columns to the publication, the table will be permanently failed, and you will need to restart its replication.

For more information, see [ALTER PUBLICATION](https://www.postgresql.org/docs/current/sql-alterpublication.html).

### Configure replica identity for tables without a primary key

For tables without a primary key, you can use a unique index as the identity key by setting `REPLICA IDENTITY USING INDEX`:

```sqlsyntax
ALTER TABLE REPLICA IDENTITY USING INDEX ;
```

The connector automatically detects this setting during schema discovery and uses the unique index columns for UPDATE and DELETE operations. The unique index must meet all of the following requirements. PostgreSQL validates these when you run the `ALTER TABLE` command and rejects any index that doesn't qualify:

| Requirement | Details |
| --- | --- |
| `Unique` | The index must be unique so each row can be identified. |
| Covers all rows | The index must cover all rows in the table. Partial indexes (those with a `WHERE` clause) aren't supported. |
| Non-deferrable | The index can't be based on a deferrable unique constraint. Indexes created with `CREATE UNIQUE INDEX` are non-deferrable by default. |
| All columns `NOT NULL` | Every column in the index must be defined as `NOT NULL`. |
| Plain columns only | Expression indexes such as `LOWER(email)` aren't supported. Only plain column indexes are supported. |

To check the current REPLICA IDENTITY setting for a table, run:

```sqlsyntax
SELECT
  n.nspname AS schema_name,
  c.relname AS table_name,
  CASE c.relreplident
    WHEN 'd' THEN 'DEFAULT'
    WHEN 'n' THEN 'NOTHING'
    WHEN 'f' THEN 'FULL'
    WHEN 'i' THEN 'USING INDEX: ' || i.relname
  END AS replica_identity
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_index ix ON c.oid = ix.indrelid AND ix.indisreplident
LEFT JOIN pg_class i ON ix.indexrelid = i.oid
WHERE n.nspname = ''
  AND c.relname = ''
  AND c.relkind IN ('r','p');
```

## Install the connector

To install the connector, do the following as a data engineer:

1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.

   Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

## Runtime sizing

The runtime size determines the CPU and memory available to the connector. The available sizes are Small, Medium, and Large. The connector requires Medium or Large. Choose the size when you create the runtime: you can't change the size of an existing runtime in place. Choose Large if you expect high replication throughput or if source tables contain wide rows.

## Resize a runtime

Runtime size is fixed at creation, so to change size you run the connector on a different runtime. You have two options depending on whether you want to preserve the current replication progress.

If you don't need to keep the progress of the current connector, the simplest path is to create a new runtime at the size you need and install a new connector instance on it. The new connector starts from scratch: it snapshots all configured tables and then captures ongoing changes from that point. The replication progress of the existing connector is discarded.

To keep the progress of the current connector, for example to avoid re-snapshotting tables that took a long time to snapshot initially, migrate the connector to the new runtime. This reuses the existing destination tables and resumes incremental replication from where it left off. For migration instructions, see [Reinstall the connector](#label-postgres-reinstall-connector).

## Configure the connector

To configure the connector, do the following as a data engineer:

1. Right-click on the imported process group and select **Parameters**.
2. Populate the required parameter values.

For more information on the required parameter values, see the following sections:

- [PostgreSQL Source Parameters](#label-of-postgres-source-parameters): Used to establish a connection with PostgreSQL.
- [PostgreSQL Destination Parameters](#label-of-postgres-destination-parameters): Used to establish a connection with Snowflake.
- [PostgreSQL Ingestion Parameters](#label-of-postgres-ingestion-parameters): Used to specify the tables to replicate.

Start by setting the parameters of the PostgreSQL Source Parameters context, then the PostgreSQL Destination Parameters context. Once this is done, you can enable the connector, and it should connect to both PostgreSQL and Snowflake and start running. However, it won't replicate any data until tables are explicitly added to its configuration.

To configure specific tables for replication, edit the PostgreSQL Ingestion Parameters context. Shortly after you apply the changes to the PostgreSQL Ingestion Parameters context, the configuration will be picked up by the connector, and the replication lifecycle will start for every table.

### PostgreSQL Source Parameters

| Parameter | Description |
| --- | --- |
| PostgreSQL Connection URL | The full JDBC URL to the source database. Example: `jdbc:postgresql://example.com:5432/public`. If you are connecting to a PostgreSQL replica server, see [Replicate tables from a PostgreSQL replica server](#label-replicate-tables-from-postgres-replica). |
| PostgreSQL JDBC Driver | The path to the [PostgreSQL JDBC driver jar](https://jdbc.postgresql.org/). Download the jar from its website, then select the **Reference asset** checkbox to upload and attach it. |
| PostgreSQL Username | The username for the connector. |
| PostgreSQL Password | The password for the connector. |
| Publication Name | The name of the publication you created earlier. |
| Replication Slot Name | Optional. When no value is provided, the connector will create a new, uniquely-named slot. When given a value, the connector will use the existing slot, or create a new one with the provided name. Changing the value for a running connector will restart reading the incremental change data capture (CDC) stream from the updated slot's position. |

### PostgreSQL Destination Parameters

- **Destination Database**: The database where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Required: Yes.
- **Destination Schema Pattern**: A pattern for the names of destination schemas where data is persisted. The connector creates the schemas if they don't exist. You can customize the pattern per ingested table using these optional variables:
  - `${source.database.name}`: a source table's database.
  - `${source.schema.name}`: a source table's schema.
  - `${source.table.name}`: a source table's name.

  For example, for a table with the qualified name `source_db.tenant_a.data`, the pattern `prefix_${source.database.name}_${source.schema.name}` evaluates to `prefix_source_db_tenant_a`. To ingest all tables into a single schema, provide a schema name without any variables, like `destination_schema`. Don't change this setting after the connector has begun ingesting data. Changing this setting after ingestion has begun breaks the existing ingestion. If you must change this setting, create a new connector instance. Required: Yes.
- **Snowflake Authentication Strategy**: When using:
  - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
  - **BYOC**: Alternatively, BYOC can use KEY_PAIR as the value for authentication strategy.

  Required: Yes.
- **Snowflake Account Identifier**: When using:
  - **Session Token Authentication Strategy**: Must be blank.
  - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data is persisted.

  Required: Yes.
- **Snowflake Connection Strategy**: When using KEY_PAIR, specify the strategy for connecting to Snowflake:
  - **STANDARD** (default): Connect using standard public routing to Snowflake services.
  - **PRIVATE_CONNECTIVITY**: Connect using private addresses associated with the supporting cloud platform such as AWS PrivateLink.

  Required for BYOC with KEY_PAIR only, otherwise ignored.
- **Snowflake Private Key**: When using:
  - **Session Token Authentication Strategy**: Must be blank.
  - **KEY_PAIR**: Must be the RSA private key used for authentication.

  The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. Required: No.
- **Snowflake Private Key File**: When using:
  - **Session Token Authentication Strategy**: The private key file must be blank.
  - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox.

  Required: No.
- **Snowflake Private Key Password**: When using:
  - **Session Token Authentication Strategy**: Must be blank.
  - **KEY_PAIR**: Provide the password associated with the Snowflake Private Key File.

  Required: No.
- **Snowflake Role**: When using:
  - **Session Token Authentication Strategy**: Use the Snowflake Role assigned to the runtime, or a child role granted to this Snowflake Role. You can find your runtime Snowflake Role in the Openflow UI, by expanding the **More Options [⋮]** button for your runtime and selecting **Set Snowflake role**.
  - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user.

  Required: Yes.
- **Snowflake Username**: When using:
  - **Session Token Authentication Strategy**: Must be blank.
  - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance.

  Required: Yes.
- **Oversized Value Strategy**: Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:
  - **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.
  - **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.

  Required: No.
- **Snowflake Warehouse**: Snowflake warehouse used to run queries. Required: Yes.

### PostgreSQL Ingestion Parameters

- **Included Table Names**: A comma-separated list of table paths, including their schemas. Example: `public.my_table, other_schema.other_table`. Select tables either by name or by regex. If you use both, all matching tables from either option will be included. Tables that are sub-partitions are always excluded from ingestion. See [Replicate a partitioned table](#label-postgres-connector-replicate-partitioned-table) for more information.
- **Included Table Regex**: A regular expression to match against table paths. Every path matching the expression will be replicated, and new tables matching the pattern that get created later will also be included automatically. Example: `public\.auto_.*`. Select tables either by name or by regex. If you use both, all matching tables from either option will be included. Tables that are sub-partitions are always excluded from ingestion. See [Replicate a partitioned table](#label-postgres-connector-replicate-partitioned-table) for more information.
- **Column Filter JSON**: Optional. A JSON array of filter objects specifying which columns to include or exclude per table. For syntax details and examples, see [Replicate a subset of columns in a table](#label-postgres-connector-replication-subset-of-columns).
- **Merge Task Schedule CRON**: CRON expression defining periods when merge operations from Journal to Destination Table will be triggered. Set it to `* * * * * ?` for a continuous merge, or to a time schedule to limit warehouse run time. For example:
  - The string `* 0 * * * ?` schedules merges at every full hour for one minute.
  - The string `* 20 14 ? * MON-FRI` schedules merges at 2:20 PM every Monday through Friday.

  For additional information and examples, see the cron triggers tutorial in the [Quartz Documentation](https://www.quartz-scheduler.org/documentation/quartz-2.2.2/tutorials/tutorial-lesson-06.html).
- **Object Identifier Resolution**: Specifies how source object identifiers such as the names of schemas, tables, and columns are stored and queried in Snowflake. This setting determines whether you must use double quotes in SQL queries.

  Option 1: Default, case-sensitive. For backwards compatibility.
  - **Transformation**: Case is preserved. For example, `My_Table` remains `My_Table`.
  - **Queries**: SQL queries must use double quotes to match the exact case for database objects. For example, `SELECT * FROM "My_Table";`.

  Snowflake recommends using this option if you must preserve source casing for legacy or compatibility reasons. For example, the source database might include table names that differ in case only, such as `MY_TABLE` and `my_table`, which would result in a name collision when using case-insensitive comparisons.

  Option 2: Recommended, case-insensitive.
  - **Transformation**: All identifiers are converted to uppercase. For example, `My_Table` becomes `MY_TABLE`.
  - **Queries**: SQL queries are case-insensitive and don't require SQL double quotes. For example, `SELECT * FROM my_table;` returns the same results as `SELECT * FROM MY_TABLE;`.

  Snowflake recommends using this option if database objects are not expected to have mixed case names.

  Do not change this setting after the connector has begun ingesting data. Changing this setting after ingestion has begun breaks the existing ingestion. If you must change this setting, create a new connector instance.
- **Concurrent Snapshot Queries**: Maximum number of concurrent queries to the source database to run in the Snapshot flow. Increasing this can speed up snapshotting large numbers of tables, but will also increase the load on the source database.

## Replicate tables from a PostgreSQL replica server

The connector can ingest data from a primary server, a [hot standby replica](https://www.postgresql.org/docs/current/hot-standby.html), or a subscriber server using [logical replication](https://www.postgresql.org/docs/current/logical-replication.html). Before configuring the connector to connect to a PostgreSQL replica, ensure that replication between primary and replica nodes works correctly. When investigating issues with missing data in the connector, first ensure that the missing rows are present in the replica server used by the connector.

Additional considerations when connecting to a standby replica:

- The PostgreSQL version of the server must be 16 or later. Amazon Aurora is not supported because it [doesn't offer logical decoding from read replicas](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Replication.Logical.html).
- Only connections to a hot standby replica are supported. Note that warm standby replicas can't accept connections from clients until they are promoted to a primary instance.
- [The publication](#label-postgres-connector-create-a-publication) needed by the connector must be created on the primary server, not the standby server. The standby server is read-only and doesn't allow creating publications.

If you connect to a hot standby instance and see the error **Trying to create the replication slot '<replication slot>' timed out. If connecting to a standby instance, ensure there is some traffic on the primary PostgreSQL instance, otherwise the call to create a replication slot will never return.** in the Openflow bulletin, or the **Read PostgreSQL CDC Stream** processor isn't starting, log in to the primary PostgreSQL instance and execute the following query:

```sqlsyntax
SELECT pg_log_standby_snapshot();
```

The error occurs when there are no data changes in the primary server. In that case, the connector can stall while creating a replication slot on the replica server, because the replica server needs information about running transactions from the primary server to create the slot, and primary servers don't send that information while idle. The `pg_log_standby_snapshot()` function forces the primary server to send information about running transactions to the replica server.

## Replicate a subset of columns in a table

The connector can filter the data replicated per table to a subset of configured columns. Primary key columns are always included regardless of exclusions.

To apply column filters, set the **Column Filter JSON** parameter in the Ingestion Parameters context to a JSON array of filter objects, one per table you want to filter. Columns can be included or excluded by name or by regular expression pattern. You can apply a single condition per table, or combine multiple conditions, with exclusions always taking precedence over inclusions.

## Syntax

Each object in the array identifies a table and specifies which columns to include or exclude.

```javascript
[
  {
    "schema": "" | "schemaPattern": "",
    "table": "" | "tablePattern": "",
    "included": ["", ""],
    "excluded": ["", ""],
    "includedPattern": "",
    "excludedPattern": ""
  }
]
```

The following rules apply:

- Use `schema` and `table` for exact name matching, or `schemaPattern` and `tablePattern` for regex matching. You can't use both a field and its pattern variant in the same object (for example, `schema` and `schemaPattern` can't both appear).
- At least one of `included`, `excluded`, `includedPattern`, or `excludedPattern` must be provided.
- When both included and excluded filters are specified, exclusions take precedence.
- When multiple filters match the same table, the last matching filter is used, with exact matches taking precedence over pattern-based filters.
- The value can be an array of objects to apply different filters to different tables.

## Examples

Include specific columns by name:

```javascript
[
  {
    "schema": "public",
    "table": "orders",
    "included": ["account_id", "status", "created_at"]
  }
]
```

Exclude specific columns by name:

```javascript
[
  {
    "schema": "public",
    "table": "orders",
    "excluded": ["internal_note", "debug_flag"]
  }
]
```

Combine an include pattern with a specific exclusion (for example, include all email columns except `admin_email`):

```javascript
[
  {
    "schema": "public",
    "table": "contacts",
    "includedPattern": ".*_email",
    "excluded": ["admin_email"]
  }
]
```

Mix a schema pattern with an exact table name to apply a filter across schemas:

```javascript
[
  {
    "schemaPattern": "data_.*",
    "table": "customers",
    "excluded": ["internal_note"]
  }
]
```

Pass multiple filter objects to apply different rules to different tables:

```javascript
[
  {"schema": "public", "table": "orders", "included": ["account_id", "status"]},
  {"schema": "public", "table": "customers", "excludedPattern": ".*_internal"}
]
```

### Including and excluding the same column

Removing a column from a table's replicated set (by excluding it or by removing it from the included list) has the same effect on the destination as dropping the
column at the source: the connector soft-deletes the column on the destination by renaming it with a suffix (by default, `__SNOWFLAKE_DELETED`). If you then add the column back to the replicated set and later remove it a second time, replication for the affected table fails because the soft-deleted column name is already taken. To recover, restart replication for the affected table. ## Replicate a partitioned table The connector supports replication of partitioned tables for PostgreSQL servers running version 15 or later. A PostgreSQL partitioned table is replicated into Snowflake as a single destination table. For example, if you have a partitioned table `orders` with sub-partitions `orders_2023` and `orders_2024`, and you configured the connector to ingest all tables matching the `orders.*` pattern, then only the `orders` table is replicated to Snowflake, and it includes data from all sub-partitions. To support replication of partitioned tables, ensure that [the publication](#label-postgres-connector-create-a-publication) created in PostgreSQL has the `publish_via_partition_root` option set to `true`. Ingestion of partitioned tables currently has the following limitations: - When a table is attached as a partition to a partitioned table after ingestion was started, the connector won't fetch data that existed in the partition table before attaching. - When a sub-partition table is detached from the partitioned table after ingestion was started, the connector won't mark the data from this sub-partition as deleted in the root partition table. - A truncate operation on a sub-partition won't mark the affected records as deleted. ## Track data changes in tables The connector replicates not only the current state of data from the source tables, but also every state of every row from every changeset. This data is stored in journal tables created in the same schema as the destination table.
The journal table names are formatted as: `_JOURNAL__` where `` is the value of epoch seconds when the source table was added to replication, and `` is an integer increasing with every schema change on the source table. As a result, source tables that undergo schema changes will have multiple journal tables. When a table is removed from replication, then added back, the `` value will change, and `` will start again from `1`. Snowflake recommends that you don't alter the structure of journal tables in any way. They are used by the connector to update the destination table as part of the replication process. The connector never drops journal tables, but does make use of the latest journal for every replicated source table, only reading append-only streams on top of journals. To reclaim the storage, you can: - Truncate all journal tables at any time. - Drop the journal tables related to source tables that were removed from replication. - Drop all but the latest generation journal tables for actively replicated tables. For example, if your connector is set to actively replicate source table `orders`, and you have earlier removed table `customers` from replication, you may have the following journal tables. In this case you can drop all of them *except* `orders_5678_2`. ```text customers_1234_1 customers_1234_2 orders_5678_1 orders_5678_2 ``` ## Configure scheduling of merge tasks The connector uses a warehouse to merge change data capture (CDC) data into destination tables. This operation is triggered by the MergeSnowflakeJournalTable processor. If there are no new changes or if no new flow files are waiting in the MergeSnowflakeJournalTable queue, no merge is triggered and the warehouse auto-suspends. To limit the warehouse cost and limit merges to only scheduled time, use the CRON expression in the Merge task Schedule CRON parameter. 
It throttles the flow files coming to the MergeSnowflakeJournalTable processor and merges are triggered only in a dedicated period of time. For more information about scheduling, see [Scheduling strategy](https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-strategy). ## Stop or delete the connector When stopping or removing the connector, you have to consider the [replication slot](https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS) that the connector uses. The connector creates its own replication slot with a name starting with `snowflake_connector_` followed by a random suffix. As the connector reads the replication stream, it advances the slot, so that PostgreSQL can trim its WAL log and free up disk space. When the connector is paused, the slot isn't advanced, and changes to the source database keep increasing the WAL log size. You should not keep the connector paused for extended periods of time, especially on high-traffic databases. When the connector is removed, whether by deleting it from the Openflow canvas, or any other means, such as deleting the whole Openflow instance, the replication slot remains in place, and must be dropped manually. If you have multiple connector instances replicating from the same PostgreSQL database, each instance will create its own uniquely-named replication slot. When dropping a replication slot manually, make sure it's the right one. You can see which replication slot is used by a given connector instance by checking the state of the `CaptureChangePostgreSQL` processor. ## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. 
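The storage-reclamation rules described under "Track data changes in tables" can be sketched in Python. This is a minimal sketch and not part of the connector: the function name `droppable_journals` is hypothetical, and it assumes that journal names end in a timestamp and a generation number, as in the example list in that section (`orders_5678_2`), with the highest timestamp and then the highest generation identifying the latest journal.

```python
import re
from collections import defaultdict

# Assumed name shape, taken from the example list: <table>_<epoch seconds>_<generation>
JOURNAL_NAME = re.compile(r"^(?P<table>.+)_(?P<ts>\d+)_(?P<gen>\d+)$")


def droppable_journals(journal_names, active_tables):
    """Return the journal tables that can be dropped without affecting replication."""
    by_table = defaultdict(list)
    for name in journal_names:
        match = JOURNAL_NAME.match(name)
        if match is None:
            continue  # not a name this sketch recognizes; leave the table alone
        order_key = (int(match["ts"]), int(match["gen"]))
        by_table[match["table"]].append((order_key, name))

    droppable = []
    for table, entries in by_table.items():
        entries.sort()
        if table in active_tables:
            # Keep only the latest journal of an actively replicated table.
            entries = entries[:-1]
        droppable.extend(name for _, name in entries)
    return sorted(droppable)


journals = ["customers_1234_1", "customers_1234_2", "orders_5678_1", "orders_5678_2"]
print(droppable_journals(journals, active_tables={"orders"}))
```

With `orders` still replicated and `customers` removed from replication, the sketch keeps only `orders_5678_2`, which matches the example in that section.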
--- title: Set up the Openflow Connector for SharePoint source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sharepoint/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for SharePoint This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the Openflow Connector for SharePoint. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/sharepoint/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-sharepoint) connector. ## Set up access to your SharePoint site As an Azure or Office 365 account administrator, perform the following actions: 1. 
Ensure that you have a [Microsoft Graph](https://learn.microsoft.com/en-us/graph/overview) application registered and that it is configured with the following [application permissions](https://learn.microsoft.com/en-us/graph/permissions-overview?tabs=http#application-permissions) based on your requirements:
   **For Microsoft SharePoint (Cortex Search, document ACLs) and Microsoft SharePoint (Simple Ingest, document ACLs):**

   - `Sites.Selected`: Limits access to only specified sites. For more information, see [Sites.Selected](https://learn.microsoft.com/en-us/graph/permissions-reference#sitesselected).
   - `GroupMember.Read.All`: Used for resolving SharePoint group permissions. For more information, see [GroupMember.Read.All](https://learn.microsoft.com/en-us/graph/permissions-reference#groupmemberreadall).
   - `User.ReadBasic.All`: Used for resolving Microsoft 365 user emails. For more information, see [User.ReadBasic.All](https://learn.microsoft.com/en-us/graph/permissions-reference#userreadbasicall).

   **For Microsoft SharePoint (Cortex Search, no document ACLs) and Microsoft SharePoint (Simple Ingest, no document ACLs):**

   - `Sites.Selected`: Limits access to only specified sites. For more information, see [Sites.Selected](https://learn.microsoft.com/en-us/graph/permissions-reference#sitesselected).

2. Grant the `fullcontrol` role to the application in the selected sites. This role handles folder access changes during CDC ingestion. Grant it by using the [Grant-PnPAzureADAppSitePermission](https://github.com/pnp/powershell/blob/dev/documentation/Grant-PnPAzureADAppSitePermission.md) cmdlet, or by calling the [Graph API permission endpoint](https://learn.microsoft.com/en-us/graph/api/site-post-permissions), for example by using `curl`. For more information, see [Roles](https://learn.microsoft.com/en-us/graph/permissions-selected-overview?tabs=http#roles).

   If you cannot grant the `fullcontrol` role, grant the narrower `read` role to the application instead. However, if access to a folder in the ingested site changes, the connector may enter an irreparable state and will require a full re-ingestion of data. Snowflake recommends granting the `fullcontrol` role to fully mitigate this issue.

3. Configure application credentials based on your use case:

   **For Microsoft SharePoint (Cortex Search, document ACLs) and Microsoft SharePoint (Simple Ingest, document ACLs):**

   - Add a new certificate or ensure that you have access to the existing certificate file and its private key. For more information, see [Option 1: Add a certificate](https://learn.microsoft.com/en-us/graph/auth-register-app-v2#option-1-add-a-certificate).
   - Create a new client secret and record the secret's value. For more information, see [Option 2: Add a client secret](https://learn.microsoft.com/en-us/graph/auth-register-app-v2#option-2-add-a-client-secret).

   **For Microsoft SharePoint (Cortex Search, no document ACLs) and Microsoft SharePoint (Simple Ingest, no document ACLs):**

   - Create a new client secret and record the secret's value. For more information, see [Option 2: Add a client secret](https://learn.microsoft.com/en-us/graph/auth-register-app-v2#option-2-add-a-client-secret).

4. Record the following information from your Microsoft Graph application:

   - The client ID of your application. For more information, see [Application ID (client ID)](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#application-id-client-id).
   - The tenant ID of your application. For more information, see [Find your Microsoft 365 tenant ID](https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id).
   - The site URL of the Microsoft 365 SharePoint site with the files or folders that you want to ingest into Snowflake; for example, `https://yourtenant.sharepoint.com/sites/YourSite`.

## Set up your Snowflake account

As a Snowflake account administrator, perform the following tasks manually or by using the script included below:

1. Create a new role or use an existing role and grant the [](#label-database-privileges).
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure [key-pair authentication](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and HashiCorp, and store the public and private keys in the secret store. If, for any reason, you don't want to use a secrets manager, you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.
   1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted.
   2. In Openflow, configure a Parameter Provider associated with this secrets manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** » **Parameter Provider** and then fetch your parameter values.
   3. At this point, all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), grant those users the role created in step 1.
7. Designate a warehouse for the connector to use.
Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred. Large table numbers typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. ### Example setup
```sql
--The following script assumes you'll need to create all required roles, users, and objects.
--However, you may want to reuse some that are already in existence.

--Create a Snowflake service user to manage the connector
USE ROLE USERADMIN;
CREATE USER TYPE=SERVICE COMMENT='Service user for Openflow automation';

--Create a pair of secure keys (public and private). For more information, see
--key-pair authentication. Store the private key for the user in a file to supply
--to the connector’s configuration. Assign the public key to the Snowflake service user:
ALTER USER SET RSA_PUBLIC_KEY = '';

--Create a role to manage the connector and the associated data and
--grant it to that user
USE ROLE SECURITYADMIN;
CREATE ROLE ;
GRANT ROLE TO USER ;

--The following block is for USE CASE 2 (Cortex connect) ONLY
--Create a role for read access to the cortex search service created by this connector.
--This role should be granted to any role that will use the service
CREATE ROLE ;
GRANT ROLE TO ROLE ;

--Create the database the data will be stored in and grant usage to the roles created
USE ROLE ACCOUNTADMIN; --use whatever role you want to own your DB
CREATE DATABASE IF NOT EXISTS ;
GRANT USAGE ON DATABASE TO ROLE ;

--Create the schema the data will be stored in and grant the necessary privileges
--on that schema to the connector admin role:
USE DATABASE ;
CREATE SCHEMA IF NOT EXISTS ;
GRANT USAGE ON SCHEMA TO ROLE ;
GRANT CREATE TABLE, CREATE DYNAMIC TABLE, CREATE STAGE, CREATE SEQUENCE, CREATE CORTEX SEARCH SERVICE ON SCHEMA TO ROLE ;

--The following block is for USE CASE 2 (Cortex connect) ONLY
--Grant the Cortex read-only role access to the database and schema
GRANT USAGE ON DATABASE TO ROLE ;
GRANT USAGE ON SCHEMA TO ROLE ;

--Create the warehouse this connector will use if it doesn't already exist. Grant the
--appropriate privileges to the connector admin role. Adjust the size according to your needs.
CREATE WAREHOUSE WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE ;
```
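The `ALTER USER ... SET RSA_PUBLIC_KEY` statement in the script above takes the key body without the PEM delimiter lines. The following minimal Python sketch shows one way to prepare that value from the contents of a public key file. The helper name `pem_to_rsa_public_key_value` is hypothetical, and the key body shown is a dummy value for illustration, not a usable key.

```python
def pem_to_rsa_public_key_value(pem_text: str) -> str:
    """Strip the PEM delimiter lines and line breaks from a public key."""
    body_lines = []
    for line in pem_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the BEGIN/END PUBLIC KEY delimiter lines.
        if not stripped or stripped.startswith("-----"):
            continue
        body_lines.append(stripped)
    return "".join(body_lines)


# Dummy key body for illustration only; a real key is much longer.
example_pem = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
MIIBCgKCAQEAexampleexampleexample
-----END PUBLIC KEY-----
"""

print(pem_to_rsa_public_key_value(example_pem))
```

The returned single-line string is what goes between the quotes in the `RSA_PUBLIC_KEY` assignment. The private key never appears in SQL; it stays in the secrets manager or key file.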
## Use case 1: Ingest files only Use a connector to: - Ingest and continuously update Sharepoint files for custom processing within Snowflake - Optionally ingest file permissions (ACL connectors) to persist access controls downstream ### Set up the connector As a data engineer, perform the following tasks to configure the connector: #### Install the connector There are multiple variants of the SharePoint connector. Choose the variant that best fits your use case as described in [](#label-sharepoint-overview-use-cases). To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. Populate the process group parameters 1. Right-click on the imported process group and select **Parameters**. 2. Enter the required parameter values as described in [Sharepoint Ingestion Parameters](#sharepoint-ingestion-parameters), [Sharepoint Destination Parameters](#sharepoint-destination-parameters) and [Sharepoint Source Parameters](#sharepoint-source-parameters). ##### Sharepoint Source Parameters **For all connectors:**
**For ACL connectors only:**
##### Sharepoint Destination Parameters
##### Sharepoint Ingestion Parameters **For all connectors:**
**For ACL connectors only:**
1. Run the flow. 1. Start the process group. The flow will create all required objects inside of Snowflake. 2. Right click on the imported process group and select **Start**. ## Use case 2: Ingest files and perform processing with Cortex Use the predefined flow definition to: - Create AI assistants for documents within your organization's SharePoint site - Enable your AI assistants to adhere to access controls specified in your organization's SharePoint site ### Set up the connector As a data engineer, perform the following tasks to configure the connector: #### Install the connector 1. Create a database and schema in Snowflake for the connector to store ingested data. Grant required [](#label-database-privileges) to the role created in the first step. Substitute the role placeholder with the actual value and use the following SQL commands: ```sql CREATE DATABASE DESTINATION_DB; CREATE SCHEMA DESTINATION_DB.DESTINATION_SCHEMA; GRANT USAGE ON DATABASE DESTINATION_DB TO ROLE ; GRANT USAGE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE ; GRANT CREATE TABLE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE ; ``` To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. 
The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. Populate the process group parameters 1. Right click on the imported process group and select **Parameters**. 2. Enter the required parameter values as described in [Sharepoint Cortex Connect Source Parameters](#sharepoint-cortex-connect-source-parameters), [Sharepoint Cortex Connect Destination Parameters](#sharepoint-cortex-connect-destination-parameters) and [Sharepoint Cortex Connect Ingestion Parameters](#sharepoint-cortex-connect-ingestion-parameters). ##### Sharepoint Cortex Connect Source Parameters **For all connectors:**
**For ACL connectors only:**
##### Sharepoint Cortex Connect Destination Parameters
##### Sharepoint Cortex Connect Ingestion Parameters **For all connectors:**
**For ACL connectors only:**
1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. 3. [Query the Cortex Search service](#query-the-cortex-search-service). ## Use case 3: Customize the connector definition Customize the connector definition to perform custom processing on ingested files. ### Set up the connector As a data engineer, perform the following tasks to configure the connector: #### Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. Customize the connector definition. 1. Remove the following process groups: - Check If Duplicate Content - Snowflake Stage and Parse PDF - Update Snowflake Cortex - (Optional) Process Microsoft365 Groups 2. Attach any custom processing to the output of the `Process SharePoint Metadata` process group. Each flow file represents a single SharePoint file change. 2. Populate the process group parameters. Follow the same process as for use case 1.
After you modify the connector definition, some parameters might no longer be required. 3. Run the flow. 1. Start the process group. The flow will create all required objects inside of Snowflake. 2. Right-click on the imported process group and select **Start**. 4. [Query the Cortex Search service](#query-the-cortex-search-service). ## Enabling SharePoint site groups ### Microsoft Graph application for site groups In addition to the steps specified in [](#label-openflow-sharepoint-setup-access), do the following: 1. Add [Sites.Selected](https://learn.microsoft.com/en-us/graph/permissions-reference#sitesselected) SharePoint permission. You should see *Sites.Selected* in both Microsoft Graph and SharePoint permissions. 2. [Generate a key pair](https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-self-signed-certificate). Alternatively, you can create a self-signed certificate with *openssl* by running the following command: ```bash openssl req -x509 -nodes -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 ``` The command above doesn't encrypt the generated private key. Remove the *-nodes* argument if you want to generate an encrypted key. 3. [Attach the certificate](https://learn.microsoft.com/en-us/graph/applications-how-to-add-certificate?tabs=http) to the Microsoft Graph application. ## Query the Cortex Search service You can use the [Cortex Search](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) service to build chat and search applications to chat with or query your documents in SharePoint. After you install and configure the connector and it begins ingesting content from SharePoint, you can query the Cortex Search service. For more information about using Cortex Search, see [Query a Cortex Search service](/user-guide/snowflake-cortex/cortex-search/query-cortex-search-service).
**Filter responses** To restrict responses from the Cortex Search service to documents that a specific user has access to in SharePoint, you can specify a filter containing the user ID or email address of the user when you query Cortex Search. For example, `filter.@contains.user_ids` or `filter.@contains.user_emails`. The name of the Cortex Search service created by the connector is `search_service` in the schema `Cortex`. Run the following SQL code in a SQL worksheet to query the Cortex Search service with files ingested from your SharePoint site. Replace the following: - application_instance_name: Name of your database and connector application instance. - user_emailID: Email ID of the user who you want to filter the responses for. - your_question: The question that you want to get responses for. - number_of_results: Maximum number of results to return in the response. The maximum value is 1000 and the default value is 10. ```sql SELECT PARSE_JSON( SNOWFLAKE.CORTEX.SEARCH_PREVIEW( '.cortex.search_service', '{ "query": "", "columns": ["chunk", "web_url"], "filter": {"@contains": {"user_emails": ""} }, "limit": }' ) )['results'] AS results ``` Here's a complete list of values that you can enter for `columns`: **For all connectors:**
**For ACL connectors only:**
**Example: Query an AI assistant for human resources (HR) information** You can use Cortex Search to query an AI assistant for employees to chat with the latest versions of HR information, such as onboarding, code of conduct, team processes, and organization policies. Using response filters, you can also allow HR team members to query employee contracts while adhering to access controls configured in SharePoint.
Run the following in a [SQL worksheet](#label-snowsight-worksheets-create-file) to query the Cortex Search service with files ingested from SharePoint. Select the database as your application instance name and schema as **Cortex**. Replace the following: - application_instance_name: Name of your database and connector application instance. - user_emailID: Email ID of the user who you want to filter the responses for. ```sql SELECT PARSE_JSON( SNOWFLAKE.CORTEX.SEARCH_PREVIEW( '.cortex.search_service', '{ "query": "What is my vacation carry over policy?", "columns": ["chunk", "web_url"], "filter": {"@contains": {"user_emails": ""} }, "limit": 1 }' ) )['results'] AS results ```
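The JSON request used in the SQL examples above can also be assembled programmatically, which avoids hand-editing quotes inside a string literal. The following is a minimal Python sketch. The helper name `build_search_request` is hypothetical; the `@contains` filter on `user_emails`, the default limit of 10, and the maximum of 1000 come from the description above, while the lower bound of 1 is an assumption.

```python
import json

MAX_RESULTS = 1000  # documented maximum for "limit"


def build_search_request(question, user_email=None, limit=10, columns=("chunk", "web_url")):
    """Return the JSON string for a Cortex Search query, optionally filtered by user email."""
    # Assumption: a limit below 1 is not meaningful, so it is rejected here.
    if not 1 <= limit <= MAX_RESULTS:
        raise ValueError(f"limit must be between 1 and {MAX_RESULTS}")
    request = {"query": question, "columns": list(columns), "limit": limit}
    if user_email is not None:
        # Restrict results to documents this user can access in SharePoint.
        request["filter"] = {"@contains": {"user_emails": user_email}}
    return json.dumps(request)


print(build_search_request("What is my vacation carry over policy?", "user@example.com", limit=1))
```

The resulting string can be passed as the second argument of `SNOWFLAKE.CORTEX.SEARCH_PREVIEW`, for example through a bind variable, instead of being embedded in the SQL text.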
**Python:** Run the following code in a [Python worksheet](#label-snowsight-worksheets-create) to query the Cortex Search service with files ingested from SharePoint. Ensure that you add the `snowflake.core` package to your database. Replace the following: - application_instance_name: Name of your database and connector application instance. - user_emailID: Email ID of the user who you want to filter the responses for.

```python
import snowflake.snowpark as snowpark
from snowflake.core import Root

def main(session: snowpark.Session):
    root = Root(session)

    # Fetch the Cortex Search service created by the connector
    my_service = (root
        .databases[""]
        .schemas["cortex"]
        .cortex_search_services["search_service"]
    )

    # Query the service, filtering results by the user's email address
    resp = my_service.search(
        query="What is my vacation carry over policy?",
        columns=["chunk", "web_url"],
        filter={"@contains": {"user_emails": ""}},
        limit=1
    )
    return resp.to_json()
```

**REST API:** Execute the following code in a command-line interface to query the Cortex Search service with files ingested from your SharePoint site. You need to authenticate with key-pair authentication or OAuth to access the Snowflake REST APIs. For more information, see [](#label-cortex-search-query-syntax-rest) and [](/developer-guide/snowflake-rest-api/authentication). Replace the following: - application_instance_name: Name of your database and connector application instance. - account_url: Your Snowflake account URL. For instructions on finding your account URL, see [](#label-account-name-find).

```bash
curl --location "https:///api/v2/databases//schemas/cortex/cortex-search-services/search_service" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header "Authorization: Bearer " \
  --data '{ "query": "What is my vacation carry over policy?", "columns": ["chunk", "web_url"], "limit": 1 }'
```

Sample response:

```text
{ "results" : [ { "web_url" : "https://.sharepoint.com/sites//", "chunk" : "Answer to the question asked."
} ] } ``` ## Finding files in stage Files stored in the stage may have unreadable names. To find specific files, use the metadata tables as your source of truth. These tables contain the mapping between file names and their corresponding file IDs in the stage. For Cortex-enabled setups, use the following query to find files: ```sql SELECT DISTINCT METADATA:id FROM DOCS_CHUNKS WHERE METADATA:fullName LIKE '%%'; ``` For non-Cortex setups, use the following query: ```sql SELECT FILE_ID FROM DOC_METADATA WHERE FILE_NAME = ''; ``` Replace `` with the name or partial name of the file you're looking for. The files in the stage start with the ID returned from these queries. --- title: Set up the Openflow Connector for Slack source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/slack/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for Slack This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/slack/setup) This topic describes the steps to set up the Openflow Connector for Slack. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/slack/about). 2. 
Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-slack) connector. ## Set up a Slack App Set up a Slack App in your Slack workspace. A Slack Admin is needed to set up access to the Slack Workspace. This is done by creating or supplying credentials to a Slack App and installing the App to the Slack workspace and channels. You can create a Slack App by using the JSON configuration: 1. Update the JSON manifest. Copy the JSON manifest text below. Change the name and display name properties from `EXAMPLE_NAME_CHANGE_THIS` to the desired name of your Slack App. It is recommended to use the same name and display name for your App. ```json { "display_information": { "name": "EXAMPLE_NAME_CHANGE_THIS" }, "features": { "bot_user": { "display_name": "EXAMPLE_NAME_CHANGE_THIS", "always_online": false } }, "oauth_config": { "scopes": { "bot": [ "channels:history", "channels:read", "groups:history", "groups:read", "im:history", "im:read", "mpim:history", "mpim:read", "users.profile:read", "users:read", "users:read.email", "files:read", "app_mentions:read", "reactions:read" ] } }, "settings": { "event_subscriptions": { "bot_events": [ "message.channels", "message.groups", "message.im", "message.mpim", "reaction_added", "reaction_removed", "file_created", "file_deleted", "file_change" ] }, "interactivity": { "is_enabled": true }, "org_deploy_enabled": false, "socket_mode_enabled": true, "token_rotation_enabled": false } } ``` 2. Create a Slack app through the [Apps page](https://api.slack.com/apps). 1. On the **Your Apps** page, select **Create New App**. 2. 
Select **From a manifest**.
   3. Select the **Workspace** where you'll be developing your app. You'll be able to [distribute your app](https://api.slack.com/distribution) to other workspaces later if you choose.
   4. Paste the updated manifest JSON from step 1.
3. Generate an app-level token. You need to create an app-level token even after using the JSON manifest. Under **Basic Information**, scroll to the **App-level tokens** section and click the button to generate an [app-level token](https://api.slack.com/concepts/token-types#app). Add the *connections:write* scope to the token.
4. Install and authorize the app.
   1. Return to the **Basic Information** section of the app management page.
   2. Install your app by selecting the **Install to Workspace** button.
   3. You'll now be sent through the Slack OAuth flow. Select **Allow** on the following screen. If you want to add your app to a workspace other than your own, a user from that workspace needs to perform these steps.

   After installation, navigate back to the **OAuth & Permissions** page. You'll see an **access token** under **OAuth Tokens**. Access tokens represent the permissions delegated to your app by the installing user. Keep them safe and secure. Avoid checking them into public version control. Instead, access them through an environment variable.
5. Add the App to channels. Your app isn't a member of any channels yet, so pick a channel to add some test messages in and `/invite` your app. For example, `/invite @Grocery Reminders`. Restart the processors to load the new channels. After the App is added to a new channel, the `Consume Slack Conversation` processor in the Openflow runtime needs to be stopped and restarted.

## Setup necessary ingress rules

A Snowflake Admin should follow the [egress guide](#label-working-with-services-jobs-egress) to apply egress rules to the endpoint `https://slack.com/api` and enable WebSocket egress on `wss://wss.slack.com`.
The easiest way to do this is to add a rule that enables egress on the `slack.com` domain.

## Set up Snowflake account

As a Snowflake account administrator, perform the following tasks:

1. Create a new role or use an existing role and grant the [](#label-database-privileges).
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or Hashicorp, and store the public and private keys in the secret store. If, for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.
   1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted.
   2. In Openflow, configure a Parameter Provider associated with this secrets manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** » **Parameter Provider** and then fetch your parameter values.
   3. At this point, all credentials can be referenced with the associated parameter paths, and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1.
7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated, and the amount of data transferred.
Large table numbers typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes. ## Use case 1: Ingest Slack content only Use the connector definition to:
- Perform custom analysis on ingested Slack data (no Cortex Search processing). - Ingest Slack messages, reactions, file attachments, and member lists into Snowflake, and keep them up to date.
### Set up the connector As a data engineer, perform the following tasks to configure the connector: #### Install the connector 1. Create a database and schema in Snowflake for the connector to store ingested data. Grant required [](#label-database-privileges) to the role created in the first step. Substitute the role placeholder with the actual value and use the following SQL commands: ```sql CREATE DATABASE DESTINATION_DB; CREATE SCHEMA DESTINATION_DB.DESTINATION_SCHEMA; GRANT USAGE ON DATABASE DESTINATION_DB TO ROLE ; GRANT USAGE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE ; GRANT CREATE TABLE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE ; ``` To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Enter the required parameter values as described in **Flow parameters: Ingest content only** below. 3. Right-click on the canvas and select **Enable all controller services**. 4. Right-click on the imported process group and select **Start**. 
The flow creates all required Snowflake objects and begins ingesting Slack data. ##### Flow parameters: Ingest content only
## Use case 2: Ingest Slack content and enable Cortex Use the connector definition to:
- Make Slack data ready for conversational search with Snowflake Cortex. - Ensure Slack channel access controls are respected in search results.
### Set up the connector As a data engineer, perform the following tasks to configure the connector: #### Install the connector 1. Create a database and schema in Snowflake for the connector to store ingested data. Grant required [](#label-database-privileges) to the role created in the first step. Substitute the role placeholder with the actual value and use the following SQL commands: ```sql CREATE DATABASE DESTINATION_DB; CREATE SCHEMA DESTINATION_DB.DESTINATION_SCHEMA; GRANT USAGE ON DATABASE DESTINATION_DB TO ROLE ; GRANT USAGE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE ; GRANT CREATE TABLE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE ; ``` To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. #### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Enter the required parameter values as described in **Flow parameters: Ingest content and enable Cortex** below. 3. Right-click on the canvas and select **Enable all controller services**. 4. Right-click on the imported process group and select **Start**. 5. 
Once the flow is running, proceed to [Query the Cortex Search service](#query-the-cortex-search-service) for testing. ##### Flow parameters: Ingest content and enable Cortex
## Enabling private-channel ACLs

No extra steps are required beyond **inviting the Slack App** to each private channel. The connector automatically refreshes the member list and stores it in the membership table at each **Refresh Slack Members** interval.

## Query the Cortex Search service

After Use case 2 is running and the Cortex Search service has been created, you can query it as follows:

```sql
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<cortex_db>.<cortex_schema>.<cortex_search_service_name>',
    '{
      "query": "What is my vacation carry over policy?",
      "columns": ["text","channel","ts","username"],
      "filter": {"@contains": {"memberemails": "alice@example.com"}},
      "limit": 10
    }'
  )
)['results'] AS results;
```

**Common searchable columns**

`text`, `type`, `subtype`, `channel`, `user`, `username`, `connectorId`, `workspaceId`, `ts`, `threadTs`

**Example: Query an AI assistant for human resources (HR) information**

You can use Cortex Search to back an AI assistant that lets employees ask questions about the latest Slack posts. The messages that are searched can come from informative Slack channels such as general or it-help.
Run the following in a [SQL worksheet](#label-snowsight-worksheets-create-file) to query the Cortex Search service over messages ingested from Slack. Replace the following:

- cortex_db: Name of the database containing the Cortex Search service, specified by the *Destination Database* parameter.
- cortex_schema: Name of the schema containing the Cortex Search service, specified by the *Destination Schema* parameter.
- cortex_search_service_name: Name of the Cortex Search service, specified by the *Cortex Search Name* parameter.
- user_emailID: Email ID of the user for whom you want to filter the responses.

```sql
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<cortex_db>.<cortex_schema>.<cortex_search_service_name>',
    '{
      "query": "What is my vacation carry over policy?",
      "columns": ["text", "channel", "ts", "username"],
      "filter": {"@contains": {"memberemails": "<user_emailID>"}},
      "limit": 1
    }'
  )
)['results'] AS results;
```
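Hand-editing the JSON argument is error-prone, because a single typographic quote or unbalanced brace makes the payload invalid. The following is a minimal sketch, not part of the connector, that assembles the same statement with `json.dumps`. The helper name `build_search_preview_sql` and the sample service name are hypothetical. The escaping assumes the result is pasted into a worksheet as a single-quoted SQL string literal, where backslashes and single quotes need escaping.

```python
import json


def build_search_preview_sql(service_fqn, query, member_email,
                             columns=("text", "channel", "ts", "username"),
                             limit=1):
    """Return a SEARCH_PREVIEW statement whose JSON payload is built by json.dumps.

    service_fqn is the fully qualified service name (database.schema.service).
    """
    payload = json.dumps({
        "query": query,
        "columns": list(columns),
        "filter": {"@contains": {"memberemails": member_email}},
        "limit": limit,
    })

    def sql_literal(value):
        # Inside a single-quoted SQL literal, escape backslashes first,
        # then double any single quotes.
        return value.replace("\\", "\\\\").replace("'", "''")

    return (
        "SELECT PARSE_JSON(\n"
        "  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(\n"
        f"    '{sql_literal(service_fqn)}',\n"
        f"    '{sql_literal(payload)}'\n"
        "  )\n"
        ")['results'] AS results;"
    )


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    print(build_search_preview_sql(
        "MY_DB.MY_SCHEMA.MY_SERVICE",
        "What is my vacation carry over policy?",
        "alice@example.com",
    ))
```

Building the payload from a dictionary keeps the quoting consistent and makes it straightforward to vary the `limit` or the filtered email address.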
**Python:**

Run the following code in a [Python worksheet](#label-snowsight-worksheets-create) to query the Cortex Search service over messages ingested from Slack. Ensure that you add the `snowflake.core` package to your database.

Replace the following:

- cortex_db: Name of the database containing the Cortex Search service, specified by the *Destination Database* parameter.
- cortex_schema: Name of the schema containing the Cortex Search service, specified by the *Destination Schema* parameter.
- cortex_search_service_name: Name of the Cortex Search service, specified by the *Cortex Search Name* parameter.
- user_emailID: Email ID of the user for whom you want to filter the responses.

```python
from snowflake.snowpark import Session
from snowflake.core import Root


def main(session: Session):
    root = Root(session)

    # Fetch the Cortex Search service.
    my_service = (root
        .databases["<cortex_db>"]
        .schemas["<cortex_schema>"]
        .cortex_search_services["<cortex_search_service_name>"]
    )

    # Query the service, restricting results to the given member email.
    resp = my_service.search(
        query="What is my vacation carry over policy?",
        columns=["text", "channel", "ts", "username"],
        filter={"@contains": {"memberemails": "<user_emailID>"}},
        limit=1
    )
    return resp.to_json()
```

**REST API:**

Execute the following code in a command-line interface to query the Cortex Search service over messages ingested from Slack. You need to authenticate using key-pair authentication or OAuth to access the Snowflake REST APIs. For more information, see [](#label-cortex-search-query-syntax-rest) and [](/developer-guide/snowflake-rest-api/authentication).

Replace the following:

- cortex_db: Name of the database containing the Cortex Search service, specified by the *Destination Database* parameter.
- cortex_schema: Name of the schema containing the Cortex Search service, specified by the *Destination Schema* parameter.
- cortex_search_service_name: Name of the Cortex Search service, specified by the *Cortex Search Name* parameter.
- account_url: Your Snowflake account URL.
For instructions on finding your account URL, see [](#label-account-name-find). ```bash curl --location "https:///api/v2/databases//schemas//cortex-search-services/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer " \ --data '{ "query": "What is my vacation carry over policy?", "columns": ["text", "channel"], "limit": 1 }' ``` Sample response: ```text { "results" : [ { "channel" : "dev notes", "text" : "Answer to the question asked." } ] } ``` --- title: Set up the Openflow Connector for Snowflake to Kafka source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/snowflake-to-kafka/setup.md section: Loading & Unloading Data --- # Set up the %sf-kafka% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the %sf-kafka%. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/snowflake-to-kafka/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. Create a Snowflake stream that will be queried for the changes. 
4. Create a Kafka topic that will receive CDC messages from the Snowflake stream.

## Set up Snowflake account

As a Snowflake account administrator, perform the following tasks:

1. Create the database, source table, and the stream object that the connector will use for reading CDC events. For example:

   ```sql
   create database stream_db;
   use database stream_db;
   create table stream_source (user_id varchar, data varchar);
   create stream stream_on_table on table stream_source;
   ```

2. Create a new role or use an existing role, and grant the SELECT privilege on the stream and the source object for the stream. The connector will also need the USAGE privilege on the database and schema containing the stream and source object for the stream. For example:

   ```sql
   create role stream_reader;
   grant usage on database stream_db to role stream_reader;
   grant usage on schema stream_db.public to role stream_reader;
   grant select on stream_source to role stream_reader;
   grant select on stream_on_table to role stream_reader;
   ```

3. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). For example:

   ```sql
   create user stream_user type = service;
   ```

4. Grant the Snowflake service user the role you created in the previous steps. For example:

   ```sql
   grant role stream_reader to user stream_user;
   ```

5. Configure [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 3.
6. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or Hashicorp, and store the public and private keys in the secret store. However, note that the private key generated in step 5 can be used directly as a configuration parameter for the connector configuration. In such a case, the private key is stored in Openflow runtime configuration.
   If, for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.
   1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted.
   2. In Openflow, configure a Parameter Provider associated with this secrets manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** » **Parameter Provider** and then fetch your parameter values.
   3. At this point, all credentials can be referenced with the associated parameter paths, and no sensitive values need to be persisted within Openflow.
7. Designate a warehouse for the connector to use. One connector can replicate a single table to a single Kafka topic. For this kind of processing, you can select the smallest warehouse.

## Set up the connector

As a data engineer, perform the following tasks to install and configure a connector:

1. Navigate to the Openflow Overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find and choose the connector depending on what kind of Kafka broker instance the connector should communicate with.
   - mTLS version: Choose this connector if you are using the SSL (mutual TLS) security protocol, or if you are using the SASL_SSL protocol and connecting to a broker that uses self-signed certificates.
   - SASL version: Choose this connector if you are using any other security protocol.
3. Select **Add to runtime**.
4. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list.
5. Select **Add**.
6.
Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 7. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. 8. Right-click on the imported process group and select **Parameters**. 9. Populate the required parameter values as described in [Flow parameters](#flow-parameters). ### Flow parameters This section describes the flow parameters that you can configure based on the following parameter contexts: - [Kafka Sink Source Parameters](#kafka-sink-source-parameters) - [Kafka Sink Destination Parameters](#kafka-sink-destination-parameters) - [Kafka Sink Ingestion Parameters](#kafka-sink-ingestion-parameters) #### Kafka Sink Source Parameters
#### Kafka Sink Destination Parameters
#### Kafka Sink Ingestion Parameters
## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. --- title: Set up the Openflow Connector for SQL Server source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sql-server/setup.md section: Loading & Unloading Data --- # Set up the %sqlserver% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/monitor) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) - [](/user-guide/data-integration/openflow/connectors/sql-server/about) - [](/user-guide/data-integration/openflow/connectors/sql-server/data-mapping) This topic describes how to set up the %sqlserver%. For information on the incremental load process, see [Incremental replication](/user-guide/data-integration/openflow/connectors/sql-server/incremental-replication). ## Prerequisites Before setting up the connector, ensure that you have completed the following prerequisites: 1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/sql-server/about). 2. Ensure that you have reviewed [](#label-sql-server-versions). 3. Ensure that you have set up your runtime deployment. 
For more information, see the following topics: - [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 4. If you use %ofsfspcs-plural%, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-sqlserver) connector. ## Set up your SQL Server instance Before setting up the connector, perform the following tasks in your SQL Server environment: You must perform these tasks as a database administrator. 1. Enable change tracking on the [databases](https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/enable-and-disable-change-tracking-sql-server?view=sql-server-ver16#enable-change-tracking-for-a-database) and [tables](https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/enable-and-disable-change-tracking-sql-server?view=sql-server-ver16#enable-change-tracking-for-a-table) that you plan to replicate, as shown in the following SQL Server example: ```sql ALTER DATABASE SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON); ALTER TABLE .
| Parameter | Description |
| --- | --- |
| SharePoint Site URL | URL of the SharePoint site from which the connector will ingest content. |
| SharePoint Client ID | Microsoft Entra client ID. To learn about client ID and how to find it in Microsoft Entra, see [Application ID (client ID)](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#application-id-client-id). |
| SharePoint Client Secret | Microsoft Entra Client Secret. To learn about a client secret and how to find it in Microsoft Entra, see [Certificates & secrets](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#certificates--secrets). |
| SharePoint Tenant ID | Microsoft Entra Tenant ID. To learn about tenant ID and how to find it in Microsoft Entra, see [Find your Microsoft 365 tenant ID](https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id). |
| Parameter | Description |
| --- | --- |
| Sharepoint Application Private Key | A generated application private key in PEM format. The key must be unencrypted. |
| Sharepoint Site Domain | A domain name of the synchronized SharePoint site. |
| Sharepoint Application Certificate | A generated application certificate in PEM format. |
| Parameter | Description | Required |
| --- | --- | --- |
| Destination Database | The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. | Yes |
| Destination Schema | The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:<br>- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`<br>- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively | Yes |
| Snowflake Authentication Strategy | When using:<br>- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.<br>- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. | Yes |
| Snowflake Account Identifier | When using:<br>- **Session Token Authentication Strategy**: Must be blank.<br>- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. | Yes |
| Snowflake Private Key | When using:<br>- **Session Token Authentication Strategy**: Must be blank.<br>- **KEY_PAIR**: Must be the RSA private key used for authentication.<br><br>The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. | No |
| Snowflake Private Key File | When using:<br>- **Session token authentication strategy**: The private key file must be blank.<br>- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. | No |
| Snowflake Private Key Password | When using:<br>- **Session Token Authentication Strategy**: Must be blank.<br>- **KEY_PAIR**: Provide the password associated with the Snowflake private key file. | No |
| Snowflake Role | When using:<br>- **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.<br>- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. | Yes |
| Snowflake Username | When using:<br>- **Session Token Authentication Strategy**: Must be blank.<br>- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. | Yes |
| Oversized Value Strategy | Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:<br>- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.<br>- **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value. | No |
| Snowflake Warehouse | Snowflake warehouse used to run queries. | Yes |
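The key-format requirements above (PKCS8, standard PEM headers, a header line beginning with `-----BEGIN PRIVATE`) can be checked before a key is pasted or uploaded. The following is a minimal sketch, assuming the conventional PEM labels for PKCS8 and PKCS1 keys. The function name `classify_pem_private_key` is hypothetical, and the check looks only at the header line; it does not validate the key material itself.

```python
def classify_pem_private_key(pem_text):
    """Classify a PEM private key by its header line only.

    Returns one of: "pkcs8-unencrypted", "pkcs8-encrypted", "pkcs1", "unknown".
    """
    # Use the first non-empty line as the PEM header.
    first_line = next(
        (line.strip() for line in pem_text.splitlines() if line.strip()), ""
    )
    if first_line == "-----BEGIN PRIVATE KEY-----":
        return "pkcs8-unencrypted"
    if first_line == "-----BEGIN ENCRYPTED PRIVATE KEY-----":
        return "pkcs8-encrypted"
    if first_line == "-----BEGIN RSA PRIVATE KEY-----":
        # Traditional PKCS1 encoding, which is not PKCS8.
        return "pkcs1"
    return "unknown"


if __name__ == "__main__":
    # Placeholder body; a real key has base64 content between the markers.
    sample = "-----BEGIN PRIVATE KEY-----\nMIIB...\n-----END PRIVATE KEY-----\n"
    print(classify_pem_private_key(sample))
```

A result of `pkcs1` suggests that the key needs converting to PKCS8 before use, and `unknown` suggests the text is not a PEM private key at all.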
| Parameter | Description |
| --- | --- |
| SharePoint Source Folder | Supported files from this folder and all its subfolders are ingested into Snowflake. The folder path is relative to a Shared Documents library. |
| File Extensions To Ingest | A comma-separated list that specifies file extensions to ingest. The connector tries to convert the files to PDF format first, if possible. Nonetheless, the extension check is performed on the original file extension. To learn about the formats that can be converted, see [Format options](https://learn.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-1.0&tabs=http#format-options). If some of the specified file extensions are not supported by Cortex Parse Document, then the connector ignores those files, logs a warning message in an event log, and continues processing other files. |
| Sharepoint Document Library Name | A library in the SharePoint Site to ingest files from. |
| Snowflake File Hash Table Name | Name of the table to store file hashes to determine if the content has changed. This parameter should generally not be changed. |
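To preview which files a given **File Extensions To Ingest** value would select, the extension check can be approximated locally. The following is a rough sketch under stated assumptions: the function names are hypothetical, and the case-insensitive matching and tolerance for a leading dot are choices made for this illustration, not documented connector behavior. As described above, the check uses the original file extension, before any PDF conversion.

```python
from pathlib import PurePosixPath


def parse_extensions(raw):
    """Parse a comma-separated extension list into a normalized set."""
    normalized = (part.strip().lower().lstrip(".") for part in raw.split(","))
    # Drop empty entries such as those produced by trailing commas.
    return {ext for ext in normalized if ext}


def should_ingest(file_name, allowed):
    """Return True if the file's original extension is in the allowed set."""
    suffix = PurePosixPath(file_name).suffix.lower().lstrip(".")
    return suffix in allowed


if __name__ == "__main__":
    allowed = parse_extensions("pdf, .DOCX ,txt")
    for name in ["reports/Q1.PDF", "notes.md", "README"]:
        print(name, should_ingest(name, allowed))
```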
| Parameter | Description |
| --- | --- |
| Sharepoint Site Groups Enabled | Specifies whether the Site Groups functionality is enabled. |
| Parameter | Description |
| --- | --- |
| SharePoint Site URL | URL of the SharePoint site from which the connector will ingest content. |
| SharePoint Client ID | Microsoft Entra client ID. To learn about client ID and how to find it in Microsoft Entra, see [Application ID (client ID)](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#application-id-client-id). |
| SharePoint Client Secret | Microsoft Entra Client Secret. To learn about a client secret and how to find it in Microsoft Entra, see [Certificates & secrets](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#certificates--secrets). |
| SharePoint Tenant ID | Microsoft Entra Tenant ID. To learn about tenant ID and how to find it in Microsoft Entra, see [Find your Microsoft 365 tenant ID](https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id). |
| Parameter | Description |
| --- | --- |
| Sharepoint Application Private Key | A generated application private key in PEM format. The key must be unencrypted. |
| Sharepoint Site Domain | A domain name of the synchronized SharePoint site. |
| Sharepoint Application Certificate | A generated application certificate in PEM format. |
Parameter Description Required
Destination Database The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME` - `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. -
**KEY_PAIR**: Must be the RSA private key used for authentication.
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake private key file. No
Snowflake Role When using - **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limit (16 MB) during replication. Possible values are: - **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when it is acceptable to lose the oversized values. No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
SharePoint Source Folder Supported files from this folder and all its subfolders are ingested into Snowflake. The folder path is relative to a Shared Documents library.
File Extensions To Ingest A comma-separated list that specifies file extensions to ingest. The connector tries to convert the files to PDF format first, if possible. Nonetheless, the extension check is performed on the original file extension. To learn about the formats that can be converted, see [Format options](https://learn.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-1.0&tabs=http#format-options). If some of the specified file extensions are not supported by Cortex Parse Document, the connector ignores those files, logs a warning message in an event log, and continues processing other files.
Sharepoint Document Library Name A library in the SharePoint Site to ingest files from.
Snowflake File Hash Table Name Name of the table to store file hashes to determine if the content has changed. This parameter should generally not be changed.
OCR Mode The OCR mode to use when parsing files with the [Cortex Parse Document](/user-guide/snowflake-cortex/parse-document) function. The value can be `OCR` or `LAYOUT`. In `OCR` mode, only raw text content is extracted, ignoring formatting and table structures. In `LAYOUT` mode, the output preserves table structures as Markdown.
Snowflake Cortex Search Service User Role An identifier of a role that is assigned usage permissions on the Cortex Search service.
Parameter Description
Sharepoint Site Groups Enabled Specifies whether the Site Groups functionality is enabled.
Column name Type Description
`full_name` String A full path to the file from the SharePoint site documents root. Example: `folder_1/folder_2/file_name.pdf`.
`web_url` String A URL that displays the original SharePoint file in a browser.
`last_modified_date_time` String Date and time when the item was most recently modified.
`chunk` String A piece of text from the document that matched the Cortex Search query.
Column name Type Description
`user_ids` Array An array of Microsoft 365 user IDs that have access to the document. It also includes user IDs from all the Microsoft 365 groups that are assigned to the document. To find a specific user ID, see [Get a user](https://learn.microsoft.com/en-us/graph/api/user-get?view=graph-rest-1.0&tabs=http).
`user_emails` Array An array of Microsoft 365 user email IDs that have access to the document. It also includes user email IDs from all the Microsoft 365 groups that are assigned to the document.
Parameter Description
App Token Slack *App-level token* generated in the Slack App.
Bot Token Slack *Bot token* generated in the Slack App.
Destination Database Database to contain all connector objects (created if absent).
Destination Schema Schema inside the database (created if absent).
Snowflake Account Snowflake account identifier.
Snowflake Role Role the flow assumes after authentication.
Snowflake User Username the flow uses to connect.
Snowflake Private Key RSA private key used for authentication (PKCS8 PEM format). Note that either Snowflake Private Key or Snowflake Private Key File must be defined.
Snowflake Private Key Password Password for the encrypted private key (leave blank if unencrypted).
Snowflake Private Key File File containing the RSA Private Key (PKCS8 PEM format). The header line starts with `-----BEGIN PRIVATE`.
Snowflake Warehouse Warehouse used for SQL executed by the flow.
Upload Interval Time to gather data before pushing to Snowflake. A longer interval reduces load on Snowflake but may increase latency and memory usage.
Refresh Slack Members Minutes between Slack membership (ACL) refreshes.
Parameter Description
App Token Slack *App-level token* generated in the Slack App.
Bot Token Slack *Bot token* generated in the Slack App.
Destination Database Database to contain all connector objects (created if absent).
Destination Schema Schema inside the database (created if absent).
Upload Interval Time to gather data before pushing to Snowflake. A larger value reduces load but increases data latency.
Snowflake Account Snowflake account identifier.
Snowflake Role Role the flow assumes after authentication.
Snowflake User Username the flow uses to connect.
Snowflake Private Key PEM-formatted private key for key-pair authentication.
Snowflake Private Key Password Password for the encrypted private key (blank if unencrypted).
Snowflake Warehouse Warehouse used for all SQL executed by the flow **and** by Cortex.
Refresh Slack Members Minutes between Slack membership (ACL) refreshes.
Parameter Description Required
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Source Database The source database. This database should contain the Snowflake Stream object that will be consumed. Yes
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake Private Key File. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use your Snowflake Role. You can find your Snowflake Role in the Openflow UI by navigating to **View Details** for your runtime. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Snowflake Private Key Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either Snowflake Private Key File or Snowflake Private Key must be defined. Yes
Snowflake Private Key File Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, upload the file that contains the RSA Private Key used for authentication to Snowflake, formatted according to PKCS8 standards and having standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. Select the **Reference asset** checkbox to upload the private key file. No
Source Schema The source schema. This schema should contain the Snowflake Stream object that will be consumed. Yes
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description Required
Kafka Bootstrap Servers A comma-separated list of Kafka brokers to send data to. Yes
Kafka SASL Mechanism SASL mechanism used for authentication. Corresponds to the Kafka Client `sasl.mechanism` property. Possible values: - `PLAIN` - `SCRAM-SHA-256` - `SCRAM-SHA-512` - `AWS_MSK_IAM` Yes
Kafka SASL Username The username used to authenticate to Kafka. Yes
Kafka SASL Password The password used to authenticate to Kafka. Yes
Kafka Security Protocol Security protocol used to communicate with brokers. Corresponds to the Kafka Client `security.protocol` property. Possible values: - `PLAINTEXT` - `SASL_PLAINTEXT` - `SASL_SSL` - `SSL` Yes
Kafka Topic The Kafka topic to which change data capture (CDC) events from the Snowflake Stream are sent. Yes
Kafka Message Key Field Specify the database column name that will be used as the Kafka message key. If not specified, the message key will not be set. If specified, the value of this column will be used as a message key. The value of this parameter is case-sensitive. No
Kafka Keystore Filename The full path to a keystore that stores the client key and certificate for the mTLS authentication method. Required for mTLS authentication and when the security protocol is SSL. No
Kafka Keystore Type The type of keystore. Required for mTLS authentication. Possible values: - `PKCS12` - `JKS` - `BCFKS` No
Kafka Keystore Password The password used to secure the keystore file. No
Kafka Key Password A password for the private key stored in the keystore. Required for mTLS authentication. No
Kafka Truststore Filename A full path to a truststore storing broker certificates. The client will use the certificate from this truststore to verify broker identity. No
Kafka Truststore Type The type of truststore file. Possible values: - `PKCS12` - `JKS` - `BCFKS` No
Kafka Truststore Password A password for the truststore file. No
Parameter Description Required
Snowflake FQN Stream Name Fully qualified Snowflake stream name. Yes
ENABLE CHANGE_TRACKING;
```

   Run these commands for every database and table that you plan to replicate. The connector requires change tracking to be enabled on the databases and tables before replication starts, so ensure that it is enabled on every table that you plan to replicate. You can also enable change tracking on additional tables while the connector is running.

2. Create a login for the SQL Server instance:

   ```sql
   CREATE LOGIN <login_name> WITH PASSWORD = '<password>';
   ```

   This login is used to create users for the databases you plan to replicate.

3. Create a user for each database you are replicating by running the following SQL Server commands in each database:

   ```sql
   USE <database_name>;
   CREATE USER <user_name> FOR LOGIN <login_name>;
   ```

4. Grant the SELECT and VIEW CHANGE TRACKING permissions to the user for each database that you are replicating:

   ```sql
   GRANT SELECT ON <database_name>.<schema_name>.<table_name> TO <user_name>;
   GRANT VIEW CHANGE TRACKING ON <database_name>.<schema_name>.<table_name> TO <user_name>;
   ```

   Run these commands in each database for every table that you plan to replicate. These permissions must be granted to the user of each database that you created in a previous step.

5. (Optional) Grant the VIEW DEFINITION privilege on the User Defined Data Types (UDDT).

   If your tables contain columns that use a UDDT, and the UDDT is owned by a different user than the connector user, you must grant the VIEW DEFINITION permission to the connector user as shown in the following SQL Server example:

   ```sql
   GRANT VIEW DEFINITION TO <user_name>;
   ```

   Without this permission, columns using a UDDT are silently excluded from replication.

6. (Optional) Configure an SSL connection.

   If you use an SSL connection to connect to SQL Server, create the root certificate for your database server. This is required when configuring the connector.

## Set up your Snowflake environment

As a Snowflake administrator, perform the following tasks:

1. Create a destination database in Snowflake to store the replicated data:

   ```sql
   CREATE DATABASE <destination_database>;
   ```

2. Create a Snowflake [service user](#label-user-type-property):

   ```sql
   CREATE USER <openflow_user>
     TYPE = SERVICE
     COMMENT = 'Service user for automated access of Openflow';
   ```

3. Create a Snowflake role for the connector and grant the required privileges:

   ```sql
   CREATE ROLE <openflow_role>;
   GRANT ROLE <openflow_role> TO USER <openflow_user>;
   GRANT USAGE ON DATABASE <destination_database> TO ROLE <openflow_role>;
   GRANT CREATE SCHEMA ON DATABASE <destination_database> TO ROLE <openflow_role>;
   ```

   Use this role to manage the connector's access to the Snowflake database. To create objects in the destination database, you must grant the [USAGE and CREATE SCHEMA privileges](#label-database-privileges) on the database to the role used to manage access.

4. Create a Snowflake warehouse for the connector and grant the required privileges:

   ```sql
   CREATE WAREHOUSE <openflow_warehouse> WITH
     WAREHOUSE_SIZE = 'XSMALL'
     AUTO_SUSPEND = 300
     AUTO_RESUME = TRUE;
   GRANT USAGE, OPERATE ON WAREHOUSE <openflow_warehouse> TO ROLE <openflow_role>;
   ```

   Snowflake recommends starting with an XSMALL warehouse size, then experimenting with the size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with multi-cluster warehouses than with a larger warehouse size. For more information, see [multi-cluster warehouses](/user-guide/warehouses-multicluster).

5. Set up the public and private keys for key pair authentication:

   1. Create a pair of secure keys (public and private).
   2. Store the private key for the user in a file to supply to the connector's configuration.
   3. Assign the public key to the Snowflake service user:

      ```sql
      ALTER USER <openflow_user> SET RSA_PUBLIC_KEY = 'thekey';
      ```

      For more information, see [key-pair authentication](/user-guide/key-pair-auth).

## Install the connector

To install the connector, do the following as a data engineer:

1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.

   Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.

4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.
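The key-pair step in the Snowflake environment setup above requires a private key in PKCS8 PEM format. Before uploading a key file, it can help to check which PEM variant the file actually contains. The following is a minimal sketch, not part of the connector: the function name and the returned labels are hypothetical, and the check looks only at the standard PEM header line rather than validating the key itself.

```python
def classify_pem_private_key(pem_text: str) -> str:
    """Classify a PEM private key by its first non-empty line.

    The returned labels are illustrative only (hypothetical names).
    """
    lines = [line.strip() for line in pem_text.splitlines() if line.strip()]
    if not lines:
        return "empty"
    header = lines[0]
    if header == "-----BEGIN PRIVATE KEY-----":
        # Unencrypted PKCS8 key.
        return "pkcs8"
    if header == "-----BEGIN ENCRYPTED PRIVATE KEY-----":
        # Password-protected PKCS8 key.
        return "pkcs8-encrypted"
    if header == "-----BEGIN RSA PRIVATE KEY-----":
        # Traditional PKCS1 RSA key, which is not PKCS8.
        return "pkcs1"
    return "unknown"


# Placeholder key text; the base64 body is intentionally omitted.
sample = (
    "-----BEGIN PRIVATE KEY-----\n"
    "(base64 body omitted)\n"
    "-----END PRIVATE KEY-----\n"
)
print(classify_pem_private_key(sample))
```

A result of `pkcs1` suggests the key needs converting to PKCS8 before it is supplied to the connector; `pkcs8-encrypted` suggests a private key password is also needed.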
## Runtime sizing

The runtime size determines the CPU and memory available to the connector. The available sizes are Small, Medium, and Large. The connector requires Medium or Large. Choose the size when you create the runtime; you can't change the size of an existing runtime in place. Choose Large if you expect high replication throughput or if source tables contain wide rows.

## Resize a runtime

Runtime size is fixed at creation, so to change the size you run the connector on a different runtime. You have two options, depending on whether you want to preserve the current replication progress.

If you don't need to keep the progress of the current connector, the simplest path is to create a new runtime at the size you need and install a new connector instance on it. The new connector starts from scratch: it snapshots all configured tables and then captures ongoing changes from that point. The replication progress of the existing connector is discarded.

To keep the progress of the current connector, for example to avoid re-snapshotting tables that took a long time to snapshot initially, migrate the connector to the new runtime. This reuses the existing destination tables and resumes incremental replication from where it left off. For migration instructions, see [Reinstall the connector](#label-sql-server-reinstall-connector).

## Configure the connector

To configure the connector, do the following as a data engineer:

1. Right-click on the imported process group and select **Parameters**.
2. Populate the required parameter values. For more information on the required parameter values, see the following sections:
   - [SQLServer Source Parameters](#label-of-sqlserver-source-parameters): Used to establish a connection with SQL Server.
   - [SQLServer Destination Parameters](#label-of-sqlserver-destination-parameters): Used to establish a connection with Snowflake.
   - [SQLServer Ingestion Parameters](#label-of-sqlserver-ingestion-parameters): Used to specify the tables to replicate.
Start by setting the parameters of the SQLServer Source Parameters context, then the SQLServer Destination Parameters context. After you complete this, enable the connector. The connector connects to both SQL Server and Snowflake and starts running. However, the connector doesn't replicate any data until the tables to replicate are explicitly added to its configuration.

To configure specific tables for replication, edit the SQLServer Ingestion Parameters context. After you apply the changes to the SQLServer Ingestion Parameters context, the connector picks up the configuration, and the replication lifecycle starts for every table.

### SQLServer Source Parameters
Azure SQL Database refers to the single-database PaaS offering, not Azure SQL Managed Instance.

### SQLServer Destination Parameters
### SQLServer Ingestion Parameters
## Replicate tables from a SQL Server replica server

The connector can ingest data from a primary server or from a subscriber server using [transactional replication](https://learn.microsoft.com/en-us/sql/relational-databases/replication/transactional/transactional-replication). Before configuring the connector to connect to a SQL Server replica, ensure that replication between the primary and replica nodes works correctly. For instructions on setting up transactional replication, see [Tutorial: Configure transactional replication](https://learn.microsoft.com/en-us/sql/relational-databases/replication/tutorial-replicating-data-between-continuously-connected-servers).

When investigating issues with missing data in the connector, first ensure that the missing rows and change tracking events are present in the replica server used by the connector.

When using a replica server, the connector setup differs from the standard primary server configuration. The connection user and change tracking don't need to be configured on the primary server. Instead, make sure that the connection user is available on the replica server and has access to the data and change tracking tables there.

To configure the connector to read from a subscriber server instead of the publisher, specify the subscriber server URL in the **SQLServer Connection URL** parameter.

Do not change the database server after replication has started. Each database maintains its own change tracking state independently, so switching to a different server would cause the connector to lose track of which changes have already been processed, and may result in data loss.

## Restart table replication

A table in the FAILED state (for example, because of a missing primary key or an unsupported schema change) does not restart automatically. If a table enters the FAILED state or you need to restart replication from scratch, use the following procedure to remove and re-add the table to replication.
If the failure was caused by an issue in the source table, such as a missing primary key, resolve that issue in the source database before continuing.

1. Remove the table from replication, using one of the following methods:
   - Add the table to the **Re-snapshot Table Exclusions** parameter to temporarily exclude it from replication. This is convenient when the table is matched by an **Included Table Regex** that you don't want to change.
   - In the Ingestion Parameters context, either remove the table from **Included Table Names** or modify the **Included Table Regex** so the table is no longer matched.
2. Verify the table has been removed:
   1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**.
   2. In the table listing controller services, locate the **Table State Store** row, click the three vertical dots on the right side of the row, then choose **View State**.

   Wait until the table's state is fully removed from this list. Do not continue until this configuration change has completed.
3. Clean up the destination: Once the table's state shows as fully removed, manually [DROP](/sql-reference/sql/drop-table) the destination table in Snowflake. The connector does not overwrite an existing destination table during the snapshot phase; if the table still exists, replication fails again. Optionally, you can also remove the journal table and stream if they are no longer needed.
4. Re-add the table by reversing the change you made in the first step: either remove the table from **Re-snapshot Table Exclusions**, or add it back to **Included Table Names** or **Included Table Regex**. The connector then re-snapshots the table.
5. Verify the restart: Check the **Table State Store** using the instructions given previously. The state of the table should appear with the status NEW, then transition to SNAPSHOT_REPLICATION, and finally INCREMENTAL_REPLICATION.
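When step 1 of the procedure above is done by editing the **Included Table Regex**, it can be useful to check offline which table paths a candidate expression would still match. The following is a minimal sketch under stated assumptions: the table paths are hypothetical, the helper name is made up for illustration, and it assumes the expression must match the whole table path. It uses Python's `re` module, whose syntax can differ from the connector's regex engine for advanced patterns.

```python
import re


def matched_tables(included_regex: str, table_paths: list[str]) -> list[str]:
    """Return the table paths that the regex matches in full (assumed semantics)."""
    pattern = re.compile(included_regex)
    return [path for path in table_paths if pattern.fullmatch(path)]


# Hypothetical table paths in database.schema.table form.
paths = [
    "database_name.public.auto_orders",
    "database_name.public.auto_customers",
    "database_name.public.manual_import",
]

# Pattern in the style of the Included Table Regex example.
before = matched_tables(r"database_name\.public\.auto_.*", paths)

# Narrowed pattern: a negative lookahead drops auto_orders from the match set.
after = matched_tables(r"database_name\.public\.auto_(?!orders$).*", paths)

print(before)
print(after)
```

Comparing the two lists shows which tables would leave replication with the narrowed expression. If changing the regex is inconvenient, the **Re-snapshot Table Exclusions** parameter described in step 1 avoids the edit altogether.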
## Replicate a subset of columns in a table

The connector can filter the data replicated per table to a subset of configured columns. Primary key columns are always included regardless of exclusions.

To apply column filters, set the **Column Filter JSON** parameter in the Ingestion Parameters context to a JSON array of filter objects, one per table you want to filter. Columns can be included or excluded by name or by regular expression pattern. You can apply a single condition per table, or combine multiple conditions, with exclusions always taking precedence over inclusions.

## Syntax

Each object in the array identifies a table and specifies which columns to include or exclude. Because this connector uses three-part fully qualified names (database, schema, and table), each object can include a `database` or `databasePattern` field in addition to the schema and table fields.

```javascript
[
  {
    "database": "<database_name>" | "databasePattern": "<regex>",
    "schema": "<schema_name>" | "schemaPattern": "<regex>",
    "table": "
Parameter Description
SQLServer Connection URL The full JDBC URL used to connect to the source. For a standalone SQL Server instance or Azure SQL Managed Instance, point the URL at the instance. The connector discovers the databases to replicate from that instance. - `jdbc:sqlserver://example.com:1433;encrypt=false` For Azure SQL Database, point the URL at a specific database using the `databaseName` property. Use one connector instance per database you want to replicate. - `jdbc:sqlserver://your-server.database.windows.net:1433;encrypt=true;databaseName=your_database`
SQLServer JDBC Driver Select the **Reference asset** checkbox to upload the [SQL Server JDBC driver](https://learn.microsoft.com/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server).
SQLServer Username The user name for the connector.
SQLServer Password The password for the connector.
Parameter Description Required
Destination Database The database where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Yes
Destination Schema Pattern A pattern for the names of destination schemas where data is persisted. The connector creates the schemas if they don't exist. You can customize the pattern per ingested table using these optional variables: - `${source.database.name}`: a source table's database. - `${source.schema.name}`: a source table's schema. - `${source.table.name}`: a source table's name. For example, for a table with the qualified name `source_db.tenant_a.data`, the pattern `prefix_${source.database.name}_${source.schema.name}` evaluates to `prefix_source_db_tenant_a`. To ingest all tables into a single schema, provide a schema name without any variables, like `destination_schema`. Don't change this setting after the connector has begun ingesting data. Changing this setting after ingestion has begun breaks the existing ingestion. If you must change this setting, create a new connector instance. Yes
Snowflake Authentication Strategy When using: - **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN. - **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy. Yes
Snowflake Account Identifier When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data is persisted. Yes
Snowflake Connection Strategy When using KEY_PAIR, specify the strategy for connecting to Snowflake: - **STANDARD** (default): Connect using standard public routing to Snowflake services. - **PRIVATE_CONNECTIVITY**: Connect using private addresses associated with the supporting cloud platform such as AWS PrivateLink. Required for BYOC with KEY_PAIR only, otherwise ignored.
Snowflake Object Identifier Resolution Specifies how source object identifiers such as schema, table, and column names are stored and queried in Snowflake. This setting dictates whether you must use double quotes in SQL queries.

Option 1: Default, case-insensitive (recommended).

- **Transformation**: All identifiers are converted to uppercase. For example, `My_Table` becomes `MY_TABLE`. - **Queries**: SQL queries are case-insensitive and don't require SQL double quotes. For example `SELECT * FROM my_table;` returns the same results as `SELECT * FROM MY_TABLE;`. Snowflake recommends using this option if database objects are not expected to have mixed case names. Do not change this setting after connector ingestion has begun. Changing this setting after ingestion has begun breaks the existing ingestion. If you must change this setting, create a new connector instance.

Option 2: case-sensitive.

- **Transformation**: Case is preserved. For example, `My_Table` remains `My_Table`. - **Queries**: SQL queries must use double quotes to match the exact case for database objects. For example, `SELECT * FROM "My_Table";`. Snowflake recommends using this option if you must preserve source casing for legacy or compatibility reasons. For example, use this option if the source database includes table names that differ only in case, such as `MY_TABLE` and `my_table`, which would collide under case-insensitive comparison.
Yes
Snowflake Private Key When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. No
Snowflake Private Key File When using: - **Session token authentication strategy**: The private key file must be blank. - **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. No
Snowflake Private Key Password When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the password associated with the Snowflake Private Key File. No
Snowflake Role When using: - **Session Token Authentication Strategy**: Use the Snowflake role assigned to the runtime, or a child role granted to that role. You can find your runtime's Snowflake role in the Openflow UI by expanding the **More Options [⋮]** button for your runtime and selecting **Set Snowflake role**. - **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. Yes
Snowflake Username When using: - **Session Token Authentication Strategy**: Must be blank. - **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. Yes
Oversized Value Strategy Determines how the connector handles values that exceed its internal size limit (16 MB) during replication. Possible values are: - **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table. - **Set Null**: The value is replaced with `NULL` in the destination table. Use this option to prevent table failures when it is acceptable to lose the oversized values. No
Snowflake Warehouse Snowflake warehouse used to run queries. Yes
Parameter Description
Included Table Names A comma-separated list of source table paths, including their databases and schemas, for example: `database_1.public.table_1, database_2.schema_2.table_2`
Included Table Regex A regular expression to match against table paths, including database and schema names. Every path matching the expression is replicated, and new tables matching the pattern that are created later are also included automatically, for example: `database_name\.public\.auto_.*`
Column Filter JSON Optional. A JSON array of filter objects specifying which columns to include or exclude per table. For syntax details and examples, see [Replicate a subset of columns in a table](#replicate-a-subset-of-columns-in-a-table).
Merge Task Schedule CRON CRON expression defining the periods when merge operations from the journal table to the destination table are triggered. Set it to `* * * * * ?` for continuous merging, or specify a time schedule to limit warehouse run time. For example: - The string `* 0 * * * ?` schedules merges at the top of every hour, for one minute. - The string `* 20 14 ? * MON-FRI` schedules merges at 2:20 PM every Monday through Friday. For additional information and examples, see the cron triggers tutorial in the [Quartz Documentation](https://www.quartz-scheduler.org/documentation/quartz-2.2.2/tutorials/tutorial-lesson-06.html).
Concurrent Select Queries For Snapshot Maximum number of concurrent queries to the source database to run in the Snapshot flow. Increasing this can speed up snapshotting large numbers of tables, but will also increase the load on the source database.
" | "tablePattern": "<regex>",
    "included": ["<column_name>", "<column_name>"],
    "excluded": ["<column_name>", "<column_name>"],
    "includedPattern": "<regex>",
    "excludedPattern": "<regex>"
  }
]
```

The following rules apply:

- Use `database`, `schema`, and `table` for exact name matching, or `databasePattern`, `schemaPattern`, and `tablePattern` for regex matching. You can't use both a field and its pattern variant in the same object (for example, `schema` and `schemaPattern` can't both appear).
- At least one of `included`, `excluded`, `includedPattern`, or `excludedPattern` must be provided.
- When both included and excluded filters are specified, exclusions take precedence.
- When multiple filters match the same table, the last matching filter is used, with exact matches taking precedence over pattern-based filters.
- The value can be an array of objects to apply different filters to different tables.

## Examples

Include specific columns by name:

```javascript
[
  {
    "database": "my_db",
    "schema": "dbo",
    "table": "orders",
    "included": ["account_id", "status", "created_at"]
  }
]
```

Exclude specific columns by name:

```javascript
[
  {
    "database": "my_db",
    "schema": "dbo",
    "table": "orders",
    "excluded": ["internal_note", "debug_flag"]
  }
]
```

Combine an include pattern with a specific exclusion (for example, include all email columns except `admin_email`):

```javascript
[
  {
    "database": "my_db",
    "schema": "dbo",
    "table": "contacts",
    "includedPattern": ".*_email",
    "excluded": ["admin_email"]
  }
]
```

Mix a database pattern with an exact schema and table name to apply a filter across databases:

```javascript
[
  {
    "databasePattern": "prod_.*",
    "schema": "dbo",
    "table": "customers",
    "excluded": ["internal_note"]
  }
]
```

Pass multiple filter objects to apply different rules to different tables:

```javascript
[
  {"database": "my_db", "schema": "dbo", "table": "orders", "included": ["account_id", "status"]},
  {"database": "my_db", "schema": "dbo", "table": "customers", "excludedPattern": ".*_internal"}
]
```

### Including and excluding the same column
Removing a column from a table's replicated set (by excluding it or by removing it from the included list) has the same effect on the destination as dropping the column at the source: the connector soft-deletes the column on the destination by renaming it with a suffix (by default, `__SNOWFLAKE_DELETED`). If you then add the column back to the replicated set and later remove it a second time, replication for the affected table fails because the soft-deleted column name is already taken. To recover, restart replication for the affected table.

## Replicate a partitioned table

The connector supports replication of partitioned tables. A SQL Server partitioned table is replicated into Snowflake as a single destination table, containing data from all partitions.

To replicate a partitioned table, ensure that change tracking is enabled on the partitioned table, as described in [the SQL Server setup instructions](#label-sql-server-connector-setup-instance).

### Performance on partitioned tables

When the connector reads from a partitioned table, it places the partition column first in the queries it sends to SQL Server. This lets SQL Server use partition elimination to quickly narrow the read to the relevant partitions instead of scanning every partition, which significantly reduces the time and resources required to replicate large partitioned tables.

This behavior applies whether or not the partition column is part of the table's primary key. You don't need to configure anything to benefit from it. The connector applies the optimization automatically for all partitioned tables.

## Track data changes in tables

The connector replicates the current state of data from the source tables, as well as detected changes from each polling interval. This data is stored in journal tables created in the same schema as the destination table.
Because the connector uses SQL Server Change Tracking, multiple updates to the same row between polling intervals are rolled up into a single change. Journal tables reflect the net result of changes, not every intermediate state. For more information, see [](/user-guide/data-integration/openflow/connectors/sql-server/about). The journal table names are formatted as: `_JOURNAL__` where `` is the value of epoch seconds when the source table was added to replication, and `` is an integer increasing with every schema change on the source table. As a result, source tables that undergo schema changes will have multiple journal tables. When you remove a table from replication, then add it back, the `` value changes, and `` starts again from `1`. Snowflake recommends not altering the structure of journal tables in any way. The connector uses them to update the destination table as part of the replication process. The connector never drops journal tables, but uses the latest journal for every replicated source table, only reading append-only streams on top of journals. To reclaim the storage, you can: - Truncate all journal tables at any time. - Drop the journal tables related to source tables that were removed from replication. - Drop all but the latest generation journal tables for actively replicated tables. For example, if your connector is set to actively replicate source table `orders`, and you have earlier removed table `customers` from replication, you may have the following journal tables. In this case you can drop all of them *except* `orders_5678_2`. ```text customers_1234_1 customers_1234_2 orders_5678_1 orders_5678_2 ``` ## Configure scheduling of merge tasks The connector uses a warehouse to merge change data capture (CDC) data into destination tables. This operation is triggered by the MergeSnowflakeJournalTable processor. 
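Looking back at the journal-table cleanup guidance in the previous section, the rule "drop everything for removed source tables, and all but the latest journal for actively replicated ones" can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and it assumes journal names in the shortened `<source>_<timestamp>_<generation>` form used in the example listing, so the parsing would need adjusting for the full names in your schema.

```python
from collections import defaultdict


def parse_journal_name(name):
    """Split '<source>_<timestamp>_<generation>' into its three parts.

    Assumes the shortened naming used in the example listing.
    """
    source, timestamp, generation = name.rsplit("_", 2)
    return source, int(timestamp), int(generation)


def droppable_journals(journal_names, active_sources):
    """Return the journal tables that the cleanup rules allow you to drop."""
    by_source = defaultdict(list)
    for name in journal_names:
        source, timestamp, generation = parse_journal_name(name)
        by_source[source].append((timestamp, generation, name))

    droppable = []
    for source, entries in by_source.items():
        entries.sort()  # oldest first: by timestamp, then by generation
        if source in active_sources:
            # Actively replicated: keep only the latest journal.
            droppable.extend(name for _, _, name in entries[:-1])
        else:
            # Removed from replication: every journal can go.
            droppable.extend(name for _, _, name in entries)
    return sorted(droppable)


journals = ["customers_1234_1", "customers_1234_2", "orders_5678_1", "orders_5678_2"]
print(droppable_journals(journals, active_sources={"orders"}))
# ['customers_1234_1', 'customers_1234_2', 'orders_5678_1']
```

The helper only computes a list of candidates. Review the result before issuing any `DROP TABLE` statements, because the connector still reads from the latest journal of every replicated table.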
If there are no new changes or if no new flow files are waiting in the MergeSnowflakeJournalTable queue, no merge is triggered and the warehouse auto-suspends. Use the CRON expression in the Merge task Schedule CRON parameter to limit the warehouse cost and limit merges to only scheduled time. It throttles the flow files coming to the MergeSnowflakeJournalTable processor and merges are triggered only in a dedicated period of time. For more information about scheduling, see [Scheduling strategy](https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-strategy). ## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. --- title: Set up the Openflow Connector for Workday source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/workday/setup.md section: Loading & Unloading Data --- # Set up the Openflow Connector for Workday This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/manage) - [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) This topic describes the steps to set up the Openflow Connector for Workday. ## Prerequisites 1. 
Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/workday/about). 2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs). 3. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-workday) connector. ## Get the credentials As a Workday administrator, perform the following actions: 1. Create a user in Workday: 1. Go to Workday and log in as an administrator. In the Workday search bar, type **Create user**. 2. Click **Create Integration System User: Task**. 3. Enter a username and password. 2. Create a security group and add the user from step 1 to it: 1. In the Workday search bar, type **Create Security Group**. 2. Click **Create Security Group: Task**. 3. Set the type to **Integration System Security Group (Unconstrained)**. 4. Enter a Security Group Name and click **OK**. 5. In the **Edit Integration System Security Group (Unconstrained)** window, add the integration system user created in Step 1 in the **Integration System Users** field. 3. Add domain security policies to the security group created on step 2: 1. In the Workday search bar, type **View Security Group**. 2. Go to **Security Group Settings** %raa% **Maintain Domain Permissions for Security Group**. 3. In the **Integration Permissions** section, in the Domain Security Policies permitting Get access field, select the security domains associated with the reports you want to sync. 4. Go to the **Activate Pending Security Policy Changes** page and click **OK**. 4. Create an OAuth client app: 1. In the Workday search bar, type **Register API Client**, and click **Register API Client for Integrations: Task**. 2. 
Enter a Client Name. 3. Click **Non-Expiring Refresh Token**. 4. In the Scope search bar, type **System** and select it. 5. Click **OK**. 6. Copy the Client ID and Client Secret, then click **Done**. 5. In the **View Integration System Security Group** page, note the functional areas under Domain Security Policies. Then, add these as Scopes/Functional Areas in the API Client: 1. In the search bar, type **View API Client**. 2. Choose your API client from the list. 3. In the top blue bar, click the three dots, then select **API Client** %raa% **API Clients for Integrations**. 4. In the **Scope (Functional Areas)** field, search for and add the functional areas that you noted. 6. In the same menu as before (5c), select **Manage Refresh Tokens for Integrations**. 1. In the form, search for the ISU user and select it. 2. Click **OK**. 3. Click **Generate new token** and copy the refresh token details which will be used later. ## Set up Snowflake account As a Snowflake account administrator, perform the following tasks: 1. Create a new role or use an existing role and grant the [](#label-database-privileges). 2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property). 3. Grant the Snowflake service user the role you created in the previous steps. 4. Configure with [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2. 5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and Hashicorp, and store the public and private keys in the secret store. If for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization. 1. Once the secrets manager is configured, determine how you will authenticate to it. 
On AWS, it's recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted. 2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values. 3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow. 6. If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. 7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster) than with larger warehouse sizes. ## Set up the connector As a data engineer, perform the following tasks to configure the connector: ### Install the connector 1. Create a database and schema in Snowflake for the connector to store ingested data. Grant required [](#label-database-privileges) to the role created in the first step. Substitute the role placeholder with the actual value and use the following SQL commands: ```sql CREATE DATABASE DESTINATION_DB; CREATE SCHEMA DESTINATION_DB.DESTINATION_SCHEMA; GRANT USAGE ON DATABASE DESTINATION_DB TO ROLE ; GRANT USAGE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE ; GRANT CREATE TABLE, CREATE PIPE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE ; ``` To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2.
On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ### Configure the connector 1. Right-click on the imported process group and select **Parameters**. 2. Populate the required parameter values as described in [Flow parameters](#flow-parameters). #### Flow parameters The configuration is divided into three parameter contexts. The *Workday Destination Parameters* and *Workday Source Parameters* contexts are responsible for connecting with Snowflake and Workday. The *Workday Ingestion Parameters* contains all parameters from both configs and other parameters specific to a given report (e.g., *Report URL*). Because the *Workday Ingestion Parameters* parameter context contains report-specific details, new parameter contexts must be created for each new report and process group. To create a new parameter context, go to the menu, select **Parameter Contexts**, and add a new context. It should inherit from both the *Workday Destination Parameters* and *Workday Source Parameters* parameter contexts. **Workday Destination Parameters** **parameter context**
**Workday Source Parameters** **parameter context**
**Workday Ingestion Parameters** **parameter context**
## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. --- title: SetCacheClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/setcacheclientservice.md section: Loading & Unloading Data --- # SetCacheClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides the ability to communicate with a SetCacheServer. This can be used to share a Set between nodes in a NiFi cluster. ## Tags cache, cluster, distributed, set, state ## Properties In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SetCacheServer source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/setcacheserver.md section: Loading & Unloading Data --- # SetCacheServer This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a set (collection of unique values) cache that can be accessed over a socket. Interaction with this service is typically accomplished via a DistributedSetCacheClient service. ## Tags cache, distinct, distributed, server, set ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: Setting Up the Openflow Connector for Google BigQuery source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-big-query/setup.md section: Loading & Unloading Data --- # Setting Up the %bigqueryof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/google-big-query/about) - [](/user-guide/data-integration/openflow/connectors/google-big-query/use) This topic describes the steps to set up the %bigqueryof%. ## Prerequisites 1. Review [](/user-guide/data-integration/openflow/connectors/google-big-query/about). 2. Set up your runtime deployment. - [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs) 3. If you are using %ofsfspcs-plural%, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the [domains](#label-of-bq-req-domains) required by the connector. 4. You have access to the Openflow admin role or similar role you use to manage Openflow. 5. 
If you are creating a Snowflake service user to manage the connector, you have set up key-pair authentication for that user. For more information, see [key-pair authentication](/user-guide/key-pair-auth). ## Required endpoints The following endpoints are required for the connector to function: - `bigquery.googleapis.com:443` - `bigquerystorage.googleapis.com:443` - `oauth2.googleapis.com:443` If you are using Openflow - BYOC, you need to configure your cloud network egress to allow TLS (port 443) access to the endpoints listed above. If you are using %ofsfspcs-plural%, you need to create a network rule and an external access integration (EAI). Then, grant the Snowflake role usage privileges on the EAI. ## Set up BigQuery 1. Create a Google Cloud service account and grant it the necessary permissions to read BigQuery data. The connector uses this account for authentication. This account must have the following roles: - [BigQuery User](https://docs.cloud.google.com/bigquery/docs/access-control#bigquery.user) - [BigQuery Data Editor](https://docs.cloud.google.com/bigquery/docs/access-control#bigquery.dataEditor)
`BigQuery Data Editor` must be granted at the **project level**, not on individual datasets. To discover tables across all configured regions, the connector queries `{project}.{region}.INFORMATION_SCHEMA.TABLES`, a region-scoped view that requires project-level access. The connector also queries `{project}.{dataset}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE` to determine primary keys for each replicated table. Without project-level access, the query fails with an `Access Denied` error and the connector does not run correctly.
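To check the service account's access before starting the connector, you could run probe queries against the same two `INFORMATION_SCHEMA` views. The following minimal sketch only builds the query strings. The function name, project, location, and dataset values are hypothetical, the `region-` prefix follows BigQuery's region qualifier convention, and the exact SQL that the connector issues is not documented here, so treat these as approximations.

```python
def build_access_check_queries(project, locations, datasets):
    """Build probe queries for the two INFORMATION_SCHEMA views named above.

    Illustrative only: these approximate, but are not, the connector's queries.
    """
    queries = []
    for location in locations:
        # Region-scoped view: succeeds only with project-level access.
        queries.append(
            "SELECT table_schema, table_name "
            f"FROM `{project}`.`region-{location}`.INFORMATION_SCHEMA.TABLES "
            "LIMIT 1"
        )
    for dataset in datasets:
        # Dataset-scoped view used to look up primary key columns.
        queries.append(
            "SELECT table_name, column_name "
            f"FROM `{project}`.`{dataset}`.INFORMATION_SCHEMA.KEY_COLUMN_USAGE "
            "LIMIT 1"
        )
    return queries


for query in build_access_check_queries("my-project", ["us"], ["sales"]):
    print(query)
```

Run the printed statements in the BigQuery console while authenticated as the service account. An `Access Denied` error on the region-scoped query suggests that the role was granted on a dataset rather than on the project.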
1. Generate and download the corresponding JSON key file for the service account. You will need the full contents of this file for the connector's configuration. 2. Enable change history on each source table to allow the connector to perform incremental replication. This feature allows BigQuery to track row-level changes (inserts, updates, and deletes), which the connector uses to sync data efficiently. Run the following query in the BigQuery console for each table: ```sql ALTER TABLE `project.dataset.table` SET OPTIONS (enable_change_history = TRUE); ``` ## Set up your Snowflake account As an Openflow administrator, perform the following tasks to set up your Snowflake account: 1. Create a Snowflake service user: ```sql USE ROLE USERADMIN; CREATE USER TYPE=SERVICE COMMENT='Service user for Openflow automation'; ``` 2. Store the private key for that user in a file to supply to the connector’s configuration. For more information, see [key-pair authentication](/user-guide/key-pair-auth). ```sql ALTER USER SET RSA_PUBLIC_KEY = ''; ``` 3. Create a database that stores the replicated data, and set up permissions for the Snowflake user to create objects in that database by granting USAGE and CREATE SCHEMA privileges. ```sql USE ROLE ACCOUNTADMIN; CREATE DATABASE IF NOT EXISTS ; GRANT USAGE ON DATABASE TO USER ; GRANT CREATE SCHEMA ON DATABASE TO USER ; ``` 4. Create a new warehouse or use an existing warehouse for the connector. To create a new warehouse: ```sql CREATE WAREHOUSE WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE; GRANT USAGE, OPERATE ON WAREHOUSE TO USER ; ``` Start with the MEDIUM warehouse size, then experiment with size depending on the number of tables being replicated and the amount of data transferred. To determine whether to increase the size, monitor the connector and database while data replication is in progress. If you observe significant delays during incremental replication, experiment with a larger warehouse size.
However, large numbers of tables typically scale better with [multi-cluster warehouses](/user-guide/warehouses-multicluster) than with a larger warehouse size. 5. Create an external access integration to enable network access outside of Snowflake. If your runtime executes in Openflow - BYOC, you do not need to create an External Access Integration (EAI). Instead, configure your cloud network egress to allow TLS (port 443) access to the required endpoints. Required host:port endpoints are listed in [](#label-of-bq-req-domains). To allow the connector to call the required Google APIs from a Snowflake-hosted runtime, you must create a network rule and an external access integration (EAI). Then, grant the Snowflake role usage privileges on the EAI. To create the external access integration and network rule and grant access, perform the following steps: 1. Create a network rule to allow the connector to access the required Google APIs: ```sql USE ROLE ACCOUNTADMIN; USE DATABASE ; CREATE OR REPLACE NETWORK RULE openflow__network_rule TYPE = HOST_PORT MODE = EGRESS VALUE_LIST = ( 'bigquery.googleapis.com:443', 'bigquerystorage.googleapis.com:443', 'oauth2.googleapis.com:443' ); ``` 2. Create an External Access Integration that references the network rule: ```sql CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION openflow__eai ALLOWED_NETWORK_RULES = (openflow__network_rule) ENABLED = TRUE; ``` 3. Grant your Snowflake role USAGE on the integration: ```sql GRANT USAGE ON INTEGRATION openflow__eai TO ROLE openflow_runtime_role_; ``` ## Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ## Configure the connector To configure the connector, perform the following steps: 1. Right-click on the added runtime and select **Parameters**. 2. Populate the required parameter values as described in [](#label-specify-bq-connector-flow-parameters). ### Specify flow parameters This section describes the flow parameters that you can configure based on the following parameter contexts: - [BigQuery Source Parameters](#bigquery-source-parameters): Used to define the configuration for reading data from BigQuery. - [BigQuery Destination Parameters](#bigquery-destination-parameters): Used to establish connection with Snowflake. - [BigQuery Ingestion Parameters](#bigquery-ingestion-parameters): Used to specify the tables and views to replicate. #### BigQuery Source Parameters
#### BigQuery Destination Parameters
#### BigQuery Ingestion Parameters
## Run the flow 1. Right-click on the plane and select **Enable all Controller Services**. 2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion. ## Next steps - For information on tasks you can perform after installing the connector, see [Use the connector](/user-guide/data-integration/openflow/connectors/google-big-query/use) - For information on monitoring the flow, see [Monitor the flow](/user-guide/data-integration/openflow/monitor) --- title: Setting up the Openflow Connector for Shopify source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/shopify/setup.md section: Loading & Unloading Data --- # Setting up the %shopifyof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/shopify/about) - [](/user-guide/data-integration/openflow/connectors/shopify/object-definitions) - [](/user-guide/data-integration/openflow/connectors/shopify/maintain) This topic describes the steps to set up the %shopifyof%. ## Prerequisites 1. Review [](/user-guide/data-integration/openflow/connectors/shopify/about). 2. Set up your runtime deployment. - [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [](/user-guide/data-integration/openflow/setup-openflow-spcs) 3. 
If you are using %ofsfspcs-plural%, ensure that you have reviewed [the required domain configuration](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the [domains](#label-shopify-req-domains) required by the connector. 4. Ensure you have access to the Openflow admin role or a similar role you use to manage Openflow. 5. If you're creating a Snowflake service user to manage the connector, set up key pair authentication. For more information, see [key pair authentication](/user-guide/key-pair-auth). ## Required endpoints The following endpoint is required for the connector to function: - `.myshopify.com:443` (for example, `mystore.myshopify.com:443`) If you are using %ofbyoc-plural%, configure your cloud network egress to allow HTTPS (port 443) access to this endpoint. If you are using %ofsfspcs-plural%, you must create a network rule and an external access integration (EAI). For more information, see [Create a network rule (Openflow - Snowflake Deployments only)](#label-create-network-rule). ## Set up Shopify A Shopify store administrator must create a custom app and generate an Admin API access token for the connector to authenticate. 1. Log in to your Shopify admin at `https://admin.shopify.com/store/`. 2. Navigate to **Settings** %ra% **Apps** %ra% **Develop apps**. 3. Select **Create an app**. Provide an app name and select a developer. 4. On the **Configuration** tab, select **Configure Admin API scopes**. 5. Select the `read_*` scopes corresponding to the object types you want to replicate. 
For example: - `read_orders`: orders, transactions, fulfillments (access limited to the last 60 days by default; request the `read_all_orders` scope in the Partner Dashboard to access the full order history) - `read_products`: products, product variants, collections - `read_customers`: customers, segments - `read_inventory`: inventory items, locations - `read_merchant_managed_fulfillment_orders`: fulfillment orders For the full list of available scopes, see the [Shopify access scopes reference](https://shopify.dev/docs/api/usage/access-scopes). Grant only the scopes required for the objects you intend to replicate. You can update scopes later by editing the app configuration. Some GraphQL fields require write scopes to read (for example, `marketingUnsubscribeUrl` on the `Customer` object requires `write_customers`). If you don't grant the corresponding write scope, the Shopify API returns an error for that field. To avoid this, either omit the field from the `graphqlFields` list in the **Object Definitions Override** parameter, or add it to `ignoredFields`. Note that `ignoredFields` works on top-level field names only. For nested fields, you must remove them from the `graphqlFields` sub-selection. 6. Select **Save**. 7. Go to the **API credentials** tab and select **Install app**. Confirm the installation. 8. Copy the **Admin API access token** and store it in a secure location. You need this token when configuring the connector. The access token is displayed only once. If you lose it, you must uninstall and reinstall the app to generate a new token. For more information, see [Generate access tokens for custom apps](https://shopify.dev/docs/apps/build/authentication-authorization/access-tokens/generate-app-access-tokens-admin) in the Shopify developer documentation. ## Set up your Snowflake account As an Openflow administrator, perform the following tasks to set up your Snowflake account. 
### Create a Snowflake service user (Openflow - BYOC Deployments only) This step is only required if you are deploying the connector in %ofbyoc-plural%. It isn't needed for %ofsfspcs-plural%. 1. Create a service user: ```sql USE ROLE USERADMIN; CREATE USER TYPE=SERVICE COMMENT='Service user for the Shopify connector'; ``` 2. Store the private key for that user in a file to supply to the connector's configuration. For more information, see [key pair authentication](/user-guide/key-pair-auth). ```sql ALTER USER SET RSA_PUBLIC_KEY = ''; ``` ### Create database, schema, and warehouse 1. Create the destination database: ```sql USE ROLE ACCOUNTADMIN; CREATE DATABASE IF NOT EXISTS ; ``` 2. Create the destination schema: ```sql CREATE SCHEMA IF NOT EXISTS .; ``` 3. Create a role for the connector and grant the required privileges: ```sql CREATE ROLE IF NOT EXISTS ; GRANT USAGE ON DATABASE TO ROLE ; GRANT USAGE ON SCHEMA . TO ROLE ; GRANT CREATE TABLE ON SCHEMA . TO ROLE ; ``` 4. Create a warehouse (or use an existing one) and grant usage privileges: ```sql CREATE WAREHOUSE IF NOT EXISTS WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE; GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE ; ``` 5. If using %ofbyoc-plural%, assign the role to the service user: ```sql GRANT ROLE TO USER ; ALTER USER SET DEFAULT_ROLE = ; ``` ### Create a network rule (Openflow - Snowflake Deployments only) If your runtime executes in %ofbyoc-plural%, you don't need to create an External Access Integration (EAI). Instead, configure your cloud network egress to allow HTTPS (port 443) access to your Shopify store domain. To allow the connector to call the Shopify API from a Snowflake-hosted runtime, create a network rule and an external access integration (EAI), and then grant the Snowflake role usage privileges on the EAI. 1. 
Create a network rule: ```sql USE ROLE ACCOUNTADMIN; CREATE OR REPLACE NETWORK RULE openflow__shopify_network_rule TYPE = HOST_PORT MODE = EGRESS VALUE_LIST = ('.myshopify.com:443'); ``` 2. Create an External Access Integration: ```sql CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION openflow__shopify_eai ALLOWED_NETWORK_RULES = (openflow__shopify_network_rule) ENABLED = TRUE; ``` 3. Grant your Snowflake role USAGE on the integration: ```sql GRANT USAGE ON INTEGRATION openflow__shopify_eai TO ROLE openflow_runtime_role_; ``` ## Install the connector To install the connector, do the following as a data engineer: 1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**. 2. On the Openflow connectors page, find the connector and select **Add to runtime**. 3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**. Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data. 4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete. 5. Authenticate to the runtime with your Snowflake account credentials. The Openflow canvas appears with the connector process group added to it. ## Configure the connector To configure the connector, perform the following steps: 1. Right-click on the added connector process group and select **Parameters**. 2. Populate the required parameter values as described in the following sections. ### Shopify parameters
### Snowflake destination parameters
## Run the flow 1. Right-click on an empty area of the canvas and select **Enable all Controller Services**. 2. Right-click on the connector process group and select **Start**. The connector starts querying the Shopify Admin API and loading data into Snowflake. ## Next steps - For information on customizing which fields are extracted and registering custom object types, see [](/user-guide/data-integration/openflow/connectors/shopify/object-definitions). - For information on forcing a full reload of connector state, see [](/user-guide/data-integration/openflow/connectors/shopify/maintain). - For information on monitoring the flow, see [Monitor the flow](/user-guide/data-integration/openflow/monitor). --- title: Setting up the Openflow Connector for Veeva Vault source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/veeva-vault/setup.md section: Loading & Unloading Data --- # Setting up the %veevavaultof% This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/veeva-vault/about) - [](/user-guide/data-integration/openflow/connectors/veeva-vault/use) This topic describes the steps to set up the %veevavaultof%. ## Prerequisites 1. Review [](/user-guide/data-integration/openflow/connectors/veeva-vault/about). 2. Set up your runtime deployment. 
- [](/user-guide/data-integration/openflow/setup-openflow-byoc) - [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs) 3. If you are using %ofsfspcs-plural%, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the [domains](#label-of-veeva-req-domains) required by the connector. 4. You have access to the Openflow admin role or a similar role you use to manage Openflow. 5. If you are creating a Snowflake service user to manage the connector, you have created key pair authentication. For more information, see [key-pair authentication](/user-guide/key-pair-auth). ## Required endpoints The following endpoint is required for the connector to function: - `:443` (for example, `myvault.veevavault.com:443`) If you are using %ofbyoc-plural%, configure your cloud network egress to allow TLS 443 access to this endpoint. If you are using %ofsfspcs-plural%, you must create a network rule and an external access integration (EAI). See [Create a network rule (Openflow Snowflake Deployments only)](#create-a-network-rule-openflow-snowflake-deployments-only) for details. ## Set up Veeva Vault The connector uses Direct Data API to retrieve data. Before you can use the connector, your Vault administrator must complete the following tasks. ### Enable Direct Data Direct Data must be enabled on your Vault. This is a Vault-level feature that allows external systems to retrieve data exports via the Direct Data API. To verify that Direct Data is enabled, your Vault administrator can check **Admin** %ra% **Settings** %ra% **General Settings** %ra% **Direct Data** in the Veeva Vault UI. For more information, see the [Direct Data API documentation](https://general.veevavault.dev/direct-data-api). ### Create a service account Create a dedicated Veeva Vault user account for the connector. 
To configure this account, see the [Direct Data API permissions documentation](https://general.veevavault.dev/direct-data-api/references/direct-data-permissions). Record the username and password for this service account. You need these values when configuring the connector.

Snowflake recommends using a dedicated service account rather than a personal user account. This ensures that the connector continues to function if a personal account is disabled or its password is changed.

## Set up your Snowflake account

As an Openflow administrator, perform the following tasks to set up your Snowflake account.

### Create a Snowflake service user (Openflow BYOC only)

This step is only required if you are deploying the connector in %ofbyoc-plural%. It isn't needed for %ofsfspcs-plural%.

1. Create a service user:

   ```sql
   USE ROLE USERADMIN;
   CREATE USER TYPE=SERVICE COMMENT='Service user for the Veeva Vault connector';
   ```

2. Generate a key pair for that user, assign the public key to the user, and store the private key in a file to supply to the connector's configuration. For more information, see [key-pair authentication](/user-guide/key-pair-auth).

   ```sql
   ALTER USER SET RSA_PUBLIC_KEY = '';
   ```

### Create database, schema, and warehouse

1. Create the destination database:

   ```sql
   USE ROLE ACCOUNTADMIN;
   CREATE DATABASE IF NOT EXISTS ;
   ```

2. Create the destination schema:

   ```sql
   CREATE SCHEMA IF NOT EXISTS .;
   ```

3. Create a role for the connector and grant the required privileges:

   ```sql
   CREATE ROLE IF NOT EXISTS ;
   GRANT USAGE ON DATABASE TO ROLE ;
   GRANT USAGE ON SCHEMA . TO ROLE ;
   GRANT CREATE TABLE ON SCHEMA . TO ROLE ;
   ```

4. Create a warehouse (or use an existing one) and grant usage privileges:

   ```sql
   CREATE WAREHOUSE IF NOT EXISTS
     WITH WAREHOUSE_SIZE = 'SMALL'
     AUTO_SUSPEND = 300
     AUTO_RESUME = TRUE;
   GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE ;
   ```

5. If using %ofbyoc-plural%, assign the role to the service user:

   ```sql
   GRANT ROLE TO USER ;
   ALTER USER SET DEFAULT_ROLE = ;
   ```

### Create a network rule (Openflow Snowflake Deployments only)

If your runtime executes in %ofbyoc-plural%, you don't need to create an External Access Integration (EAI). Instead, configure your cloud network egress to allow TLS (port 443) access to your Veeva Vault hostname.

To allow the connector to call the Veeva Vault API from a Snowflake-hosted runtime, create a network rule and an external access integration (EAI), and then grant the Snowflake role usage privileges on the EAI.

1. Create a network rule:

   ```sql
   USE ROLE ACCOUNTADMIN;
   CREATE OR REPLACE NETWORK RULE openflow__veeva_network_rule
     TYPE = HOST_PORT
     MODE = EGRESS
     VALUE_LIST = (':443');
   ```

2. Create an External Access Integration:

   ```sql
   CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION openflow__veeva_eai
     ALLOWED_NETWORK_RULES = (openflow__veeva_network_rule)
     ENABLED = TRUE;
   ```

3. Grant your Snowflake role USAGE on the integration:

   ```sql
   GRANT USAGE ON INTEGRATION openflow__veeva_eai TO ROLE openflow_runtime_role_;
   ```

## Install the connector

To install the connector, do the following as a data engineer:

1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.

   Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.

4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

## Configure the connector

To configure the connector, perform the following steps:

1. Right-click on the added connector process group and select **Parameters**.
2. Populate the required parameter values as described in the following sections.

### Veeva Vault parameters
### Snowflake destination parameters
### Schema evolution parameters
## Run the flow 1. Right-click on an empty area of the canvas and select **Enable all Controller Services**. 2. Right-click on the connector process group and select **Start**. The connector starts polling Veeva Vault for Direct Data files and loading data into Snowflake. ## Next steps - For information on tasks you can perform after installing the connector, see [Use the connector](/user-guide/data-integration/openflow/connectors/veeva-vault/use). - For information on monitoring the flow, see [Monitor the flow](/user-guide/data-integration/openflow/monitor). --- title: Setup tasks for SAP® Snowflake and SAP® BDC Connect for Snowflake source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/setup-tasks.md section: Loading & Unloading Data --- # Setup tasks for %sapsnowflake% and %sapbdc% - [](/user-guide/data-integration/zero-copy/about-sap-snowflake) This topic describes the overall tasks required to set up, configure, and run either %sapsnowflake% or %sapbdc%. ## Prerequisites 1. Ensure that you have reviewed [](/user-guide/data-integration/zero-copy/about-sap-snowflake). ## Tasks Perform the following tasks to set up, configure, and run %sapsnowflake% or %sapbdc%.
--- title: SignContentPGP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/signcontentpgp.md section: Loading & Unloading Data --- # SignContentPGP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-pgp-nar ## Description Sign content using OpenPGP Private Keys ## Tags Encryption, GPG, OpenPGP, PGP, RFC 4880, Signing ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.pgp.DecryptContentPGP](/user-guide/data-integration/openflow/processors/decryptcontentpgp) - [org.apache.nifi.processors.pgp.EncryptContentPGP](/user-guide/data-integration/openflow/processors/encryptcontentpgp) - [org.apache.nifi.processors.pgp.VerifyContentPGP](/user-guide/data-integration/openflow/processors/verifycontentpgp) --- title: SimpleCsvFileLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/simplecsvfilelookupservice.md section: Loading & Unloading Data --- # SimpleCsvFileLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

A reloadable CSV file-based lookup service. The first line of the CSV file is treated as the header.

## Tags

cache, csv, enrich, join, key, lookup, reloadable, value

## Properties

In the list below, required properties are marked with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
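As a rough illustration of the lookup behavior described above, the following Python sketch reads CSV text whose first line is the header and builds a key-to-value map from two columns. It is not the service's implementation. The function name `build_lookup`, the column names `id` and `city`, and the sample data are hypothetical.

```python
import csv
import io


def build_lookup(csv_text, key_column, value_column):
    """Build a key -> value map from CSV text whose first line is the header."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row supplies column names
    lookup = {}
    for row in reader:
        # Keep the first value seen for a key. How the real service treats
        # duplicate keys is not stated here, so this choice is an assumption.
        lookup.setdefault(row[key_column], row[value_column])
    return lookup


# Hypothetical sample data: the header line names the columns.
SAMPLE_CSV = "id,city\n1,Berlin\n2,Madrid\n"

lookup = build_lookup(SAMPLE_CSV, key_column="id", value_column="city")
print(lookup.get("2"))   # value for an existing key
print(lookup.get("99"))  # None when the key is absent
```

A "reloadable" service would rebuild this map when the underlying file changes; the sketch builds it once to keep the example short.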
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: SimpleDatabaseLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/simpledatabaselookupservice.md section: Loading & Unloading Data --- # SimpleDatabaseLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

A relational-database-based lookup service. When the lookup key is found in the database, the specified lookup value column is returned. Only one value is returned for each lookup; duplicate database entries are ignored.

## Tags

cache, database, enrich, join, key, lookup, rdbms, reloadable, value

## Properties

In the list below, required properties are marked with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
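The "one value per lookup" behavior can be pictured with a small Python sketch that uses an in-memory SQLite database. This only illustrates the idea and is not how the service is implemented. The table `cities`, its columns, and the function `lookup_value` are hypothetical, and which of several duplicate rows is returned is left unspecified.

```python
import sqlite3


def lookup_value(conn, key):
    """Return one value for the key, or None when the key is absent."""
    row = conn.execute(
        "SELECT city FROM cities WHERE id = ? LIMIT 1",  # LIMIT 1: at most one row
        (key,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (id TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [("1", "Paris"), ("1", "Lyon"), ("2", "Rome")],  # key "1" is duplicated
)

print(lookup_value(conn, "1"))   # a single value, even though two rows match
print(lookup_value(conn, "2"))
print(lookup_value(conn, "99"))  # None
```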
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SimpleKeyValueLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/simplekeyvaluelookupservice.md section: Loading & Unloading Data --- # SimpleKeyValueLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Allows users to add key/value pairs as User-defined Properties. Each property that is added can be looked up by Property Name. The coordinates that are passed to the lookup must contain the key 'key'. ## Tags enrich, key, lookup, value ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SimpleRedisDistributedMapCacheClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/simpleredisdistributedmapcacheclientservice.md section: Loading & Unloading Data --- # SimpleRedisDistributedMapCacheClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

An implementation of DistributedMapCacheClient that uses Redis as the backing cache. This service is intended to be used when a non-atomic DistributedMapCacheClient is required.

## Tags

cache, distributed, map, redis

## Properties

In the list below, required properties are marked with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SimpleScriptedLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/simplescriptedlookupservice.md section: Loading & Unloading Data --- # SimpleScriptedLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Allows the user to provide a scripted LookupService instance in order to enrich records from an incoming flow file. The script is expected to return an optional string value rather than an arbitrary object (for example, a record). The scripted lookup service should also implement StringLookupService; otherwise, the getValueType() method must be implemented even though it is ignored, because SimpleScriptedLookupService returns String as the value type on the script's behalf.

## Tags

groovy, invoke, lookup, script

## Properties

In the list below, required properties are marked with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: SlackRecordSink source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/slackrecordsink.md section: Loading & Unloading Data --- # SlackRecordSink This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Formats and sends records to a configured channel using the Slack Post Message API. The service requires a Slack App with a Bot User configured for access to a Slack workspace. The Bot User OAuth Bearer Token is required for posting messages to Slack.

## Tags

record, sink, slack

## Properties

In the list below, required properties are marked with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
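To make the role of the bot token and channel concrete, here is a minimal Python sketch that builds (but does not send) a request to Slack's `chat.postMessage` Web API method, assuming that is the "Post Message API" referred to above. The function name, the "field: value" message layout, the placeholder token, and the channel name are hypothetical and do not reflect how the service formats records.

```python
import json
import urllib.request

SLACK_POST_MESSAGE_URL = "https://slack.com/api/chat.postMessage"


def build_post_message_request(bot_token, channel, record):
    """Build (but do not send) a chat.postMessage request for one record."""
    # Render the record as simple "field: value" lines. The service's actual
    # message format is not described here, so this layout is an assumption.
    text = "\n".join(f"{name}: {value}" for name, value in record.items())
    body = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SLACK_POST_MESSAGE_URL,
        data=body,
        method="POST",
        headers={
            # The Bot User OAuth token is sent as a bearer token.
            "Authorization": f"Bearer {bot_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Placeholder token and channel; replace with real values before sending.
request = build_post_message_request(
    "xoxb-placeholder", "#alerts", {"table": "orders", "rows_loaded": 42}
)
print(request.get_method(), request.full_url)
# To send the message: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch runnable without network access or a real token.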
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SmbjClientProviderService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/smbjclientproviderservice.md section: Loading & Unloading Data --- # SmbjClientProviderService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Provides access to SMB Sessions with shared authentication credentials.

## Tags

samba, smb, cifs, files

## Properties

In the list below, required properties are marked with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: Snowflake Openflow Connector for Kafka
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kafka/about.md
section: Loading & Unloading Data
---

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)

# Snowflake Openflow Connector for Kafka

This topic describes the basic concepts of the Openflow Connector for Kafka and its limitations.

The Openflow Connector for Kafka reads data from Kafka topics and writes it into Snowflake tables using the [Snowpipe Streaming High Performance](/user-guide/snowpipe-streaming/snowpipe-streaming-high-performance-overview) architecture.

Use this connector to do the following:

- Ingest real-time events from Apache Kafka into Snowflake for near real-time analytics.
- Ingest real-time events from Apache Kafka into Snowflake-managed Iceberg™ tables.
- Further accelerate ingestion by combining Openflow with the Interactive Tables feature.
- Apply Single Message Transforms to enrich or filter data before it lands in Snowflake.

## Limitations

- The connector doesn't support schema evolution for Apache Iceberg™ tables.
- Autoscaling isn't supported. The minimum and maximum number of nodes should stay constant for the Openflow runtime where the Openflow Connector for Kafka is deployed.
- The Kafka cluster must be running version 0.10.0.0 or later. Earlier versions of Kafka aren't supported.

## Using different authentication options, data types or data manipulation

By default, the connector is configured to work with the JSON data type and the SASL_SSL authentication method. The connector can be modified and extended in many ways. See the dedicated sub-pages in the setup section for guidance on making the necessary changes.

### Supported Data types

The Openflow Connector for Kafka supports the following data types:

- **JSON (available by default in the connector)**
- Avro (extra configuration required)
- Protobuf (extra configuration required)

### Supported Authentication Methods

The Openflow Connector for Kafka supports the following authentication mechanisms:

- SASL with the following SASL mechanisms:
  - PLAIN
  - SCRAM-SHA-256
  - **SCRAM-SHA-512 (available by default in the connector)**
  - OAUTHBEARER
- [SASL with AWS MSK IAM](/user-guide/data-integration/openflow/connectors/kafka/aws-msk-iam-auth) (extra configuration required via controller services)
- mTLS (extra configuration required via controller services)

## Next steps

[](/user-guide/data-integration/openflow/connectors/kafka/setup)

---
title: Snowflake Openflow Connector for Kafka: Configuring AWS MSK IAM Authentication
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kafka/aws-msk-iam-auth.md
section: Loading & Unloading Data
---

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)

# Snowflake Openflow Connector for Kafka: Configuring AWS MSK IAM Authentication

AWS MSK IAM authentication allows you to use AWS Identity and Access Management (IAM) to authenticate to Amazon Managed Streaming for Apache Kafka (MSK).

## Prerequisites

- Your Kafka cluster must be Amazon MSK with IAM authentication enabled.
- For Openflow BYOC (bring your own cloud) deployments, which run in your cloud, you must provide IAM credentials in Openflow.
- The IAM role or user must have the necessary MSK permissions.

## Step 1: Create AmazonMSKConnectionService

From the Openflow canvas, access the Controller Services configuration:

1. Double-click the connector's process group.
2. Right-click on the canvas and select **Controller Services**.

Add a new AmazonMSKConnectionService:

1. Select **+** to add a new controller service.
2. Select **AmazonMSKConnectionService** from the list.
3. Select **Add**.

Configure the AmazonMSKConnectionService properties:

Verify the AmazonMSKConnectionService:

1. Select **Verify** for the service.
2. Confirm that the service status shows as **Verified**.

## Step 2: Configure ConsumeKafka Processor

1. In your Kafka connector, locate the ConsumeKafka processor.
2. Configure the processor to use the new connection service: set the **Kafka Connection Service** property to the AmazonMSKConnectionService you created in [](#label-openflow-kafka-aws-msk-iam-auth-step1).

## Step 3 (Optional): Remove Old Kafka Connection Service

1. In the Controller Services tab, locate the old Kafka3Connection service.
2. Disable and remove the old service:
   1. Select **Disable** for the old service.
   2. After it's disabled, select **Delete** to remove the old service.

---
title: Snowflake Openflow version history
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/version-history.md
section: Loading & Unloading Data
---

# Snowflake Openflow version history

This topic provides version history for [Snowflake Openflow](/user-guide/data-integration/openflow/about).

To apply the latest updates to your deployment, runtimes, or connectors, see [Manage Openflow](/user-guide/data-integration/openflow/manage).
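Several component versions listed in this topic, such as `2026.6.4.18`, appear to follow a `year.month.day.build` pattern that matches the date heading they are listed under. That pattern is inferred from the entries and is not a documented guarantee; other components, such as Control Plane Core (`0.116.1`), use a different scheme. Under that assumption, a small Python sketch for reading a release date from such a version string might look like this (the function name is hypothetical):

```python
from datetime import date


def release_date_from_version(version):
    """Interpret 'YYYY.M.D.N' as a calendar date; return None if it does not fit."""
    parts = version.split(".")
    if len(parts) != 4 or not all(p.isdecimal() for p in parts):
        return None
    year, month, day, _build = (int(p) for p in parts)
    try:
        return date(year, month, day)
    except ValueError:
        # Not a valid calendar date, so the version uses another scheme.
        return None


print(release_date_from_version("2026.6.4.18"))
print(release_date_from_version("0.116.1"))
```

Returning `None` for versions that don't fit keeps the helper safe to run over every heading in the history, including components that use semantic-style version numbers.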
## June 4, 2026 ### Runtime Server 2026.6.4.18 - Security patches and dependency upgrades. - Fixed date conversion for consistent use of the Proleptic Gregorian Calendar. - Added a size limit to the Standard Content Viewer. - Upgraded the UI frontend to version 0.76.0: - Added support for import/export of flow definitions including component state. ### Runtime Extensions 2026.6.4.18 - Added Parsing Strategy to `JsonTreeReader` and `JsonPathReader` with a Lenient option. - CDC MySQL: Introduced the initial MySQL CDC Connector based on the same architecture as the PostgreSQL CDC connector. - Shopify: Introduced a new Shopify source connector. - CDC Databases: Added a `SNOWFLAKE_MANAGED` merge strategy for the FULL journal type (`ReducedJournalRead` mode), reducing compile time and bytes scanned. - CDC Oracle and SQL Server: Flattened DML record structure to skip the Enrich processor, simplifying the pipeline. - CDC Databases: Added aggregate table status metrics reporting so operators can monitor the health of multiple CDC tables in one view. - CDC MongoDB: Migrated to `DirectJsonRecordWriter`, improving record writing performance and consistency. - Snowpipe Streaming: Deprecated the legacy `PutSnowpipeStreaming2` processor in favor of `PublishSnowpipeStreaming`. - CDC Databases: Increased fetch size in `MultiDatabaseListTableNames` to improve performance when listing a large number of tables. - CDC Databases: Ported sub-chunk part counting to `WaitForSnapshotCompletion`, improving snapshot progress tracking. - Dataverse: Refactored pagination loop for more efficient flow file handling. - SharePoint: Fixed a `NullPointerException` in `FetchSharepointFile` when the `Retry-After` response header is absent. - CDC Databases: Fixed error handling in `getConnection()` within `MergeSnowflakeJournalTable` to prevent silent failures. - CDC SQL Server: Added graceful handling of Change Tracking expiration for inactive databases, preventing pipeline failures. 
- CDC SQL Server: Added proper non-retryable exception handling to prevent CDC tables from being stuck in an error loop.
- CDC Oracle: Excluded index-organized table segments from Oracle table listing to prevent spurious tables from appearing in the connector.
- CDC Databases: Added missing query status handling in explicit response processing to prevent unhandled state transitions.

### Connectors 2026.6.4.15

- Shopify: Added versioned flows for the new Shopify source connector.
- Dataverse: Tables are now moved to a FAILED status when an error does not recover, preventing silent data loss and alerting operators to problems.
- Dataverse: Improved merge scheduling by adding a `GateFlowFile` processor to handle scheduling, reducing unnecessary flow file releases.

## May 22, 2026

### Runtime Extensions 2026.5.22.12

- CDC SQL Server: Added configurable properties to control which metrics are logged during change tracking capture.
- Snowpipe Streaming: Fixed a `NoClassDefFoundError` when using HTTP/HTTPS proxies or Azure internal stages by restoring Netty proxy classes to the Snowflake JDBC NAR.
- CDC MySQL: Added Iceberg type override support with unsigned integer handling and proper bit-width mapping for MySQL source columns.
- Snowflake: Added a Snowflake Provenance Reporting Task that streams Openflow provenance events to Snowflake via Snowpipe Streaming v2.
- Snowpipe Streaming: Reduced log message size for `PutSnowpipeStreaming` failures by logging FlowFile IDs instead of full object references.
- CDC PostgreSQL: Added proper Iceberg type mapping for PostgreSQL integer types using bit-width and temporal precision fields.
- CDC Databases: Added `bitWidth` and `temporalPrecision` fields to `Column` for improved schema fingerprinting (v2) without affecting existing v1 connectors.

## May 21, 2026

### Runtime Server 2026.5.21.15

- Security patches and dependency upgrades.
### Runtime Extensions 2026.5.21.16

- CDC Oracle: Adds support for multiple databases in the Oracle CDC connector.
- CDC Databases: Adds an adaptive placeholder merge strategy that optimizes merge query performance by reducing journal reads.
- CDC SQL Server: Throws a permanent failure when the change tracking version expires due to retention, providing a clear error and marking the table as failed.
- Salesforce: Allows currency fields to be treated as float type in the Salesforce describe object operation.
- Snowpipe Streaming: Adds wait time information to retry log messages for better troubleshooting.
- All connectors: Fixes managed authentication for BYOC deployments in the Snowflake Connection Service.

### Connectors 2026.5.21.15

- Dataverse: Increases max retry attempts on the Dataverse API to improve resilience.
- Jira: Fixes the handling of deleted issues in the Jira Core connector.
- CDC SQL Server: Adds parameters for Metrics Enabled, Max Batch Size, and Read Timeout.
- MongoDB: Changes Schema Access Strategy to infer schema in the MongoDB connector.
- CDC Oracle: Strips table selection criteria attributes for Oracle connectors to fix upgrade issues.

## May 20, 2026

### Control Plane Core 0.116.1

- Improved reliability of Deployment and Runtime actions.
- Security patches and dependency upgrades.

### Data Plane Service 0.116.0

- Fixed a rare case where a Runtime upgrade was determined to have failed too soon, while the upgrade was still processing.
- Security patches and dependency upgrades.

### Control Plane UI 0.81.0

- Changed the "Snowflake Role" label to "Execute as role".
- Fixed hint for runtimes in the connector install dialog.
- Fixed an issue where errors were not shown when querying the list of Connectors.
- Preserved the original route when a user is redirected to authenticate.
- Fixed an issue preventing the link in the hint from opening in the connector install dialog.
- Security patches and dependency upgrades.
### Data Plane UI 0.16.0

- Security patches and dependency upgrades.

### AWS Data Plane Agent 1.44.0

- Fixed internal certificate renewal process for Openflow Ingress Controller. All customers running BYOC Deployments 1.31.2 through 1.42.0 should upgrade before June 15, 2026.
- Improved upgrade reliability by fixing an IAM issue with "DescribeAddonVersions" that temporarily marked deployments as "Upgrade Failed" before automatically recovering.
- Fixed redaction of sensitive values in telemetry to allow Key values from the LogAttributes processor to pass through as-is.
- Security patches and dependency upgrades.

### SPCS Data Plane Agent 1.30.0

- Improved upgrade reliability by identifying an unhealthy PostgreSQL deployment that blocks the upgrade process.
- Security patches and dependency upgrades.

### Runtime Operator 0.61.0

- Security patches and dependency upgrades.

### Openflow Runtime Gateway 2026.5.13.18

- Security patches and dependency upgrades.

### Openflow Ingress Controller 2026.5.13-18

- Security patches and dependency upgrades.

### Openflow Token Refresher 1.12.0

- Security patches and dependency upgrades.

## May 19, 2026

### Runtime Server 2026.5.19.16

- Increased the web request timeout from the default to 5 minutes, preventing timeout errors when the runtime is under heavy load or processing large configuration changes.
- Fixed connector flow URI pattern matching in cluster response merging that could cause incorrect flow resolution when multiple connectors share similar URI patterns.
- Fixed an issue where adding parameters to a Parameter Context with multiple suffixed variants incorrectly applied changes to the wrong context during flow upgrades.
- Enabled parameter value expressions to reference parameters defined in inherited parameter contexts.
### Runtime Extensions 2026.5.19.17

- CDC SQL Server (CT): When a Change Tracking query fails for a specific table, the connector now applies a backoff delay to that individual table instead of failing the entire connector run, and includes the table name in the error message for easier troubleshooting.
- CDC SQL Server: Fixed a race condition where removing a table from replication and re-adding it before the next scheduler cycle caused the table to silently stop being processed until a full connector restart.
- CDC SQL Server (CDC): Proactively detects source-table schema changes (DDL) by polling `cdc.ddl_history`, and detects when CDC is disabled mid-replication; affected tables are now moved to FAILED state instead of looping indefinitely on transient errors or producing rows against a stale schema.
- CDC SQL Server (CT): Adds the table name to the error message when change tracking is found to be disabled on a table, making it clear which table needs attention.
- CDC MySQL: Adds defensive type validation during binary log parsing to prevent corrupted records when a MySQL column returns an unexpected data type (e.g., after an undetected schema change).
- Snowpipe Streaming: Fixed URI path segment encoding so that database, schema, or table names containing special characters (spaces, hyphens, mixed case) no longer cause channel-open failures; uses RFC 3986 compliant encoding instead of URL form encoding.
- Snowpipe Streaming: Added detailed error logging (including FlowFile attributes and exception details) immediately before routing records to the FAILURE relationship, improving diagnostics when ingestion errors occur.
- CDC Oracle: Treats database connection reset exceptions as warnings instead of errors; the connection is rarely used (once daily for license checks and schema fetches), is often stale, and the connector recovers automatically on the next cycle. This prevents false "Unhealthy" status on the connector dashboard.
- CDC Databases: Added a dedicated `SCHEMA_NOT_FOUND` failure reason when a source table disappears or its schema cannot be fetched after retries are exhausted, providing operators with a clear indication of why a table stopped replicating instead of a generic error.
- SharePoint: Fixed handling of HTTP 410 (Gone) errors when the SharePoint delta link expires, and added additional logging around delta link resolution to help troubleshoot synchronization issues with large document libraries.
- CDC SQL Server: Improved snapshot performance for partitioned tables by placing the partition column first in the ORDER BY and keyset WHERE clauses, enabling SQL Server partition elimination during keyset pagination and significantly reducing query execution time for large partitioned tables.

### Connectors 2026.5.19.16

- CDC Oracle & SQL Server (CDC): Added a stale-streams filter parameter that automatically excludes journal streams nearing their staleness threshold from CDC processing, preventing failures when streams expire during long-running replication cycles.
- SharePoint: Added an auto re-sync parameter to all SharePoint CDC connector variants that automatically re-synchronizes the full document library when the delta link expires or becomes invalid, instead of requiring manual intervention.
- CDC Databases: Fixed a connection leak where some connections were not routed through Private Link; ensures all database connections from the connector (including internal health-check and metadata queries) go through the configured private link endpoint.
- CDC Oracle & SQL Server: Adopted the enhanced `MultiDatabaseWaitForSnapshotCompletion` processor in incremental-only mode, which provides better coordination between snapshot and incremental phases and reduces the window where tables can miss changes during the transition.
- CDC Oracle: Added an Oversized Value Strategy parameter to Oracle connectors, allowing operators to choose how to handle column values that exceed Snowflake's maximum column size (truncate, route to failure, or skip the column).
- MongoDB: Strips internal collection selection criteria attributes from FlowFile metadata before sending to Snowflake, preventing unnecessary attribute pollution in the destination.
- MongoDB: Improved failure reason reporting to distinguish between different types of ingestion errors (schema mismatch, connection timeout, authentication failure) for better operator visibility.

## May 14, 2026

### Runtime Server 2026.5.14.16

- Added support for loading asset configurations when connectors are created via SQL.

### Runtime Extensions 2026.5.14.16

- CDC SQL Server (CT version): Added source commit time, row count, and per-cycle phase instrumentation for SQL Server Change Tracking.
- Dataverse: Added FAILED table status tracking and a processor for managing table ingestion state.
- Added `GateFlowFile` processor for flow control.
- CDC Databases: Made the CDC schema registry resilient to `TableSchema` class changes, preventing failures on connector upgrades.
- CDC Databases: Fixed `NullPointerException` on dynamic-property initialization in `RouteOnSnowflakeParameter` processors.
- CDC SQL Server: Fixed keyset pagination failures by properly casting int/smallint/tinyint primary key placeholders.
- Kafka: Refactored `Kafka3ConnectionService` to use `SSLContextProvider` for cleaner SSL handling and to make PEM-based authentication possible.
- AWS: Added Token Request Endpoint property to `AwsRdsIamDatabasePasswordProvider` for custom STS endpoints.

### Connectors 2026.5.14.16

- CDC Oracle: Added Concurrent Snapshot Queries parameter to allow parallel snapshot fetching.
- CDC Oracle: Adjusted Oracle connector flows to comply with FlowFile size limits.
- CDC MySQL, PostgreSQL, SQL Server: Removed table-listing FlowFile attributes to reduce pressure on the Provenance Repository when syncing thousands of tables.
- CDC MongoDB: Improved stream staleness prevention to reduce unnecessary restarts.

## May 12, 2026

### Runtime Server 2026.5.12.16

- Fixed potential corruption with modify-after-write on Local State Provider.
- Added flow import/export that includes the state of stateful components.

### Runtime Extensions 2026.5.12.16

- Snowpipe Streaming: Added PrivateLink support to Snowpipe Streaming v2 processors.
- CDC Databases: Added `STALE_AFTER` flow-file attribute to `GetSnowflakeJournalStreams` and `MultiDatabaseGetSnowflakeJournalStreams` for stream staleness monitoring.
- CDC Databases (multi-database): Added Incremental strategy to `MultiDatabaseWaitForSnapshotCompletion` processor.
- CDC SQL Server: Widened `CdcStreamPosition` to `BigInteger` to support 10-byte SQL Server Log Sequence Numbers.
- CDC Oracle: Reduced the number of row ID ranges generated by `SplitOracleTable` to improve snapshot performance.
- CDC Oracle: Added oversized value support to `FetchRowsByRowId` processor.
- Dataverse: Fixed a deadlock in `FetchMicrosoftDataverseTable` caused by backpressure on the success queue, by listing only idle tables during long snapshots.
- CDC PostgreSQL: Changed publication verification fall-through outcome from FAILED to SKIPPED to prevent false failures.

### Connectors 2026.5.12.16

- CDC MySQL, PostgreSQL, SQL Server: Added filter for streams nearing staleness within 7 days.
- CDC MySQL: Replaced `WaitForTableState` processor with enhanced version in Incremental flow.
- MongoDB: Added object identifier resolution.

## May 8, 2026

### Runtime Server 2026.5.8.5

- CDC SQL Server: INFO-level logs for `MultiDatabaseCaptureChangeSqlServer` are now captured in the event table.
- Fixed Repository Record creation for S2S and Load-Balanced Connections.
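
The May 12 change that widened `CdcStreamPosition` to `BigInteger` follows from simple arithmetic: a 10-byte Log Sequence Number carries up to 80 bits, which does not fit in a signed 64-bit integer. The sketch below illustrates this in Python, whose `int` is arbitrary-precision and so plays a role similar to Java's `BigInteger`. The `lsn_to_int` helper, the big-endian interpretation, and the sample bytes are assumptions made for illustration; they are not taken from the connector's code.

```python
# Hypothetical illustration: why a 10-byte LSN can overflow a signed 64-bit integer.
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer (e.g. a Java long)


def lsn_to_int(lsn: bytes) -> int:
    """Interpret a 10-byte binary LSN as an unsigned big-endian integer."""
    if len(lsn) != 10:
        raise ValueError("expected a 10-byte LSN")
    return int.from_bytes(lsn, byteorder="big", signed=False)


# Made-up sample value; only its length (10 bytes) matters here.
sample_lsn = bytes.fromhex("00012C3A000001D80003")
position = lsn_to_int(sample_lsn)

# True when the position is too large for a signed 64-bit field.
overflows_int64 = position > INT64_MAX

# The largest possible 10-byte value needs 80 bits.
max_bits = lsn_to_int(b"\xff" * 10).bit_length()
```

Positions with small leading bytes still fit in 64 bits, which is one way such a limit can go unnoticed until a log has grown large.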
### Runtime Extensions 2026.5.8.8

- CDC PostgreSQL: Added Table Storage Format (Standard/Iceberg) option to the Postgres CDC connector wizard.
- CDC SQL Server: Added logging for CT query performance tracking when the Metrics Enabled property is set to true.
- Dataverse: Enabled reingestion of all tables by allowing empty Tables Filter Value.
- Dataverse: Added ability to drop individual table state records for selective reingestion.
- CDC SQL Server: Reports capture metrics on empty fetches for better observability.
- CDC PostgreSQL: Separated replication and non-replication connections to prevent connection pool exhaustion.
- Snowpipe Streaming: Fixed GCS object transfer on SPCS for Snowpipe Streaming v2.
- CDC SQL Server: Added FlowFile size limit to `MultiDatabaseFetchRowsByRowId` to prevent memory issues.
- BigQuery: Automatically retries on transient gRPC exceptions instead of failing the flow.
- BigQuery: Fixed case-insensitive database merge query failures.

### Connectors 2026.5.7.18

- Jira: Fixed duplicate issues when a project has changed.
- CDC SQL Server: Bumped concurrent tasks to 2 on `MultiDatabaseEnrichCdcStream` for improved throughput.
- CDC SQL Server: Added Concurrent Select Queries parameter for incremental loads in multi-database mode.

## May 5, 2026

### Control Plane Core 0.114.0

- Updated SQL Server connector CDC preview documentation link.
- Replaced legacy Jira connector with Core and Agile versions.
- Fixed intermittent issue with the available Deployment list missing some Deployment options when creating a new Runtime.
- Security patches and dependency upgrades.

### Data Plane Service 0.113.0

- Improved reliability of Runtime Upgrade by handling a case where the Runtime StatefulSet is stuck waiting on unhealthy pods prior to upgrade.
- Security patches and dependency upgrades.

### Control Plane UI 0.80.0

- Upgraded to Stellar 0.31.3 for consistency across Snowflake products.

### AWS Data Plane Agent 1.42.0

- Security patches and dependency upgrades.
### SPCS Data Plane Agent 1.28.0

- Security patches and dependency upgrades.

### Runtime Operator 0.59.0

- Improved reliability of Runtime scale down by disallowing a node to reconnect to the cluster while waiting on decommissioning.
- Security patches and dependency upgrades.

### Openflow Runtime Gateway 2026.5.1.10

- Security patches and dependency upgrades.

### Openflow Ingress Controller 2026.5.1-10

- Security patches and dependency upgrades.

### Openflow Token Refresher 1.11.0

- Security patches and dependency upgrades.

### Runtime Server 2026.5.5.17

- Fixed handling of `PROPERTY_PARAMETERIZATION_REMOVED` as a local change during versioned Process Group upgrades.
- Fixed lineage start index tracking in `Session.create()`.

### Runtime Extensions 2026.5.5.19

- CDC Oracle: Added support for user-declared logical keys, allowing custom replication key columns instead of relying on auto-detected primary keys.
- CDC Databases (multi-DB): Prioritized newly added tables in the table scheduler so they begin replicating sooner.
- Veeva Vault: Switched to STREAM-based change tracking in `MergeVeevaVaultStagingTable` for more efficient incremental processing.
- CDC SQL Server: Fixed an off-by-one error at the LATEST starting position that could cause the first change event to be missed.

### Connectors 2026.5.5.16

- CDC Oracle: Added logical-key (user-declared) Oracle connector flows.
- CDC PostgreSQL: Added Iceberg table support for PostgreSQL CDC connector.
- Veeva Vault: Set Offset Tracking Resolution to DISABLED for `PublishSnowpipeStreaming` in Veeva Vault connector.
- CDC MySQL: Switched to Snowflake journal record structure for MySQL Capture processor.
- Salesforce Bulk API: Added Object Identifier Resolution parameter to Describe SFDC Object processors.
- Confluence: Switched to documentId as the stage file name to prevent duplicate pages on connector restart.
- BigQuery: Improved BigQuery incremental failure handling.
## May 1, 2026

### Runtime Server 2026.5.1.1

- Fixed node offload handling for Processors like `PublishSnowpipeStreaming` and `MergeContent` that create and manage multiple process sessions.
- Added metrics collection for Jira Processors.

### Runtime Extensions 2026.5.1.1

- Added support for Iceberg Tables to `UpdateSnowflakeTable` Processor.
- Improved merge handling in `MergeVeevaVaultStagingTable` Processor.
- Added DML Record Structure property to `CaptureChangeMySQL` Processor.

## April 28, 2026

### AWS Data Plane Agent 1.41.0

- Enabled logs from the Openflow Agent in your Snowflake Event Table, improving support and reducing triage time for support cases.
- Fixed an upgrade issue for older deployments that use Custom Ingress with multiple Custom Ingress Security Groups.
- Improved support for adding observability agents in the EKS cluster alongside Openflow services.
- Security patches and dependency upgrades.

### Runtime Server 2026.4.28.17

- Security patches and dependency upgrades.

### Runtime Extensions 2026.4.28.15

- CDC PostgreSQL: Added configuration verification to `CaptureChangePostgreSQL`.
- CDC Databases: Added `SNOWFLAKE_OPENFLOW` application tag to CDC Merge queries.
- CDC Google BigQuery: Added support for up to 7 days of history for CDC with Google BigQuery.
- CDC Oracle: Improved Oracle table splitting performance.

## April 24, 2026

### Runtime Server 2026.4.24.16

- Excluded Parameter Description from Flow Version change determination.
- Added configurable Content Claim Truncation to `FileSystemRepository`.
- Added Registered Flow ID Version Path to MDC Attributes.
- Preserved prioritizer order in Git flow Registry serialization.

### Runtime Extensions 2026.4.24.16

- CDC SQL Server: Added CDC-based change capture processor as an alternative to Change Tracking.
- CDC SQL Server: Added Snapshot Isolation support for consistent reads during change capture and table snapshots.
- Jira v2: Added Agile components including boards, board configuration, board projects, board sprints, and filters.
- CDC Oracle: Added Oversized Values handling, allowing truncation or rejection of values exceeding a configurable size limit.
- CDC Databases: Added Incremental strategy to `WaitForSnapshotCompletion` so high-traffic tables no longer block other tables.
- Veeva: Added connector and components to sync data from Veeva Vault.
- CDC SQL Server: Fixed empty-string primary key handling where CAST caused infinite retry loops during keyset pagination.
- CDC Databases (multi-DB): Fixed journal stream prefix collision bug where tables sharing the same bare name were incorrectly matched.
- Dataverse: Fixed deadlock in `FetchMicrosoftDataverseTable` where a table stuck in FETCHING state after a non-retryable error would never recover.
- Snowpipe Streaming: Fixed `MergeSnowflakeJournalTable` incorrectly handling failed merge queries, causing silent data loss.

### Connectors 2026.4.24.16

- Jira: Added Jira v2 core connector with improved entity coverage, custom fields, user groups, worklogs, issue relations, and project filtering.
- Jira: Added Jira Agile connector for boards, sprints, sprint-to-board, and sprint-to-issue ingestion.
- CDC MySQL & PostgreSQL: Added destination schema mapping support for routing source schemas to custom Snowflake destination schemas.
- CDC SQL Server: Added multi-database CDC connector flow definition using Change Data Capture mode.
- CDC PostgreSQL: Switched to flattened DML record output, approximately doubling ingestion throughput.
- CDC PostgreSQL: Replaced `WaitForTableState` with `WaitForSnapshotCompletion` in incremental mode to reduce unnecessary queuing during snapshot phases.
- CDC Databases: Fixed an issue where scheduling got stuck for connectors with more than 10,000 tables by increasing the backpressure queue limit.
- Dataverse: Replaced `PutSQL` with `ExecuteSQL` for merge step, simplified flow, and added automatic cleanup of deprecated columns.
- Veeva: Added connector and components to sync data from Veeva Vault.
- CDC PostgreSQL: Reverted `UpdateTableState` property changes that could cause data loss during snapshot with concurrent channels.

## April 17, 2026

### AWS Data Plane Agent 1.38.0

- Security patches and dependency upgrades.

### SPCS Data Plane Agent 1.27.0

- Security patches and dependency upgrades.

### Control Plane Core 0.111.0

- Security patches and dependency upgrades.
- Improved performance when listing and interacting with Deployments and Runtimes.
- HubSpot connector is now available in public preview.

### Data Plane Service 0.110.0

- Security patches and dependency upgrades.

### Ingress Controller 2026.4.14-17

- Security patches and dependency upgrades.

### Openflow Runtime Gateway 2026.4.14.17

- Security patches and dependency upgrades.

### Openflow Token Refresher 1.10.0

- Security patches and dependency upgrades.

### Control Plane UI 0.78.0

- Security patches and dependency upgrades.

### Data Plane UI 0.15.0

- Security patches and dependency upgrades.

## April 16, 2026

### Runtime Extensions 2026.4.16.18

- CDC Oracle: Auto-detects UNIQUE key constraints as replication keys when no primary key is defined.
- CDC SQL Server: Fixed `sysname` columns causing infinite schema-mismatch loop.
- CDC SQL Server: Fixed duplicate DDL emission for unchanged tables by disambiguating `EARLIEST` position.
- CDC Databases: Made `CdcSchemaRegistry` resilient to internal `TableSchema` class changes during upgrades.
- Removed preview labels from UpdateSnowflake* processors.

### Connectors 2026.4.16.16

- CDC MySQL: Added Oversized Value Strategy parameter.
- Dataverse: Fixed inability to change type of `_SNOWFLAKE_DELETED` column.
- CDC SQL Server: Increased incremental load batch size to 100K rows.

## April 14, 2026

### Runtime Server 2026.4.14.16

- Fixed local change detection for versioned flows when updating a property that was not set previously.
- Runtime UI: Allows users to resume a suspended runtime in recovery mode.

### Runtime Extensions 2026.4.14.16

- Jira (Atlassian): Added Jira v2 core components including new processors for ingesting comments, changelogs, deleted issues, projects, permissions, users, worklogs, and other Jira entities.
- CDC Databases: Introduced a configurable Oversized Value Limit property (default 16 MB) on CDC and snapshot processors.
- CDC Oracle: Removed unnecessary Oracle database privileges from configuration scripts.
- Snowpipe Streaming: Removed Preview tags from `PublishSnowpipeStreaming` processors, marking them as generally available.

### Connectors 2026.4.14.15

- Kafka: Disabled flow-file-based offset tracking to prevent data loss during downscaling.
- Kafka: Added a new high-performance Kafka connector flow with `PublishSnowpipeStreaming`.
- CDC Oracle and SQL Server: Exposed table exclusion parameter for multi-database connectors.
- CDC Oracle: Added Snowpipe Streaming v2 routing with automatic v1 fallback.
- Salesforce: Explicitly set warehouse in MERGE pre-query to prevent failures when no default warehouse is configured.

## April 13, 2026

### AWS Data Plane Agent 1.37.0

- Improved custom ingress to simultaneously support load balancer security groups managed by both Openflow and deployment-specific configurations.
- Removed duplicate ingress rules for default custom ingress security group.

## April 10, 2026

### AWS Data Plane Agent 1.36.0

- Security patches and dependency upgrades.
- Improved resiliency of new Deployments and upgrades related to how AWS IAM permissions are created and refreshed.
- Improved cost efficiency of telemetry by removing unused or low-value metrics from being exported to Event Tables.

### SPCS Data Plane Agent 1.26.0

- Security patches and dependency upgrades.
- Improved cost efficiency of telemetry by removing unused or low-value metrics from being exported to Event Tables.
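
The April 14 entry that introduced the Oversized Value Limit (default 16 MB), together with the truncation and rejection behaviors described for CDC Oracle on April 24, can be pictured as a size check applied to each column value. The sketch below is a hypothetical illustration, not the processors' implementation. The function name, the strategy names, and the reading of 16 MB as 16 × 1024 × 1024 bytes of UTF-8 text are all assumptions.

```python
# Hypothetical sketch of an oversized-value check; not the connector's actual code.
DEFAULT_LIMIT_BYTES = 16 * 1024 * 1024  # assumption: "16 MB" read as 16 MiB


class OversizedValueError(ValueError):
    """Raised when a value exceeds the limit and the strategy is 'reject'."""


def apply_limit(value: str, limit_bytes: int = DEFAULT_LIMIT_BYTES,
                strategy: str = "truncate") -> str:
    """Return the value unchanged if it fits, otherwise truncate or reject it."""
    encoded = value.encode("utf-8")
    if len(encoded) <= limit_bytes:
        return value
    if strategy == "truncate":
        # Cut at the byte limit and drop any partial trailing character.
        return encoded[:limit_bytes].decode("utf-8", errors="ignore")
    if strategy == "reject":
        raise OversizedValueError(
            f"value is {len(encoded)} bytes, limit is {limit_bytes}")
    raise ValueError(f"unknown strategy: {strategy}")
```

Measuring the encoded byte length rather than the character count matters for multi-byte text, where a string can be well under the limit in characters but over it in bytes.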
### Control Plane Core 0.109.1

- Security patches and dependency upgrades.
- Added the Oracle Embedded License Connector to Featured Connectors.

### Data Plane Service 0.109.0

- Security patches and dependency upgrades.

### Ingress Controller 2026.4.7

- Security patches and dependency upgrades.

### Runtime Operator 0.58.0

- Security patches and dependency upgrades.

### Control Plane UI 0.77.0

- Added third-party icons for Atlassian, Salesforce, and Microsoft SQL Server connector cards.
- Added support for resuming suspended Runtimes in recovery mode.
- The gateway version is now hidden in the Runtime upgrade dialog when appropriate.
- Improved warnings and guidance when users lack permissions to create a Deployment.
- Improved Connector installation process to no longer wait for available Runtimes to load before opening the dialog.

### Data Plane UI 0.14.0

- Security patches and dependency upgrades.

## April 9, 2026

### Runtime Server 2026.4.9.16

- Fixed Parameter and Parameter Context descriptions being lost during versioned flow upgrades.
- Fixed record path functions (`toBytes`, `toDate`, `toString`, and `format`) to return the correct types.
- Snowpipe Streaming: Exports metrics from `PublishChangeDataSnowpipeStreaming` to the event table for better observability.

### Runtime Extensions 2026.4.9.16

- CDC SQL Server: Improved error visibility during connection setup by fixing exception masking in `CatalogHelper` and unifying `setCatalog` usage.
- CDC SQL Server and Oracle: Switched to `DirectJsonRecordWriter` for storing change data as JSON in VARIANT columns, improving Snowflake ingestion efficiency.
- CDC MySQL: Fixed composite primary key column ordering by reading from `KEY_COLUMN_USAGE`, ensuring correct row identification during snapshot.

### Connectors 2026.4.9.15

- Dataverse: Updated merge journal process to include `SNOWFLAKE_ID` in the Dataverse schema.
- Box, Confluence, Google Drive, SharePoint, and Slack: Fixed Cortex-enabled connectors to preserve existing CORTEX SEARCH SERVICE configuration instead of overwriting it.
- CDC MySQL, PostgreSQL, Oracle, and SQL Server: Increased run duration on CPU-bound processors in incremental flows to reduce backpressure.
- CDC Oracle: Added staleness prevention to keep pipelines active during periods of low data volume.

## April 7, 2026

### Runtime Extensions 2026.4.7.16

- Kafka: Fixed duplicate message delivery in `ConsumeKafka` when consumers rejoin a consumer group during rebalance.
- CDC MySQL: Fixed data corruption when a MySQL server restart reassigns table IDs to different tables, preventing stale schema mappings from causing type mismatch errors during data ingestion.
- CDC Oracle: Fixed Oracle XStream CDC failing to read the LCR version when the XStream outbound server is configured on a different database instance (PDB vs CDB).
- CDC Databases: Reduced unnecessary Snowpipe Streaming query retries by penalizing the `MergeSnowflakeJournalTable` processor when no new data is available or the connection is disconnected, improving throughput.
- CDC SQL Server: Added table draining to ensure tables with large change backlogs are fully consumed before the connector moves to the next table.

### Connectors 2026.4.7.16

- CDC MySQL and PostgreSQL: Added a `Re-snapshot Table Exclusions` parameter to allow specific tables to be excluded from replication, enabling re-snapshotting use cases.

## April 6, 2026

### AWS Data Plane Agent 1.34.0

- Improved deployment upgrade time for customers with many Runtimes.
- Improved speed and reliability of upgrades from Openflow Deployments running EKS 1.32 to EKS 1.35.

## April 2, 2026

### Runtime Server 2026.4.2.16

- Increased the CDC connector metrics table row limit for observability dashboards from 30,000 to 40,000.
- Fixed inherited Parameter Context synchronization on versioned Process Group upgrades when new parameters are added.
- Fixed the provenance repository to honor the configured maximum attribute character size when reading entries.
- Fixed component bundle resolution and rollback behavior on versioned flow changes.
- Updated to Runtime UI 0.70.0.

### Runtime Extensions 2026.4.2.16

- Added Google Cloud Storage Provider for Iceberg.
- Fixed empty Private Key check for PGP Secret Key.
- CDC Databases: Added a new `MultiDatabaseGetSnowflakeJournalStreams` processor that supports multi-source CDC replication by mapping 3-part source table names (database + schema + table) to Snowflake destination schemas using a configurable naming pattern.
- CDC PostgreSQL: Added a "Flatten DML Records" option to `CaptureChangePostgreSQL` that writes change events in the final flat format at capture time, eliminating the intermediate file read-and-rewrite step in `EnrichCdcStream` and reducing disk I/O.
- Snowpipe Streaming 2: Added a "Destination Type" property to `PublishSnowpipeStreaming` that allows users to target either a named Pipe or a Table directly, with automatic migration to preserve existing Pipe-based configurations.
- CDC Databases: Added an "Excluded Comma Separated Source Table Names" property to `ListTableNames` (and its multi-database equivalent) that lets users exclude specific tables from replication.
- BigQuery: Fixed a property migration bug in `CreateReadSession` that caused incorrect processor configuration when upgrading from older flow versions.
- CDC PostgreSQL: Fixed `FetchTableSnapshot` failures on tables containing `bytea` columns by using a more compatible JDBC method to read binary data.
- CDC Databases: Updated the `DESTINATION_SCHEMA_NAME_PATTERN` placeholders from `{database}`, `{schema}` to `${source.database.name}`, `${source.schema.name}`, and `${source.table.name}`. A fixed schema name is now valid (the validator constraint requiring at least one placeholder has been removed).
- All connectors: Added a `Validation Mode` property to the `SetAttributesValidatingReferences` processor, allowing configuration of how attribute reference validation is enforced.

### Connectors 2026.4.2.16

- BigQuery: Reduced the concurrency of parallel streaming jobs (PSS) to prevent resource contention issues.
- SharePoint: Fixed a file removal pattern bug for customers with the `ENABLE_FIX_209969` account parameter set to false, where the pattern would fail to match and remove processed files.
- BigQuery: Prevented CDC and view ingestion from starting when the required temporary dataset parameter is not configured, avoiding accidental table failures.
- CDC MySQL, PostgreSQL, and SQL Server: Configured the source database connection pool with validation-on-borrow and periodic eviction of idle connections, preventing misleading "connection reset" errors caused by stale connections being reused after a server-side timeout.
- Dataverse: Added a `_SNOWFLAKE_ID` column to replicated records by mapping the source primary key, allowing downstream consumers to uniquely identify each record in Snowflake.
- CDC SQL Server, MySQL, and PostgreSQL: Added the new stream staleness prevention mechanism to the connector.
- Jira: Fixed an incorrect merge query in the Jira connector.

## April 1, 2026

### AWS Data Plane Agent 1.33.0

- Upgraded to AWS EKS 1.35.
- Added the EKS kube-proxy add-on for automated, managed upgrades of networking components.
- Improved auto-healing when EKS node groups are down or offline for extended periods.
- Improved cost efficiency of telemetry collection by ignoring low-value metrics.

## March 31, 2026

### AWS Data Plane Agent 1.31.3

- Security patches and dependency upgrades.

### SPCS Data Plane Agent 1.24.1

- Security patches and dependency upgrades.

### Control Plane Core 0.108.2

- Security patches and dependency upgrades.
- Fixed Oracle Connector License syncing for accounts with renamed organizations.
### Data Plane Service 0.108.2 - Security patches and dependency upgrades. ## March 30, 2026 ### Runtime Server 2026.3.27.21 - Added `isValidDate` and `isValidInstant` Expression Language functions. - Fixed inherited parameter context preservation during `KEEP_EXISTING` versioned flow deployment. - **Behavior change:** When upgrading a running versioned flow to a new version, new components added in the new version are automatically started. ### Runtime Extensions 2026.3.27.21 - AWS: Fixed AWS connection pool shutdown on EKS with STS credential refresh. - Google Drive: Reverted to correct default scopes in Google Drive components, with a new property to use the Google Cloud Platform scope when using Workload Identity Federation with impersonation. - Kinesis: Handled `ResourceNotFoundException` in `ConsumeKinesis` when shards are not found or removed. - Oracle: Added support for LCR positions V1 and separate connection and XStream attach, adding support for 12.1 and 12.0. - Dataverse: Unknown Dataverse attribute types fall back to STRING. - Salesforce: Fixed Salesforce formula field translation producing invalid SQL for date arithmetic. - Snowflake: Fixed `PublishSnowpipeStreaming` skipping FlowFiles after pipe recreation due to a stale offset. - Dataverse: Added a parameter to configure maximum fetched column size in `FetchMicrosoftDataverseTable`. - Dataverse: Added table-level removal to the Dataverse connector. - Snowflake: Fixed Workload Identity Federation token header format for Snowpipe Streaming 2. ### Connectors 2026.3.26.19 - CDC database connectors: Added metrics collection for observability dashboards. - CDC SQL Server: Switched from Snowpipe Streaming v1 to v2. ## March 27, 2026 ### AWS Data Plane Agent 1.31.2 - Replaced Ingress-Nginx with Openflow Ingress Controller. - Fixed an issue with load balancer security group rules when multiple deployments share the same private security group. - Security patches and dependency upgrades. 
- Improved support for adding many Runtimes to a Deployment at once. - Removed the need for ingress on port 80. All Deployment and Runtime ingress uses port 443. ### SPCS Data Plane Agent 1.24.0 - Security patches and dependency upgrades. ### Control Plane Core 0.108.0 - Security patches and dependency upgrades. - Improved handling for users with large sets of roles when they need to provide this list of roles for creating and managing resources. - Salesforce Bulk API connector is now generally available (GA). ### Control Plane UI 0.74.0 - Upgraded to stellar 0.28.0. - The Create Runtime and EAIs dialog can open while EAIs are still loading. ### Data Plane Service 0.108.0 - Security patches and dependency upgrades. ### Openflow Runtime Gateway 2026.3.25.14 - Security patches and dependency upgrades. ## March 24, 2026 ### Runtime Server 2026.3.24.18 - Improved layout of vertical space in the Runtime UI for longer lists of tables and schemas. ### Runtime Extensions 2026.3.24.20 - SQL Server: CDC components are now generally available (GA). - Oracle: Fixed `TIMESTAMPTZ` mapping for named time zones. - Added processors to support Snowpipe Streaming v2 in CDC database connectors. - Improved `UpdateSnowflakeTable` caching and query batching for better performance. ### Connectors 2026.3.24.18 - CDC database connectors: Added metrics collection for observability dashboards. - MySQL: Switched CDC connector from Snowpipe Streaming v1 to v2. - PostgreSQL: Switched CDC connector from Snowpipe Streaming v1 to v2. - Salesforce: Salesforce Bulk API connector is now generally available (GA) and includes a parameter to control warehouse cost optimization. ## March 20, 2026 ### Control Plane Core 0.107.1 - Improved role picker for users with a large number of available roles. ### Runtime Server 2026.3.19.18 - Fixed Content Repository defragmentation. - Improved reloading behavior for Scripted Record Reader and Writer processors. 
### Runtime Extensions 2026.3.19.20 - PostgreSQL and MySQL: Set explicit field sizes for `VARCHAR` and `BINARY` columns to support larger values. - PostgreSQL: Fixed cursor-based fetching by disabling `autoCommit` to prevent out-of-memory errors on large rows. - Salesforce: Salesforce Bulk API components are now generally available (GA). - Salesforce: Fixed Salesforce Upsert Lookup failing when field values contain a `+` sign. - SQL Server: Improved handling of source database failover to prevent table sync failures. - SQL Server: Improved snapshot performance for clustered and partitioned tables. ### Connectors 2026.3.19.15 - Dataverse: Added Table State Service to the Dataverse connector flow. - Oracle: Parallelized Snowpipe Streaming v1 snapshot by primary key for improved performance. - Slack: Added option to ignore channels from historical load in the Slack connector. ## March 17, 2026 ### AWS Data Plane Agent 1.30.0 - Added support for recovering a deployment after Snowflake organization or account rename via an `update-account.sh` script. - Improved Runtime metrics collection for larger data flows and Connectors handling large numbers of tables. - Preparing for dedicated Openflow Ingress Controller to replace Nginx for BYOC runtime traffic. - Added memory limiter and metrics routing in the OTEL collector for stability and separate pipelines. - Security patches and dependency upgrades, including OTEL Collector 0.146.1. ### SPCS Data Plane Agent 1.23.0 - Improved Runtime metrics collection for larger data flows and Connectors handling large numbers of tables. - Security patches and dependency upgrades. ### Control Plane UI 0.73.0 - Upgraded to stellar icons. - Fixed a condition that could have prevented the splash screen from hiding when an unhandled error occurs. ### Control Plane Core 0.107.0 - Reliability improvements for Deployments when an account or organization is renamed. ### Data Plane Service 0.107.0 - Security patches and dependency upgrades. 
### Runtime Operator 0.57.0 - Security patches and dependency upgrades. ### Openflow Ingress Controller 2026.3.16-17 - Security patches and dependency upgrades. - Preparing for replacement of Nginx as the ingress controller for BYOC runtime traffic. ### Runtime Server 2026.3.17.13 - New Runtime UI 0.68.0. - Fixed a splash screen that could stay visible under specific conditions. - Improved MIME type support in the content viewer. - Track Content and truncate large resource claims in FileSystemRepository. - Performance improvements for OpenTelemetry data collection. ### Runtime Extensions 2026.3.17.13 - Updated log configuration to capture INFO level logs for the DescribeSFDCObject processor. - Added Session Header handling to Snowpipe Streaming 2. - Reduced default batch size and added query timeout handling for SQL Server table name fetching. - Kinesis: Significantly improved the `ConsumeKinesis` processor, removing use of the Kinesis Client Library. - MySQL: Fixed `NullPointerException` in `CaptureChangeMySQL.disconnectBinlogClient` when `tableMapStore` is `null`. - MySQL: Added TLS support for JDBC in `CaptureChangeMySQL`. - SQL Server: Fixed `VARCHAR/NVARCHAR` sorting issue that could cause duplicate rows during batched paging. - SQL Server: Reduced default batch size and improved query timeout handling for table name fetching. - Added `MultiDatabaseRouteOnSnowflakeParameter` and `MultiDatabaseExitRouteOnSnowflakeParameter` processors. - Added `GetActiveSnowflakeStreams` processor for stream staleness prevention. ## March 13, 2026 ### Control Plane Core 0.106.0 - Oracle connector is now generally available (GA). - Registered preview of new PostgreSQL CDC SOM connector in the control plane catalog. - Added data plane configuration options for CDC Snowpipe Streaming v2 rollout across MySQL, PostgreSQL, SQL Server, and Oracle connectors.
## March 12, 2026 ### Runtime Server 2026.3.12.13 - Fixed 409 Conflict in Azure DevOps and Bitbucket flow registry clients for multiple Flows with shared branch. - Fixed Flow Comparison showing changes for nested child components when using nesting flow versioning. ### Runtime Extensions 2026.3.12.15 - Iceberg: Added Storage Class to S3FileIOProvider. - Salesforce: Use FQN database/schema for Salesforce Merge queries. - Salesforce: Improved Salesforce formula parsing and logging. - Salesforce: Fixed ListSFDCObjects becoming invalid after upgrade with dynamic relationships. - MySQL: Escalate binlog communication failure log from WARN to ERROR. ### Connectors 2026.3.12.15 - Salesforce: Use FQN database/schema for Salesforce Merge queries. - Salesforce: Do not emit bulletin on suspend warehouse attempts. ## March 11, 2026 ### Runtime Server 2026.3.10.21 - New Runtime UI 0.67.0. - Fixed flicker of overlapping connection warning in connector canvas. - Aligned reusable canvas renderers for borders around PGs and RPGs. - Fixed support for floating point numbers in connection flow file expiration. - Restored special treatment of trigger serially processors (no concurrent tasks). ### Runtime Extensions 2026.3.10.20 - AWS Secrets Manager Parameter Provider now supports plain text secrets. - ListS3: Fixed V1 pagination failing when delimiter is not set, which caused an infinite loop. - MultiDatabaseFetchTableSnapshot: Added FlowFile size limiting. - Slack: ConsumeSlackHistory now has an option to ignore channels from historical load. - MySQL: Fixed PutSnowpipeStreaming storing JSON in VARIANT columns as strings by directly writing JSON records in CaptureChangeMySQL. - Dataverse: Added `SNOWFLAKE_ID` column to the schema. - Snowpipe Streaming: Improved channel error handling in PublishSnowpipeStreaming. - Snowpipe Streaming v2: Fixed error when streaming `timestamp_tz` with seconds in offset. 
- CDC Databases: Fixed tables transitioning to FAILED during snapshot load by adding buffer for lineageStartDate comparison. ### Connectors 2026.3.10.20 - CDC SQL Server: Added Oversized Value Strategy parameter. - Salesforce Bulk API: Added initial support for formulas. - Salesforce Bulk API: Warehouse suspension now occurs immediately after all merge queries are executed. ## March 6, 2026 ### Runtime Extensions 2026.3.6.12 - UpdateSnowflakeView: Added support for raw SQL. - PostgreSQL: Fixed PutSnowpipeStreaming storing JSON in VARIANT columns as strings by directly writing JSON records in CaptureChangePostgreSQL and FetchTableSnapshot. - HubSpot: Added missing CRM object types and fixed API compatibility. - CDC Oracle: Fixed partitioned tables support during snapshot. - Snowpipe Streaming v1: Fixed empty FlowFile handling in PutSnowpipeStreaming when configured to use exactly-once delivery. ## March 5, 2026 ### AWS Data Plane Agent 1.26.0 - Security patches and dependency upgrades. ### SPCS Data Plane Agent 1.20.0 - Security patches and dependency upgrades. - Added per-connector error metrics, improving speed and reliability of the Openflow Observability dashboard. ### Control Plane UI 0.72.0 - Introduced a Connector listing for managing Connector Snowflake Objects (hidden by a feature flag until verified and ready). - Updated the Connector installation process for Connector Snowflake Objects (hidden by a feature flag until verified and ready). - Added MongoDB Connector Definition icon. - Updated the Oracle Connector terms dialog to account for the new independent (BYOL) Oracle connector. - Updated the user-facing action for "Rename" to "Set display name". ### Control Plane Core 0.105.0 - Snowflake deployments can now heal from `UPGRADE_FAILED` state if they report a healthy status and version. - Added MongoDB connector flow. - Added support for `OPENFLOW_INGRESS_NAME` parameter when creating the URL to access Snowflake Deployments. 
### Data Plane Service 0.105.0 - **Behavior change:** Runtime Python processor properties are now set based on Runtime node size (Small: disabled, Medium: <=2, Large: <=4). - Security patches and dependency upgrades. ### Runtime Operator 0.56.0 - **Behavior change:** Python processors are now disabled by default to improve Runtime stability. Python processor usage is controlled by Runtime size (Small: disallowed, Medium: <=2, Large: <=4). - Security patches and dependency upgrades. ## March 3, 2026 ### AWS Data Plane Agent 1.25.0 - Reduced downtime for Openflow Runtimes when upgrading the AMI for BYOC Deployments. - Added support for the upcoming EKS 1.35 upgrade; BYOC still uses EKS 1.34. ### Runtime Server 2026.3.4.15 - Expression Language: Added `compactDelimitedList()` and `trimDelimitedList()` functions. - New Runtime UI 0.66.0. - Environmental changes are now hidden when showing or reverting local changes. - Connections now avoid overlapping, and warnings are shown for existing overlaps. ### Runtime Extensions 2026.3.4.16 - Snowpipe Streaming: Added optional Role property. - CDC Databases: Fixed JSON column filtering in incremental load. - CDC Databases: Fixed `clearSession()` removing already-transferred FlowFiles in FetchTableSnapshot. - CDC Databases: Fixed `SEEN_AT` value being interpreted as seconds instead of milliseconds in incremental mode. - CDC Databases: Minimized the risk of filling up the waiting queue in the EnforceOrder processor. - CDC Databases: Added "Oversized Value Strategy" to MultiDatabaseFetchTableSnapshot. - CDC Oracle: Fixed verification in CaptureChangeOracle. - CDC SQL Server: Added "Oversized Value Strategy" to MultiDatabaseCaptureChangeSqlServer processor. - CDC SQL Server: Fixed DESC primary key handling in MultiDatabaseFetchTableSnapshot. - Salesforce: Fixed two bulletins in SubmitQueryJob for unsupported objects. - Snowpipe Streaming: Added "Disabled" option for Offset Token Resolution in PublishSnowpipeStreaming.
- Snowpipe Streaming: Added channel error message on invalid rows log. ### Connectors 2026.3.4.15 - Added customer-facing metrics for MySQL connectors. - Added customer-facing metrics for PostgreSQL connectors. - Added customer-facing metrics for Oracle connectors. - Confluence: Fixed user emails with quotes breaking the connector. - Salesforce Bulk API: Added WaitForBulkJobs for warehouse usage cost optimization. ## February 26, 2026 ### AWS Data Plane Agent 1.24.0 - Improved upgrade speed and reliability from EKS 1.32 to 1.34, fixing the temporary "Upgrade Failed" status for BYOC and BYO-VPC deployments. ### Runtime Server 2026.2.26.15 - Expression Language: Added `unique()` function for removing duplicates from delimited strings. ### Runtime Extensions 2026.2.26.16 - CDC Oracle: Added configurable starting position in CaptureChangeOracle to control where CDC begins reading. - CDC Oracle: Added SSL/TLS connection support for CaptureChangeOracle. - CDC Oracle: Removed preview tags from multi-database Oracle CDC processors (now generally available). - CDC SQL Server: Fixed ChangeTrackingPosition parsing. - CDC Databases: Removed stale entries from IncrementGroupAttribute processor to prevent unbounded state growth. - Salesforce Bulk API: Fixed Merge Query failing when containing reserved keywords. - Salesforce Bulk API: Moved row deduplication in the Merge Query to fix an error and remove the need for the pre-SQL query DELETE. - Salesforce: Populated `sErrorMessage` when duplicate error occurs in `UpsertSFDCObjects` processor. ### Connectors 2026.2.26.15 - CDC Oracle: Added SSL/TLS connection support in Oracle connector. - CDC Oracle: Added schema name mapping in Oracle connector. - CDC Oracle: Parameterized starting position properties in Oracle connector. - CDC Oracle: Fixed missing log in concurrent snapshot. - Salesforce Bulk API: Ignore changes on not null constraints to prevent ingestion failures. 
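The `unique()` Expression Language function noted above under Runtime Server 2026.2.26.15 removes duplicates from delimited strings. As a rough illustration of that idea only, here is a minimal Python sketch. It is not Expression Language syntax, and the helper name `unique_delimited`, the comma default, and the keep-first-occurrence ordering are all assumptions; these notes do not describe the function's arguments or ordering behavior.

```python
def unique_delimited(value: str, delimiter: str = ",") -> str:
    """Hypothetical model of de-duplicating a delimited string.

    Assumption: items are compared exactly (no trimming, case-sensitive)
    and the first occurrence of each item is kept in its original position.
    """
    seen = set()
    kept = []
    for item in value.split(delimiter):
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return delimiter.join(kept)


if __name__ == "__main__":
    # Repeated table names collapse to a single entry each.
    print(unique_delimited("orders,customers,orders,items,customers"))
```

Check the Runtime's Expression Language documentation for the actual signature before relying on any particular delimiter or ordering behavior.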
## February 25, 2026 ### Runtime Server 2026.2.24.16 - Security patches and dependency upgrades. - Improved observability for Connectors and custom groovy scripts. ### Runtime Extensions 2026.2.24.20 - Salesforce Bulk API: Added SubmitDeleteJob processor to delete data using Bulk API. - Snowpipe Streaming: Added PublishSnowpipeStreaming processor. - CDC SQL Server: MultiDatabaseCaptureChangeSqlServer now has parameterized concurrency level. - CDC Oracle: Fixed CaptureChangeOracle processor blocking during license validation. - CDC Oracle: Added source state verification in CaptureChangeOracle to detect source database issues. - CDC PostgreSQL: Added support for enum primary key in PostgreSQL. - CDC PostgreSQL: Map PostgreSQL `DOUBLE PRECISION` and `MONEY` types to `RecordFieldType.DOUBLE`. - Snowpipe Streaming v2: Added Snowflake Managed Authentication to PutSnowpipeStreaming2. - CDC PostgreSQL: Added Password Provider support to CaptureChangePostgreSQL, which gives support for AWS IAM Authentication with AWS RDS. ### Connectors 2026.2.24.20 - Salesforce Bulk API: Added `CLUSTER BY ("ID")` on table creation for better query performance. - Salesforce Bulk API: Disabled NOT NULL constraints on Alter Table processors to prevent ingestion failures. - Salesforce Bulk API: Added description for 'Enable Journal Tables' parameter. - Slack: Connector performance optimizations. - CDC SQL Server: Fixed EnforceOrder processor being triggered every second instead of on flow file arrival. ## February 20, 2026 ### AWS Data Plane Agent 1.23.0 - Fixed CloudFormation template formatting that could cause false drift detection by Terraform. - Fixed a rare issue with custom ingress and PrivateLink where EKS control plane nodes couldn't communicate with worker nodes. ### Control Plane UI 0.70.0 - Runtime diagnostic bundles are now sorted consistently. ### Control Plane Core 0.104.0 - Deployments and runtimes remain accessible after an organization or account name change. 
- Removed the temporary restriction that limited Snowflake deployment upgrades to deployment owners whose active role matched the deployment owner role. ### Data Plane Service 0.104.0 - Deployments and runtimes remain accessible after an organization or account name change. ### Runtime Server 2026.2.19.16 - Fixed an issue where the flow version changed unexpectedly when the flow contains a ghosted parameter provider. - New Runtime UI 0.65.0. - The provenance lineage view now displays the component type alongside the event type. - Diagnostic bundles are now sorted and ordered consistently. ### Runtime Extensions 2026.2.19.20 - Azure: Added support for Azure federated identity credentials. - Google Ads: GetGoogleAdsReport now supports batch ingestion with configurable date range batching. - CDC Oracle: Fixed ALTER TABLE parsing for integer-type columns (INT, SMALLINT, INTEGER, DEC, DECIMAL, NUMERIC) that incorrectly defaulted the scale to 19 instead of 0 when precision wasn't specified. - Fixed S3 processors using the global endpoint for `us-east-1`. - Fixed an error in DBCPConnectionPool when a dynamic property has a null value. - Kafka: ConsumeKafka now includes a `kafka.timestamp` attribute on FlowFiles emitted with the `Record` processing strategy. - Kinesis: ConsumeKinesis now supports a `Demarcator` processing strategy. - Snowpipe Streaming v1: PutSnowpipeStreaming now includes a `Binary Encoding Format` property for HEX binary string data. - CDC SQL Server: MultiDatabaseCaptureChangeSqlServer now uses dynamic backoff when there are no new changes. - CDC Multi-Database: MultiDatabaseFetchTableSnapshot can now run multiple select statements concurrently. - CDC SQL Server: Fixed an ingestion failure when a table is re-added with a different schema. - CDC Oracle: Fixed handling of license changes in a duplicated database. - CDC Databases: Improved error handling for DML operations in the EnrichCdcStream and MultiDatabaseEnrichCdcStream processors. 
- SharePoint: Fixed file path decoding for folders containing percent signs. - CDC Databases: Warnings are now logged when oversized values are set to null, making it easier to identify data truncation. - CDC Databases: Added a `CLEARING_FLOWFILE_FAILED` failure reason for table state tracking. - **Behavior change:** Removed the Vectara, Pinecone, RAG evaluation, Milvus, and Cohere bundles. ### Connectors 2026.2.19.20 - CDC Oracle: The default snapshot fetching strategy is now `CONCURRENT_BY_ROWID` instead of `SEQUENTIAL_BY_PRIMARY_KEY`, improving snapshot performance. - CDC SQL Server: Added customer-facing metrics for the SQL Server multi-database connector. - Slack: Thread broadcast replies are now filtered from Slack collection to prevent duplicate messages. - Salesforce Bulk API: The staging table is now truncated instead of deleted, preventing channel invalidation errors with Snowpipe Streaming. - Salesforce Bulk API: Object filters are no longer case-sensitive. - Salesforce Bulk API: Merge queries are no longer executed when no data has been captured. - Salesforce Bulk API: Added an `Enable Journal Tables` parameter (default: false) that creates a `JOURNAL_Object` table where data changes are appended. - CDC SQL Server: Snapshots now use multiple channels per table to improve throughput. - CDC PostgreSQL: The oversized value strategy is now configurable in the PostgreSQL connector. ## February 13, 2026 ### AWS Data Plane Agent 1.20.0 - Fixed an upgrade issue for older BYOC deployments where permissions failures occurred for tags on IAM OpenID Connect providers. - BYOC deployments now more clearly report their `Upgrading` status. ## February 11, 2026 ### Control Plane Core 0.102.0 - BYOC deployments now automatically restore access to runtimes when their AWS load balancers are recreated with a new DNS. - Improved upgrade reliability for deployments and runtimes. 
### Openflow Runtime Gateway 2026.2.10.21 - Fixed connector installation failures in Snowflake deployments with PrivateLink enabled. ### AWS Data Plane Agent 1.19.0 - Fixed an upgrade issue for older BYOC deployments caused by an `eks:ListTagsForResource` permissions failure. ### Runtime Server 2026.2.10.18 - Python: Fixed an issue where NAR deletion could block indefinitely while a Python processor was initializing. - Fixed Parameter Provider version fallback when importing a flow. - Fixed Parameter Context binding for new process groups during version upgrades. ### Runtime Extensions 2026.2.11.9 - Kafka: Fixed an issue where ConsumeKafka could create duplicate messages during a consumer group rebalance. - Parquet: Fixed a ParquetReader error (`ClassCastException`) for `java.time` logical types. - MongoDB: Added components for the upcoming private preview of the MongoDB CDC connector. - Slack: Fixed duplicate messages caused by thread broadcast replies. - MySQL: Added an oversized data property to the CaptureChangeMySQL processor. - MySQL & PostgreSQL: Fixed FetchTableSnapshot incorrectly flagging interim FlowFiles as the final snapshot. - MySQL & PostgreSQL: You can now configure how values larger than 16 MB are handled when they exceed the supported limit. - Confluence Data Center: Added support for the export page permission. ### Connectors 2026.2.10.18 - Box: Removed the concurrency limit on stage inserts, improving overall performance. - MultiDB MS SQL Server: Added schema name mapping. ## February 6, 2026 ### Runtime Operator 0.54.0 - Fixed asset synchronization in runtimes when parameter providers are used. ### AWS Data Plane Agent 1.18.0 - Fixed an issue where migrating secrets during an upgrade caused failures for AWS deployments between versions 0.55.0 and 1.1.0. ## February 4, 2026 ### Runtime Server 2026.2.3.19 - Python: Fixed an issue where imported properties couldn't be used as `PropertyDependency` parameters in Python processors. 
- Records: Added timestamp truncation support in the RecordPath DSL. - New Runtime UI 0.64.0. ### Runtime Extensions 2026.2.4.10 - Iceberg: Added `Endpoint URL` and `Path Style Access` properties to the S3 FileIO Iceberg Provider. - Avro: Added a `Fast Reader Enabled` property to the Avro Reader. - CDC Databases: MultiDatabaseFetchTableSnapshot now numbers outgoing FlowFiles with a 1-based `chunk.index` attribute. - CDC Databases: The EnrichCdcStream and MultiDatabaseEnrichCdcStream processors now write `min(seenAt)` to FlowFile attributes. - CDC Databases: FlowFile attributes now include the number of rows inserted and updated during journal merge. - CDC Oracle: Oracle DML/DDL FlowFiles now include index attributes, consistent with other CDC database components. - CDC MySQL: Fixed replication failures for zero-date datetime values (such as 0000-00-00) by aligning snapshot and CDC mapping. - Salesforce Bulk API: Base64 fields (Blobs) are now automatically skipped for synced objects because this type isn't supported by the Bulk API. ### Connectors 2026.2.3.18 - Kafka: New Kafka to Snowflake connector with Kafka OAuth authentication support. - CDC Databases: Non-CDC processors in CDC connectors now include a table state change reason. - Salesforce Bulk API: Reduced the default `Max Batch Size` in PutSnowpipeStreaming to lower memory pressure for records with large fields. - Salesforce Bulk API: Added a parameter to disable incremental offloading, allowing full object syncs each execution to account for formula fields. - Salesforce Bulk API: Added support for non-Bulk API compatible objects such as Knowledge data. ## February 3, 2026 ### Control Plane Core 0.101.2 - Temporarily restricting Snowflake deployment upgrades to users whose active role matches the deployment owner role until a related issue is resolved. 
## February 2, 2026 ### Control Plane Core 0.101.1 - Temporarily limiting Snowflake deployment upgrades to the deployment owner while an issue preventing roles with `OPERATE` privilege from upgrading is resolved. ## January 30, 2026 ### Control Plane UI 0.69.0 - Fixed an issue where some actions weren't reevaluated on the current page after an active role change. - BYOC deployments running the latest version now show their current status while processing actions like creating, upgrading, and deleting, including reporting failures when they occur. - Added a `Download validator` button to the deployment creation dialog. ### Control Plane Core 0.101.0 - Fixed an issue where Snowflake deployments briefly showed a `Not Healthy` status while creating, just before becoming active. - BYOC deployments running the latest version now show their current status while processing actions like creating, upgrading, and deleting, including reporting failures when they occur. - Improved the logic for showing the Private Link option when creating SPCS deployments to avoid failures when the option isn't fully supported. - Added an API to generate and download CloudFormation templates for BYOC and BYO-VPC validators. ### Data Plane Service 0.101.0 - Fixed runtime creation on newly active Snowflake deployments. Previously, the latest available runtime versions weren't always used. ### AWS Data Plane Agent 1.16.0 - Snowflake-hosted container images are now pulled directly from Snowflake registries into the deployment EC2 agent host and EKS cluster. Upgrade existing Openflow runtimes to switch entirely to Snowflake-hosted images. - The agent now reports its current status to Openflow while processing user-requested actions, so it can be reflected in the Control Plane UI. - Security patches and dependency upgrades. - Added quick validation tools for BYOC and BYO-VPC deployments that report common errors to resolve before installing a full Openflow cluster. 
- Improved reliability of deployment upgrades by automatically resolving issues where services were blocked from starting. - Improved reliability of deleting deployments that had been upgraded multiple times. ### Runtime Server 2026.1.29.22 - Upgraded JDK to 21.0.10. - Upgraded Apache NiFi API to 2.6.0, adding support for the `Record Gauge` method in ProcessSession. ### Runtime Extensions 2026.1.29.23 - Added the UpdateGauge processor with configurable `Gauge Name` and `Gauge Value` recording. - Improved JSON Schema validation in GenerateJSON to address potential edge cases for nested fields. - Added the PutIcebergRecord processor and Iceberg REST Catalog controller services, supporting both AWS and Azure storage FileIO providers. - Deprecated the PutIcebergTable processor in favor of PutIcebergRecord. ## January 23, 2026 ### Runtime Server 2026.1.22.19 - Resolved an issue where simultaneous commits to a Git-based Flow Registry Client could cause one user's changes to overwrite another's. - New Runtime UI 0.63.0. ### Runtime Extensions 2026.1.22.19 - Salesforce Bulk API: Fixed an edge case where the initial snapshot might not create the destination table as expected. - BigQuery: Processor properties now reference FlowFile attributes, making it easier to understand component behavior. - Snowpipe Streaming v2: PutSnowpipeStreaming2 now tracks request IDs and automatically terminates empty relationships. - CDC SQL Server: You can now set a maximum FlowFile size in CaptureChangeSQLServer. - Jira: Components now include a verification feature to confirm that your configuration is correct. - Confluence: Components now include a verification feature to confirm that your configuration is correct. ## January 21, 2026 ### Runtime Server 2026.1.20.19 - You can now configure custom SSL certificates in GitHub and GitLab Flow Registry Clients. ### Runtime Extensions 2026.1.20.21 - Enhanced PerformSnowflakeCortexOCR with page splitting and filtering features.
- BigQuery: Fixed time travel timestamp handling in the TriggerBigQueryCdcOnState processor. - Jira: Better handling of API rate limiting. - CDC Oracle: You can now set a maximum FlowFile size in CaptureChangeOracle. - CDC MySQL: CaptureChangeMySQL now logs the binlog retention period on start. - Confluence: The connector can now ingest file attachments and embedded images. - CDC MySQL and PostgreSQL: FetchTableSnapshot now includes partition chunk attributes to enable multi-channel streaming. ### Connectors 2026.1.20.18 - All Connectors: The default Snowflake Authentication Strategy is now SNOWFLAKE_MANAGED, a token-based method that works in both SPCS and BYOC deployments. - Salesforce Bulk API: Added a new parameter, Initial Load Chunking, which lets you split large initial data loads into time-based chunks (MONTHLY, QUARTERLY, YEARLY) to avoid timeouts and API limits. When set, the initial data load is split into multiple jobs based on the interval. On the first run for an object, the connector queries Salesforce to find the oldest record and uses that as the starting point. Each subsequent job queries the next time chunk until it has caught up to the current time. Once caught up, the processor continues with normal incremental offload behavior. - Oracle: Initial snapshot loads can now run with multiple concurrent threads for faster performance. - SharePoint: The connector now logs when it encounters and processes empty files. - Confluence: A new connector version is available that does not fetch access control lists (ACLs). ## January 16, 2026 ### Runtime Server 2026.1.15.20 - New Runtime UI 0.62.0. - The copy button in the Bulletin tooltip has been moved so it's always visible. ### Runtime Extensions 2026.1.15.20 - SQL: Added support for Pre-Queries and Post-Queries in the PutDatabaseRecord processor. - CDC PostgreSQL: You can now set a maximum FlowFile size in CaptureChangePostgreSQL.
- CDC PostgreSQL and MySQL: FlowFiles now include `start.row.index` and `last.row.index` attributes. - CDC MySQL: CaptureChangeMySQL now reads the event position from the header instead of from the binlog client. - CDC Connectors: FetchTableSnapshot output FlowFiles are now split into chunks of MAX_OUTPUT_FLOWFILE_SIZE. - Snowpipe Streaming: PutSnowpipeStreaming2 now has dedicated handling for empty FlowFiles. - Salesforce Bulk API: Added support for Objects without SystemModStamp field. - Salesforce Bulk API: You can now configure how the initial snapshot is split into time-based chunks. ### Connectors 2026.1.15.18 - Salesforce Bulk API: Added support for Objects with Tracking History enabled. - Salesforce Bulk API: Added support for Objects without SystemModStamp field. - CDC Connectors: Clearer log messages when a table enters a failed replication state. - Google Ads: New "Login Customer ID" parameter lets you specify which manager account (MCC) to fetch reports for. - Dataverse: The COPY GRANTS option is now applied to destination tables. ## January 15, 2026 ### AWS Data Plane Agent 1.15.0 - Resolved an issue where some IAM policies were not deleted when a Deployment was deleted. ## January 14, 2026 ### Runtime Server 2026.1.13.18 - Resolved an issue with how validation was triggered when Flow Registry Clients were configured. ### Runtime Extensions 2026.1.13.19 - Google Ads: The connector now works with manager accounts and their subaccounts. - Oracle: Added new processors designed to accelerate initial snapshot loads. - Snowpipe Streaming: PutSnowpipeStreaming2 now includes a counter for each destination. - SQL: PutDatabaseRecord now uses setBytes binding for BINARY SQL types. ### Connectors 2026.1.13.16 - Slack: Improved handling of file attachments with Slack messages. - Unstructured Connectors: Resolved Null Pointer Exceptions that occurred when parameters were left empty.
- Google Drive: You can now specify multiple folders by using a comma-separated list in the "Folder Name" parameter.
- Google Drive: New Simple Ingest and Cortex connectors that don't require domain-wide delegation.
- Streaming Destination Modules: PutSnowpipeStreaming now limits channel concurrency for streaming destinations.

## January 12, 2026

### Control Plane UI 0.68.0

- You no longer need OWNERSHIP privilege on the Snowflake Role when configuring BYOC and SPCS Runtimes.
- You no longer need CREATE USER privilege to create a BYOC Runtime.
- **Behavior change:** Starting with AWS Data Plane Agent 0.37.0, you must specify a Snowflake Role when creating a Runtime.

## January 8, 2026

### Control Plane UI 0.67.0

- The Deployment details dialog now correctly shows Private Link and End User Auth over Private Link settings.
- The SAP connector card now displays an updated icon.
- The Runtime and Deployment details dialogs now display the SQL name when available.
- The Create Runtime dialog now requires a Snowflake role. It no longer requires CREATE USER privilege.

### Data Plane Service 0.98.0

- The system now polls less frequently for new Runtime versions, reducing query costs.
- Runtime Upgrades are now more reliable because all related components are discovered and upgraded together.

### AWS Data Plane Agent 1.13.0

- Resolved an upgrade failure affecting older Deployments that pulled helm charts from AWS OCI Repository.

### SPCS Data Plane Agent 1.11.0

- The deployment creation sequence has been optimized to reduce wait time.

## January 6, 2026

### Runtime Server 2026.1.5.14

- When you clear bulletins on a process group, bulletins for its scoped controller services are also cleared.
- Registry Clients no longer log confusing WARN messages when you commit the first version of a flow.

### Runtime Extensions 2026.1.5.19

- Oracle: Archive logs are now properly removed even when database traffic isn't captured by XStream Out server.
- JIRA: Resolved a resource leak triggered by certain HTTP error codes and improved log messages.
- Azure components: Fixed NoClassDefFoundError: io/netty/handler/codec/quic/Quic.
- Kafka: The verification process is improved and now returns information about the Kafka Connection Controller Service.
- MS SQL Server: Database names with special characters are now properly quoted when available tables are fetched.

### Connectors 2026.1.5.13

- All Database CDC Connectors: The snapshot completion log now shows the correct total number of rows ingested.
- All Unstructured Connectors: The Cortex service name parameter is now correctly applied to documents.
- MySQL & PostgreSQL: You can now configure concurrency settings for Snapshot loads.
- JIRA: Performance is improved by reducing small FlowFiles and batching data sent via Snowpipe Streaming.
- Google Drive: Inserts via Snowpipe Streaming can now run in parallel instead of sequentially.

## December 19, 2025

### Runtime Oracle Extensions 2025.12.19.8

- Fixed an issue validating Oracle licenses that prevented the OracleCapture processor from starting.
- Improved change detection for large schemas

## December 17, 2025

### Runtime Server 2025.12.16.19

- Improved how invalid controller services are handled when you enable or disable them.
- Included Registry Clients in the Runtime documentation.

### Runtime Extensions 2025.12.16.19

- PostgreSQL: Fixed ordering of composite key columns.

### Connectors 2025.12.16.19

- Salesforce Bulk API: Added a new parameter to control case sensitivity for object identifiers created in Snowflake. By default, column names remain case sensitive for backward compatibility. This default may change at public preview or general availability.
- Confluence Data Center: New connector to integrate with Confluence Data Center edition.
## December 16, 2025

### AWS Data Plane Agent 1.12.0

- Fixed an issue where BYOC deployment upgrades failed due to a mismatch between the machine image and Kubernetes cluster versions.
- Fixed an issue where BYOC deployment upgrades failed with the error message "OCI Registry Login Failed".

## December 11, 2025

### Control Plane Core 0.95.0

- Fixed an issue where the Runtime Run As Role couldn't be set for roles containing Snowflake-restricted characters, such as hyphens.

### Runtime Server 2025.12.11.21

- Improved behavior when enabling controller services that are invalid and shouldn't be enabled.
- New Runtime UI 0.59.0.
- Registry clients now support property verification.

### Runtime Extensions 2025.12.11.21

- AWS Secrets Manager: Parameter Provider now considers non-string values as valid parameters.
- RenameRecordField processor now properly handles multiple records per FlowFile.
- Kinesis: Fixed an issue where the ConsumeKinesis processor throttled new records even when buffers were empty.
- Snowflake: Added a default network timeout to the Snowflake Connection Service.
- Confluence: Fixed handling of page deletion.
- Confluence Data Center: Fixed the HTTP response decoder for the client.
- MySQL: Improved logging for table mapping when consuming binlog events.
- CDC Databases Connectors: Observability dashboards now display the failure reason when a table's replication status changes to failed.

### Connectors 2025.12.11.18

- SQL Server: Exposed new parameters (Re-read Tables in State and Starting Change Tracking Position) for starting position.
- Oracle: Set CASE_INSENSITIVE as the default for created Snowflake objects.
- Jira: Added support for App Forge authentication method.
- Confluence: Added support for App Forge authentication method.
- Oracle: Fixed missing service name in XStream URL in default parameter values.
- Oracle: Added support for internationalization.
- Unstructured Connectors: Added a parameter to specify the Cortex Search Service name.
- Slack Connectors: Added a parameter to control whether user names are resolved.

## December 9, 2025

### Control Plane Core 0.94.0

- Added support for accessing and using Openflow with an organization or account that has been renamed.

## December 8, 2025

### AWS Data Plane Agent 1.11.0

- Fixed issue upgrading older deployments with non-critical "inconsistent result after apply" error message.

## December 5, 2025

### Runtime Server 2025.12.4.19

- New Runtime UI 0.58.0.
- Added new action to clear bulletins.
- Improved error handling when launching the Status History dialog.

### Runtime Extensions 2025.12.4.19

- Kinesis: Fixed checkpoint committed records in ConsumeKinesis that could previously cause data loss.
- PostgreSQL: Fixed issue where CaptureChangePostgreSQL ignored events when data was loaded via COPY FROM STDIN.

### Connectors 2025.12.4.17

- CDC SQL Server MultiDB: Added and exposed support for case sensitivity for created Snowflake objects.

## December 3, 2025

### AWS Data Plane Agent 1.10.0

- Added support for encrypting EBS volumes across the entire Openflow Deployment.

### SPCS Data Plane Agent 1.9.0

- Snowflake Deployments encountering internal certificate authority mismatch issues are now auto-healed on upgrade.

### Control Plane Core 0.93.0

- Retained visibility and use of resources when an account name or organization is changed.
- Improved resource utilization efficiency for Small size runtimes in Snowflake Deployments, allowing 3 runtime pods per node instead of 2.
- Added Manage endpoints action for SPCS deployments (requires account parameter).
- Improved external access integration (EAI) list to only show those EAIs the user has access to when creating a runtime in a Snowflake Deployment.

### Data Plane Service 0.93.0

- Improved resiliency of automatic diagnostic bundling and cleanup behavior when a runtime fails to create.
- Added management capabilities for Openflow endpoints in a deployment accessible via new API methods.
- Extended wait time for runtime upgrade failures in SPCS deployments to avoid premature timeout and failure.

### Control Plane UI 0.65.0

- Added Manage endpoints action for SPCS deployments (requires account parameter).

### Data Plane UI 0.11.0

- Added Openflow endpoints management view for SPCS deployments (requires account parameter).

### Openflow Ingress Controller 2025.12.2-17

- Added support for routing to Openflow endpoints attached to Openflow runtimes.
- Fixed client IP address forwarding when evaluating Snowflake privileges for Openflow runtimes.
- Fixed request header propagation to support deployments with Private Link enabled.

### Runtime Server 2025.12.3.16

- Added support for discovering listen ports from Openflow runtime processors so that they can be offered to users as available targets for Openflow endpoints.
- Controller Services: Fixed validation and enabling that could take too long and cause the runtime to not start.

### Runtime Extensions 2025.12.3.16

- Kinesis: Introduced Shared Throughput consumer in ConsumeKinesis and removed concurrency limits in the HTTP client.
- Kafka: Added support for specifying custom SASL Extensions.
- EventHub: Added support for OAuth authentication in EventHub processors.
- AWS RDS: Added support for AWS RDS IAM Authentication in the DBCP Connection Pool to access databases over JDBC.
- Listen* Processors (Examples: ListenHTTP, HandleHttpRequest, ListenOTLP): Added support for new ListenComponent and ListenPortDefinition NiFi APIs to allow discovery of listen ports for use with Openflow endpoints.
- OpenflowRuntimeSSLContextProvider: Added new controller service for use with Listen* Processors to integrate with Openflow endpoints.
- Oracle: Fixed handling of case sensitivity on column names when using lower casing.
- Oracle: Fixed support for internationalization.
- MySQL: Fixed filtering of Azure-specific system tables.
- BigQuery: Added new components for the Google BigQuery Change Data Capture (CDC) connector.
- SQL Server: Added ability to choose the starting position when reading the stream.
- Confluence Data Center: Improved support for Audit Records ingestion.
- Confluence: Improved performance to retrieve Confluence page IDs.
- Confluence: Added support for Forge App authentication method.
- Google Drive: Improved recursive listing efficiency when listing the content of a drive.
- Slack: Fixed fetching of user information for workspaces with a large number of users.

### Connectors 2025.12.3.15

- Google Drive: Fixed potential NullPointerException when Google Drive Folder parameter is not set.
- Slack: Added check to verify files have content before uploading to Snowflake.
- CDC PostgreSQL: Increased backpressure settings to better support a large number of synced tables.
- Dataverse: Improved the query for the deletes in the Journal Table.

## December 2, 2025

### Control Plane Core 0.92.0

- Fixed a thread contention issue in Snowflake Deployments that could cause some Runtime actions triggered from Control Plane to time out and fail.

### Data Plane Service 0.92.0

- Fixed a thread contention issue in Snowflake Deployments that could cause some Runtime actions triggered from Control Plane to time out and fail.

### Openflow Ingress Controller 2025.11.20-18

- Added support for Programmatic Access Token authentication and authorization.

### Openflow Runtime Gateway 2025.11.19.22

- Added support for Programmatic Access Token authentication and authorization.
## November 21, 2025

### AWS Data Plane Agent 1.8.0

- Restores support for private Openflow BYOC Deployments by removing all dependencies on URLs outside of Snowflake and AWS, addressing an issue introduced in 1.6.0

### Control Plane Core 0.91.0

- Fixed a rare issue that prevented a Runtime from being activated after it had been suspended

### Data Plane Service 0.91.0

- Fixed an issue causing connector installations to fail on new Runtimes when bulletins are present

## November 20, 2025

### Runtime Server 2025.11.20.20

- Improved visibility of Runtime operations with new metrics for Connectors

### Runtime Extensions 2025.11.20.19

- New components to interact with SAP Business Data Cloud and mapping of CSNs into Snowflake Semantic Views
- CDC MySQL - Improved reliability by clearing out the table map prior to CDC reconnects
- CDC Databases - Fixed a potential deadlock issue with MergeSnowflakeJournalTable when "poll query result" is cleared during operation
- CDC MariaDB - Added support for MariaDB in the MySQL components
- CDC PostgreSQL - Added support for primary keys of type `numeric`

### Connectors 2025.11.20.17

- CDC Databases - Adjusted backpressure thresholds on some connections when processing a lot of data
- Salesforce - Gracefully handle scenarios where we insert duplicate rows in the staging table
- CDC Databases - Exposed parameter to enable private connectivity in the PutSnowpipeStreaming processor for data ingest
- CDC PostgreSQL - Adjusted yield duration on CaptureChangePostgreSQL to not overuse replication connections

## November 19, 2025

### Runtime Extensions 2025.11.18.22

- SQL Server - Performance improvement in Snapshot query
- Dataverse - Schemas are no longer filtered when no column filtering value is provided
- Telemetry - "Bytes Received" is now available for many Snowflake processors after fixing the file size for provenance events
- Azure components - Fix ConsumeAzureEventHub by excluding netty-codec-http3 dependency
- Google Cloud -
Added support for Workload Identity Federation
- Azure Blob Storage - Added support for uploading files larger than 200 GB

### Connectors 2025.11.18.17

- Google Ads - Set the new Authentication Strategy property of the GCP Credentials Controller Service
- Multi Database SQL Server - Fix `source.table.fqn` value handling
- Salesforce - Add logging for successful sync operations to ease monitoring via the events table

### AWS Data Plane Agent 1.6.0

- Improves security by removing unused inbound ports on Load Balancers configured for "Custom Ingress." You can further limit access with your own Security Group for these Openflow Deployments.
- Improves security and eases configuration of Runtimes with an optional Deployment-level IAM Role to securely access AWS resources like RDS, MSK, Kinesis, and S3. You can now attach IAM Policies to Openflow's "NodeInstanceRole" that are granted to all Runtimes in that Openflow Deployment.
- Upgrades EKS Cluster from 1.32 to 1.34 for long-term maintenance and security patching
- Resolves an issue where restarting some EC2 nodes frequently caused the Openflow Deployment to freeze
- Security patches and upgrades to third party libraries

### SPCS Data Plane Agent 1.4.0

- A missing event table will no longer cause failure when creating an Openflow Snowflake Deployment.
- Fixed certificate based issues accessing Runtimes and deploying Connectors into Deployments older than 60 days
- Security patches and upgrades to third party libraries

### Control Plane UI 0.64.0

- Adding support for the new Cleaning Up Runtime state
- Adding support for terms accepted trial not started or active trial

### Control Plane Core 0.90.0

- If a Runtime fails to create, it will automatically generate a diagnostics bundle and clean up any partially created resources in the cluster.

### Data Plane Service 0.90.0

- If a Runtime fails to create, it will automatically generate a diagnostics bundle and clean up any partially created resources in the cluster.
### Openflow Ingress Controller 2025.11.12-18

- Security patches and upgrades to third party libraries

## November 15, 2025

### Runtime Extensions 2025.11.16.2

- Improved reliability for high volume deployments by relocating state tracking for replication position and journal versioning in CDC Connectors

## November 14, 2025

### Runtime Server 2025.11.14.17

- Easier debugging with an updated Bulletin Board that can expand stack traces
- Fixed bug when rendering Documentation for extensions that lack tags
- Viewing component state now supports showing 5,000 local entries and 5,000 cluster entries, up from 500 each

### Runtime Extensions 2025.11.14.17

- Reduced costs by removing the validation query in the Snowflake Connection Service

### Connectors 2025.11.14.14

- Google Drive and Google Sheets - Set the new Authentication Strategy property of the GCP Credentials Controller Service

## November 13, 2025

### Runtime Extensions 2025.11.13.19

- Added support for Web Identity authentication to AWS MSK IAM Connection Service in Kafka components
- Added support for Web Identity authentication to AWS Credentials Controller Service for all AWS components
- Added Flow Registry Client support for Bitbucket Data Center edition
- Fixed Worker ID generation in ConsumeKinesis and added provenance data
- Added support for nested paths in HashiCorpVaultParameterProvider
- Dataverse - Added retry-after mechanism
- Added Snowflake Secrets Parameter Provider
- CDC Database Connectors - Improved reliability and performance on state management
- JIRA - Enriched issues with email addresses
- HubSpot - Fixed handling of 414 error code responses while fetching objects

### Connectors 2025.11.12.21

- Dataverse - Add Column JSON Filtering parameter
- PostgreSQL - Added FIFO FlowFile prioritizer on queues in Postgres Snapshot Load
- MySQL - Expose parameter for the starting position of the replication
- Dataverse - Added updated_ad and deleted columns
- Salesforce - Switched the CSV Reader to
RFC 4180
- Salesforce - Fixed configuration to capture soft deletes

## October 31, 2025

### AWS Data Plane Agent 1.1.0

- Security patches and upgrades to third party libraries

### SPCS Data Plane Agent 1.1.0

- Security patches and upgrades to third party libraries

### Control Plane Core 0.88.0

- Enable Openflow Oracle Connector in Snowflake Deployments

### Data Plane Service 0.88.0

- Security patches and upgrades to third party libraries

### Runtime Operator 0.45.0

- Security patches and upgrades to third party libraries

### Runtime Extensions 2025.10.31.13

- Improved reliability of high volume CDC Connectors

## October 30, 2025

### Runtime Extensions 2025.10.30.21

- CDC Database Connectors - New components for multi-databases support are now included in the runtime image
- JIRA - Added support for Forge App authentication method
- New OAuth2 controller service to get Snowflake issued JWTs for Workload Identity Federation

### Connectors 2025.10.30.20

- PostgreSQL - Added First In First Out (FIFO) connection prioritizer in PostgreSQL Snapshot load
- CDC Database Connectors - Disabled load balancing in the incremental flow to ensure single node processing of the data
- Dataverse - Added parameter for the new JSON Column Filtering property

## October 29, 2025

### Runtime Server 2025.10.28.18

- Fixed bug in Summary table formatting Process Group task time

### Runtime Extensions 2025.10.28.20

- Kinesis - Support Output Strategy property in ConsumeKinesis processor
- Kinesis - Added the new Kinesis components leveraging the latest AWS client library
- SQL Server - Added support for multiple databases
- MySQL - Added the possibility to specify the binlog starting position for reading the CDC stream
- PostgreSQL - Added support for negative scale in numeric types
- SQL Server - Improved the ordering of the ORDER BY clause
- Snowpipe Streaming - Improved Input Buffer Handling in PutSnowpipeStreaming2
- PostgreSQL - Improved performance in FetchTableSnapshot on large tables
with composite primary key
- MySQL - Fixed incorrectly replicated DATEs pre 1582-10-15 (Julian calendar)

### Connectors 2025.10.28.9

- Oracle - support for multiple logical databases
- MySQL, PostgreSQL, SQL Server - no longer writing the unused avro.schema FlowFile attribute
- Jira - support for fetching Worklogs

## October 28, 2025

### AWS Data Plane Agent 0.61.0

- Security patches and upgrades to third party libraries

## October 27, 2025

### Control Plane UI 0.63.0

- Security patches and upgrades to third party libraries

## October 24, 2025

### Control Plane Core 0.86.0

- Fixed issue where a user can't log into Openflow if their most recently selected active role was revoked.
- Disable creating a Snowflake Deployment if the user's role is not granted CREATE COMPUTE POOL privilege.

### Control Plane UI 0.62.0

- Improved display for Upgrade Failed, Inactive, and Activate Failed states
- Always show the current "Run As" role even if it's not in the current user's set of account roles
- Fixed issue with "Run As" role validation in Create Runtime dialog

## October 23, 2025

### Runtime Server 2025.10.23.16

- Bulletin icons now reflect the severity of the message
- Parameters can now be edited by double clicking on the row in the Parameter Context
- Included state of system diagnostics API call in the loading skeleton and spinner in the Cluster Listing
- Improved awareness of errors through the global banner when extension types fail to load
- Updated styling for unset, blank, empty styles throughout the Runtime UI

### Runtime Extensions 2025.10.23.11

- Fixed incorrect handling of Drop Table actions in UpdateSnowflakeTable processor
- Oracle - Improved performance by moving metadata generation in FetchSnapshot processor
- Oracle - Fixed handling of column filters DDL
- Dataverse - Added optional configuration to filter columns being fetched
- Cortex - Improved error message when there is an issue calling Cortex in PromptSnowflakeCortex processor
- MySQL - Fixed the
filtering out of the "user" table
- Salesforce Data Cloud - Added support for detecting deletions of Data Shares and linked Objects in the shares
- MySQL - Fixed skipping compressed transaction DDLs and DMLs spanning over the transaction
- JIRA - Enrich Jira Worklogs processor
- Confluence - Support for Confluence Data Center edition
- Added Offset Tracking Resolution to PutSnowpipeStreaming2 processor
- SharePoint - Fixed pagination handling when listing more than 200 items
- Salesforce - Added optional lookup key in UpsertSFDCObjects processor allowing user to specify a field other than ID for retrieving the record to upsert

### Connectors 2025.10.22.17

- Excel - Added missing SPCS related configuration options
- HubSpot - Added support for new object types: Notes, Orders and Carts
- Salesforce - Added missing configuration for authentication strategy for usage of the connector in Openflow Snowflake Deployments
- PostgreSQL - Migrated the connector to standard identifiers for better management of case sensitivity on object naming
- Oracle - Removed the addition of Snowflake Specific Columns to leverage FetchSnapshot processor instead and improve performance
- SharePoint - New Simple Ingest connector that does not fetch the ACLs associated with the data
- Salesforce - Added support for specifying the object fields that should be included/excluded when retrieving the data

## October 20, 2025

### Control Plane Core 0.84.1

- Released Control Plane Core version 0.84.1.

## October 17, 2025

### AWS Data Plane Agent 0.60.0

- Fixed certificate issues that blocked access to runtimes and connector deployments in deployments older than 60 days.

### Control Plane Core 0.84.0

- Fixed input validation issues when filtering in role selection menus.
- Fixed an issue where links to runtimes were shown to users without access privileges.
- Fixed an issue where users with only USAGE privilege on a runtime couldn't create connectors in that runtime.
- Fixed an issue for users accessing Openflow over PrivateLink with a network policy enforcing the VPCE ID.
- Added support for suspending and activating runtimes in Snowflake deployments.
- Snowflake deployments now display their current version immediately after creation is initiated.

### Control Plane UI 0.60.0

- Warns users before navigating to a deployment or runtime where VPN connectivity may be required when using custom ingress.
- Keeps the selection panel open for multi-select components after a selection is made.
- Enforces user permissions for viewing the runtime canvas and hides links if permissions are missing.
- Improves setup experience by considering total counts of runtimes and deployments, not just those in the ACTIVE state.
- Makes the Snowflake role optional when creating a runtime in a BYOC deployment.
- Improves text overflow handling for connector cards.

### Openflow Ingress Controller 2025.10.16-17

- Fixed an issue that prevented access to runtimes over PrivateLink.
- Fixed an issue where a new runtime couldn't be accessed if its name matched that of a previously deleted runtime.

## October 15, 2025

### Runtime Extensions 2025.10.14.22

- Added Snowflake Managed Authentication Strategy to SnowflakeConnectionService and PutSnowpipeStreaming.

### Runtime Oracle Extensions 2025.10.14.22

- Improved snapshot query performance by correcting ORDER BY column sorting.

### Runtime Server 2025.10.14.12

- Fixed missing Process Group identifier information in Processor and Controller Service log records.

## October 08, 2025

### AWS Data Plane Agent 0.59.0

- Added support for workarounds when using self-managed certificates in AWS.
- Fixed issues that caused BYOC deployment upgrades to get stuck with invalid image references and job cleanup.
- Restored support for adding customer-managed IAM policies to Openflow's IAM roles.
## October 03, 2025

### Connectors 2025.9.30.17

- Updated the Dataverse connector to set empty collation for the Dataverse journal table.

### Runtime Extensions 2025.10.2.19

- Added better support for case sensitivity on Snowflake objects in `MergeSnowflakeJournalTable`.
- Improved HubSpot pagination handling when retrieving more than 10,000 records.
- Unstructured Processing - `PerformSnowflakeCortexOCR` now uses the `AI_PARSE_DOCUMENT` function instead of `PARSE_DOCUMENT`.
- Added better support for case sensitivity on Snowflake objects in PutSnowpipeStreaming.
- PostgreSQL - Fixed unsigned handling of type OIDs in the CaptureChangePostgreSQL processor.

### Runtime Server 2025.9.30.19

- New Runtime UI 0.53.0.
- Fixed a regression that prevented tabbed dialogs from remembering the previously active tab.
- Fixed balto icon regressions and selected radio button display issues.
- Fixed an issue where Parameter Context update requests weren't deleted when users canceled the request.
- Fixed an issue that caused double scroll bars to appear in the asset upload dialog.
- Fixed an issue where the selected asset count could get out of sync.

## September 26, 2025

### AWS Data Plane Agent 0.52.0

- Improved efficiency of private IP addresses used by EKS cluster nodes, reducing the total number required for scaling out to many Runtime nodes.
- Fixed issue with Runtime logs that incorrectly redacted some component IDs.

### Connectors 2025.9.25.17

- Confluence connector - Better failure handling and retries when facing API rate limits.

### Control Plane Core 0.80.0

- Support for deploying Oracle Runtime Extensions to Runtimes in BYOC Deployments for PrPr customers who have accepted the Terms of Service.
- Fixed an issue where Snowflake deployment moved into an active state prematurely during an upgrade.
- Fixed a rare issue where Snowflake deployment deletions could get stuck and need manual intervention.
### Control Plane UI 0.57.0

- Introduced new deployment upgrade dialog that shows the version mapping.

### Data Plane Service 0.80.0

- Support for deploying Oracle Runtime Extensions to Runtimes in BYOC Deployments for PrPr customers who have accepted the Terms of Service.

### Runtime Extensions 2025.9.25.19

- CDC database connectors: Removed Record Reader from MergeSnowflakeJournalTable processor.
- All connectors log the Query ID whenever a connector executes a query in Snowflake.

### Runtime Oracle Extensions 2025.9.23.19

- PrPr release of Oracle Extension for Openflow Runtimes.

### Runtime Server 2025.9.25.19

- Improved the Openflow Connectors upgrade user experience.

## September 23, 2025

### Connectors 2025.9.23.17

- PostgreSQL connector now includes a new parameter so you can set the replication slot name.
- The PostgreSQL, MySQL, and SQL Server connectors now support column names that include special characters.

### Runtime Extensions 2025.9.23.19

- Added compression to rows added using the Insert Rows method through PutSnowpipeStreaming2.
- MySQL: Added support for compressed bin log events.
- Added new processors, UpdateSnowflakeSchema and UpdateSnowflakeStream, to better manage object lifecycles and support case sensitivity.
- HubSpot: Added support for new "Notes," "Orders," and "Carts" object types.
- Slack: Fixed Null Pointer Exception when trying to verify the configuration of ConsumeSlackConservations processor.

### Runtime Server 2025.9.23.19

- Using latest Apache NiFi 2.6.0 release.
- Improved the flow upgrade user experience by improving Flow Differences Filters to handle renameProperty, removeProperty, and createControllerService.
- New Runtime UI 0.52.0.
- Fixed bug allowing default values for dynamic properties.
- Improved the performance of the searchable select used in the Property combo editor.
## September 19, 2025

### AWS Data Plane Agent 0.50.0

- Openflow now supports VPCs with DHCP Option Sets, making it easier to connect to private data sources.
- You can now secure Openflow deployments with PrivateLink, while still allowing browser-based authentication to runtimes without PrivateLink.
- Fixed an issue during upgrades where IAM inline policies failed by exceeding maximum character limits.

### Control Plane Core 0.78.0

- Improved error messages for Snowflake deployment failures to show the root causes.
- Fixed a case where BYOC deployment ends up in Not Healthy state but can't be deleted from Openflow Control Plane.

### Control Plane UI 0.55.0

- Removed unnecessary title on **Runtime and Deployment state** columns.

### Openflow Runtime Gateway 2025.9.18.22

- Improved cookie session handling to allow users to remain logged in, even when Runtime is open in an inactive browser tab.

## September 18, 2025

### Connectors 2025.9.17.18

- Addition of the 2 new Oracle CDC connectors.
- Confluence connector - The introduction of a new controller service to handle API rate limits will show the connector as a process group with local changes. This can be ignored and will be resolved when upgrading the connector to the next version, when available.

### Runtime Extensions 2025.9.18.18

- Introduced the `UpdateSnowflakeTable` processor, which is like `UpdateSnowflakeDatabase`, but designed for tables and improved case sensitivity.

## September 16, 2025

### Connectors 2025.9.16.18

- SQL Server connector: Exposed the new SQL Server query interval property as a parameter.
- The new controller service for API rate limits in the Jira connector causes the connector to appear as a process group with local changes. You can safely ignore this; it will be fixed in a future connector upgrade.

### Control Plane UI 0.54.0

- Allow users to optionally configure whether end users authenticate over PrivateLink.
- The **Estimated time to completion** shown when creating Snowflake Deployments and Runtimes is now more accurate.

### Openflow Ingress Controller 2025.9.15-14

- Initial release offering privilege isolation for Openflow runtime authentication and authorization to Snowflake deployments.

### Runtime Extensions 2025.9.16.20

- Added support for DATETIME columns with PutBigQuery processor.
- You can now specify the HTTP protocol version in **StandardWebClientServiceProvider**.
- Better logging and increased timeouts for FetchSharepointFile processor.
- Added the option to set the replication slot name in CaptureChangePostgreSQL processor.
- You can now use `-infinity` and `+infinity` with Postgres TIMESTAMPTZ values.
- New controller service StandardAtlassianRequestRateManager to deal with API rate limits for the Jira connector.
- Fixed exceptions thrown from ListMicrosoftDataverseTables when table schema isn't returned by API.

### Runtime Operator 0.40.0

- Support deploying the new Openflow ingress controller for PuPr release of Snowflake deployments.

### Runtime Server 2025.9.16.19

- New Runtime UI 0.51.0.
- You can now delete individual entries in the component state if the component allows it.
- Improved tooltips for Property and Parameter values, especially when values are long or reference external resources.

## September 15, 2025

### AWS Data Plane Agent 0.41.1

- Fixed an issue from AWS Data Plane Agent 0.39.0 that blocked the first install of an Openflow deployment into a new AWS region.

## September 11, 2025

### Control Plane Core 0.73.0

- Fixed issue preventing runtime deletion in Snowflake deployments when a network policy is present.

### Data Plane Service 0.73.0

- Fixed an issue that prevented runtime deletion in Snowflake deployments when a network policy was present.
- Fixed an issue that prevented new versions of runtime extensions from being used when runtimes were created or upgraded.
### Runtime Extensions 2025.9.11.18 - CaptureChangeSQLServer: A new setting, `Table Changes Query Interval`, is introduced to reduce the resource pressure on the source database. Now, the processor queries the source database every 10 seconds (`10 sec`) by default. To restore the original behavior, change the setting to `0 sec`. ## September 10, 2025 ### AWS Data Plane Agent 0.40.0 - Resolved an issue where deployments were left partially upgraded after AWS Data Plane Agent 0.39.0 was used. ### Connectors 2025.9.9.18 - Unstructured connectors: Improved reporting on `ChunkText` failures. ### Runtime Extensions 2025.9.10.7 - Microsoft Dataverse: Fixed handling of schemas that include the `Edm.Date` type. - Fixed attribute prefix handling in the XML Reader. - Fixed MongoDB controller service for certain authentication methods when information is provided through the URI. - Added Azure DevOps Flow Registry Client for Git integration with Azure DevOps to version flows. ### Runtime Server 2025.9.9.20 - Added the ability to change the version of a ghosted component if a bundle with the same coordinates and a different version exists. ## September 8, 2025 ### AWS Data Plane Agent 0.37.0 - Added support for AWS Data Plane Agent deployments that have DHCP Option Sets configured on the account. - Upgraded all EKS nodes from Amazon Linux 2 to Amazon Linux 2023. ### AWS Data Plane Agent 0.38.0 - Added support for AWS accounts that require encrypted EBS volumes by default, even if an unencrypted EBS volume is requested. Customers can enable this by adding IAM Policies to the `*-eks-role IAM Role` that grant access to their KMS keys. ### Control Plane Core 0.72.0 - Error messages are now clearer and more informative when runtime-related failures occur. - Fixed a rare case where an older deployment version disallowed creating a runtime with the same name as a previously deleted runtime. 
### Control Plane UI 0.52.0 - Deployment listing and details now include the deployment version number. - Control Plane logout page now offers a link back to Snowsight. - Searchable select control (used in **Create Runtime** and **Manage Access**) now offers improved behavior when text overflows available space. - Fixed a bug that temporarily showed duplicate roles when revoking privileges through the **Manage Access** dialog. ### Data Plane Service 0.70.0 - Added support for AWS Data Plane Agent deployments that have DHCP Option Sets configured on the account. - Allowed customers to delete a runtime and create a new one with the same name shortly thereafter. ## September 5, 2025 ### Connectors 2025.9.4.19 - Confluence Connector: Refresh frequency is now set to 1 minute and is no longer exposed as a parameter. ### Runtime Extensions 2025.9.4.20 - Resolved an incompatibility between the GitHub Registry Client and the latest Jackson release. - Fixed attribute prefix handling in XML Reader. - Added `StandardProtobufReader` controller service for Protobuf record processing. - `ListTableName` won't fail the entire FlowFile if partial input is incorrect. ### Runtime Server 2025.9.4.20 - Introduced Runtime UI 0.50.0. - Added a new logout page that provides users options for logging back in or navigating to the Control Plane. - Enhanced the searchable select control to display options more clearly when text exceeds available space. - Fixed casing and icon issues when inputting attributes during extension verification. - Fixed header styling applied to additionalDetails markdown files. ## September 2, 2025 ### AWS Data Plane Agent 0.35.0 - Support for AWS Tags with dots in the Tag key. ### Connectors 2025.9.2.16 - MySQL CDC: Always create a new table (and fail if the table already exists) when replication mode is set to `full`. ### Runtime Extensions 2025.9.2.17 - The GitLab Flow Registry Client now supports versioning flows larger than 2 MB.
- Fixed an issue in the MongoDB Controller Service that prevented users from authenticating with X509. - Fixed irrelevant error logs about schema hash in the `UpdateSnowflakeDatabase` processor. - Confluence: Fixed a bug that prevented users from being added to authorized users even though they had permissions to the space from the group level. - Fixed a `NoSuchElementException` thrown in the ChunkText processor and improved failure handling with a dedicated relationship. - HubSpot: Fixed a bug that prevented the List processors from going through all the pages. ### Runtime Server 2025.9.2.20 - New Runtime UI 0.48.0. - Upgraded to the latest version of CodeMirror and updated usage throughout the application. ## August 28, 2025 ### Connectors 2025.8.28.17 - MS SQL CDC Connector: Added support for incremental only mode. - HubSpot connector: Fixed table creation on invalid object type. ### Runtime Extensions 2025.8.28.19 - Added StandardProtobufReader Controller Service for Protobuf record processing. ## August 27, 2025 ### AWS Data Plane Agent 0.33.0 - Fixes health checks for Load Balancer Target Groups, so everything shows green in the AWS Console. ### Control Plane Core v0.68.0 - Supports a finer-grained privilege model for deployments and runtimes including MONITOR and OPERATE privileges. ### Control Plane UI v0.51.0 - Supports a finer-grained privilege model for deployments and runtimes including MONITOR and OPERATE privileges. ### Runtime Operator 0.39.0 - Supports a finer-grained privilege model for deployments and runtimes including MONITOR and OPERATE privileges. ## August 26, 2025 ### Runtime Extensions 2025.8.26.18 - MS SQL Server: Fixes handling of datetime when used as a primary key. ## August 21, 2025 ### Connectors 2025.8.21.16 - PostgreSQL connector: Supports TOASTed values. ### Runtime Extensions 2025.8.21.17 - Uses Google Ads API v21 (note: v18 is no longer supported). ### Runtime Server 2025.8.21.17 - New Runtime UI 0.47.0.
## August 20, 2025 ### Connectors 2025.8.19.17 - Slack connectors: Fixes handling of attachments by appending the File ID to the filename for the files stored in the stage. ### Runtime Extensions 2025.8.20.10 - Adds Google Cloud support to PutSnowpipeStreaming2. - Adds support for Incremental Only mode in PostgreSQL CDC connector. - Fixes error when trying to verify configuration in List Azure processors. ### Runtime Server 2025.8.19.18 - Supports unquoted parameter references with spaces in their names within an expression language. ## August 15, 2025 ### Control Plane Core 0.64.0 - Resolves an issue that sometimes caused runtime deletion to fail in Snowflake deployments. ### Runtime Operator 0.38.0 - Resolves an issue with runtime autoscaling in Snowflake deployments. ### Runtime Server 2025.8.14.18 - Improves readability in Provenance Event dialog. ## August 13, 2025 ### Control Plane Core 0.62.0 - New AWS BYO-VPC deployments now add the “Private Security Group” to the EKS cluster, making it easier to configure connections to data sources. - Resolves an issue where new Deployments with a private security group configuration couldn't pull images from Snowflake over PrivateLink. ### Control Plane UI 0.49.0 - Runtime and Deployment action menus now have separators to help group actions. - Account roles show in a searchable selection with virtual scrolling. ### Data Plane Service 0.62.0 - Runtime flows no longer disappear after suspend and reactivate due to a conflicting auto scaling operation. ### Runtime Extensions 2025.8.12.20 - Adds FlowFile attributes support for Database and Schema properties in PutSnowflakeInternalStageFile. - New GetConfluenceSpaces processor. - PostgreSQL CDC now properly handles DATE, TIME, TIMESTAMP primary keys. ### Runtime Server 2025.8.12.20 - New Runtime UI 0.45.0: Minor improvements to the Component State dialog to improve readability of state entries.
## August 12, 2025 ### AWS Data Plane Agent 0.32.0 - Fixes issue destroying BYOC deployments that was introduced with 0.29.0. - Fixes issue from 0.29.0 release where BYOC deployments in AWS Regions with longer names may fail due to IAM Policy length limitations. ## August 7, 2025 ### AWS Data Plane Agent 0.30.0 - Upgrades the AMI of EKS nodes when the deployment is upgraded. - Removes unnecessary IPv6 Security Group rules for ingress and egress. ### Runtime Extensions 2025.8.7.20 - Improves ConsumeKafka by introducing an Inject Offset Output strategy to add a field kafkaOffset to the records. - Adds the preview tag for Salesforce, Confluence and HubSpot components. - Better configuration validation in UpdateSnowflakeDatabase to avoid using empty parameters. - Adds GetConfluencePageContent and GetConfluencePageIds processors for Confluence. - Fixes UpdateSnowflakeDatabase to properly redirect to the failure relationship when schema is not specified or does not exist. - Improves error handling of non-authorized calls in HubSpot processors. ### Runtime Server 2025.8.7.20 - New Runtime UI 0.44.0: Improves ConsumeKinesisStream by introducing a schema difference handling strategy to specify how records using the same schema should be grouped. - Fixes issue in rendering the canvas that surfaced on initial page load. ## August 6, 2025 ### Runtime Extensions 2025.8.5.19 - Adds Pipe Info Counter and Channel Error Message to PutSnowpipeStreaming2. - MySQL connector: Supports enabling the connector in Incremental mode only. - HubSpot connector: Improves handling of non-supported object types and fixed processing ordering of the events. ### Runtime Server 2025.8.5.19 - New Runtime UI 0.43.0: The Runtime UI now supports labeling extensions in Preview. The badge is shown in the create dialog, on the canvas, in the operate palette, in the edit dialog, and in listings for extensions not on the canvas. 
## August 5, 2025 ### AWS Data Plane Agent 0.29.0 - Private deployments: All images and binaries are provided by Snowflake instead of various internet sources. - Custom Ingress for "Bring Your Own VPC" deployments: Supports enterprise customers who use VPNs to access their cloud infrastructure and self-managed TLS certificates. - Adds end-to-end support for PrivateLink. Previously, data and management communications were available over PrivateLink. Now, the deployment can install over PrivateLink, too. ### Control Plane Core 0.60.0 - Adds improvements necessary to support BYOC private deployments. - Improves handling of outbound grants when transferring ownership of a runtime or deployment. - Trial accounts are now permitted to use Openflow with relevant parameter enabled. - Fixes an issue that disrupted use of Control Plane for customers with a large number of Snowflake roles. ### Control Plane UI 0.48.0 - In runtime and deployment listings, more actions in the menus are disabled rather than hidden. - Removes a link to accept terms. This change prevents problems when the user doesn't have an active Snowsight session. - When a new version is detected, prompts the user to reload the CP UI. - Disallows changing ownership of runtimes in Snowflake deployments. - Fixes bug that required a Snowflake role, even when the field was hidden. ### Data Plane Service 0.60.0 - Includes improvements necessary to support BYOC private deployments. - Fixes an issue that disrupted Connector deployment for customers with a large number of Snowflake roles. ### Openflow Runtime Gateway 2025.8.1.14 - Fixes an issue with certificate refresh upon renewal which prevented users from logging into older runtimes. ## July 31, 2025 ### Connectors 2025.7.31.17 - Jira: Improved readability of the flow. The scheduling is now exposed via a parameter. ### Runtime Extensions 2025.7.31.18 - Adds File Fragment Size and Count to PutSnowpipeStreaming2. 
- Introduces new Confluence processors for the upcoming connector GetConfluenceGroupUsers, GetConfluencePagePermissions, GetConfluenceSpacePermissions, ListConfluenceGroups. - Adds support for TOASTed value in PostgreSQL CDC. - Fixes initial rendering of canvas when fonts may load slowly. - Fixes parameter removal in Parameter Contexts owned by a Parameter Provider. ### Runtime Server 2025.7.31.18 - New Runtime UI 0.42.0: Improves formatting in Status History dialog when values are lengthy. ## July 29, 2025 ### Runtime Server 2025.7.29.9 - Fixes an issue with scaling that left some nodes in a disconnected state. ## July 24, 2025 ### Connectors 2025.7.24.17 - Kafka Connectors: Fixes referenced readers when writing to Iceberg formatted tables. ### Runtime Extensions 2025.7.24.18 - Fixes S3 Location Type in PutSnowpipeStreaming2. ### Runtime Server 2025.7.24.18 - Adds support for users to reset all Counters in a single action. - Fixes an issue that caused upgrade failure for runtimes with more than 1 node present. ## July 23, 2025 ### Control Plane Core 0.58.0 - Adds support for selecting an active role to use in the application, rather than relying on a default role and secondary role inheritance. - Adds support for considering Snowflake role hierarchy during authorization controls. ### Control Plane UI 0.47.0 - Adds support for selecting an active role to use in the application, rather than relying on a default role and secondary role inheritance. ### Data Plane Service 0.58.0 - Adds support for considering Snowflake role hierarchy during authorization controls. ### Openflow Runtime Gateway 2025.7.22.20 - Adds support for considering Snowflake role hierarchy during authorization controls. ## July 22, 2025 ### Runtime Extensions 2025.7.22.19 - A new controller service better supports Slack API rate limits. - Fixes SnowflakeSignJWT controller service. ## July 16, 2025 ### AWS Data Plane Agent 0.25.1 - Fixes upgrades to pull and use the latest host scripts. 
This change enables Openflow to more easily make changes to the agent itself during an upgrade. ## July 15, 2025 ### Connectors 2025.7.15.14 - Confluence JIRA connector: Improves type mapping for the JIRA issues. Uses the new processor for managing lifecycle of views. - Slack connectors: Changes defaults for run schedule properties to avoid rate limiting errors. ### Control Plane Core 0.53.0 - Adds support for generating and downloading runtime diagnostic bundles. ### Control Plane UI 0.46.0 - Adds support for generating and downloading runtime diagnostic bundles. ### Data Plane Service 0.53.0 - Adds support for generating and downloading runtime diagnostic bundles. ### Runtime Extensions 2025.7.15.16 - Adds the PutSnowpipeStreaming2 processor using SSv2. ### Runtime Server 2025.7.15.16 - New Runtime UI 0.40.0: Fixes a bug that prevented tooltips from closing on the canvas. ## July 10, 2025 ### Control Plane UI 0.45.2 - Adds support for PrivateLink redirects for the Launch Openflow button. - Fixes an issue where logout doesn't log the user out if the user revisits soon after. ## July 9, 2025 ### Connectors 2025.7.8.14 - PostgreSQL, SQL Server and MySQL Connectors: Changed the Journal creation process group to remove the false-positive error bulletin raised for PutSnowpipeStreaming when it was asked to create channels on tables or streams that did not yet exist. ### Control Plane Core 0.52.0 - Users must have proper privileges before they can list or view a runtime. ### Control Plane UI 0.45.1 - Fixes a bug that caused runtime and deployment listings not to show and prevented creation of new resources. ### Runtime Extensions 2025.7.9.14 - Git Registry clients have the option to ignore parameter changes when versioning a new version of a flow. - New HubSpot processor to retrieve the schema of HubSpot objects. - New processor UpdateSnowflakeView to manage lifecycle of Snowflake views. - New controller service RemoveFieldRecordReader to drop fields on read.
- Supports PostgreSQL Aurora. - CaptureChangeSQLServer generates a valid query when the primary key consists of multiple columns. - UpdateSnowflakeDatabase now checks column types only when required. ### Runtime Server 2025.7.9.14 - New Runtime UI 0.39.0. - Improves colors in canvas for Process Group version control status. - Improves styling for better alignment with Balto colors. - Assets are no longer prevented from being re-uploaded in the Manage Assets dialog. - When using form control to increment a numeric value, output from a dirty Edit Processor form is no longer prevented. ## July 3, 2025 ### AWS Data Plane Agent 0.22.2 - Upgrades no longer get stuck due to a missing Data Plane UI 0.7.0 image. ## July 1, 2025 ### AWS Data Plane Agent 0.22.1 - New deployments no longer fail to install due to a mishandled failure code when checking for the presence of AWS ECR repositories. ### Runtime Extensions 2025.7.1.18 - Google Ads: Limits the number of calls to the Google Ads API when validating the components to avoid rate limit errors. ## June 28, 2025 ### Runtime Extensions 2025.6.27.21 - Fixes NullPointerException in PutSnowpipeStreaming when empty flow files are being processed and Delivery Guarantee is set to `Exactly once`. ## June 27, 2025 ### Control Plane Core 0.51.0 - New terms of service flow: Customers can use Control Plane to create Snowflake-managed deployments without accepting BYOC and Connector terms. ### Control Plane UI 0.43.0 - New terms of service flow: Customers can use Control Plane to create Snowflake-managed deployments without accepting BYOC and Connector terms. ## June 26, 2025 ### Connectors 2025.6.26.15 - Kafka Connectors: Column type mismatches are now ignored in UpdateSnowflakeDatabase, which makes the connectors more resilient to schema inference issues.
- Google Drive & SharePoint Connectors: Improves the flow to avoid a race condition where group synchronization kicks off but PERMS_GROUPS has not been created yet. - Kafka Connectors: Warehouse is no longer needed. The corresponding parameter is removed. ### Runtime Extensions 2025.6.26.16 - Tables without primary keys are retried instead of failed. - New Alter Strategy in UpdateSnowflakeDatabase processor has the option to ignore column type changes. - Fixes fetching of HubSpot archived records. ## June 24, 2025 ### Control Plane Core 0.50.0 - New deployments send status updates to Openflow Control Plane indicating when upgrades are present. - The PrPr tag is included on some new connectors. ### Control Plane UI 0.42.0 - New deployments now surface when an upgrade is available, with a link to documentation. Earlier deployments can also use this functionality after a migration to a newer version. ### Data Plane Service 0.50.0 - Fixes Create runtime failures where the minimum node count is greater than one. ### Data Plane UI 0.7.0 - Active role now displays in the current user menu. ## June 20, 2025 ### AWS Data Plane Agent 0.21.0 - Deployments created with AWS Data Plane Agent 0.20.0 are no longer prevented from adopting future updates to EC2 Agent Host scripts. ## June 18, 2025 ### AWS Data Plane Agent 0.19.0 - Supports tagging all AWS resources created and managed by Openflow. Enables deployments governed by security controls like AWS SCP and cost controls like AWS MAP. ### AWS Data Plane Agent 0.20.0 - New Openflow BYOC deployments and upgrades of existing deployments are no longer blocked by an "Unsupported block type" error. ### Connectors 2025.6.17.15 - JIRA: Multi-projects support flattened views in Snowflake destination. ### Runtime Server 2025.6.17.16 - Process Group metrics are now visible when using the Stateless engine. - The toolbar renders properly when font size is scaled in the browser settings. - The UnpackContent shows the TAR option again.
## June 12, 2025 ### Connectors 2025.6.12.19 - Google Sheets: Improves failure handling by retrying when ingesting data into Snowflake. - Workday: Uses TRUNCATE instead of REPLACE when possible on the destination table. - Sharepoint / Google Drive: Improves failure handling with proper retry / logging in case of failures. - SQL Server: Prevents stream staleness. - Box: Properly reflects permissions when groups are removed from file permissions in Box. - Google Drive (Simple Ingest): Fixes handling of files being deleted. - Workday: Fixes clustering configuration to have the first processor run on the primary node only. ### Control Plane UI 0.41.0 - Skeleton loaders are now shown in the deployment and runtime listings when permissions are evaluated. - Skeleton loaders are now shown in **Create Runtime** and **Add Connector to Runtime** dialogs while options are loaded and permissions are evaluated. ### Runtime Extensions 2025.6.12.21 - Adds the possibility to specify multiple projects to fetch JIRA issues when using 'Simple Search'. - Improves handling of all fields in the JIRA connector. Improves mapping into destination table by using an individual column per field. - Adds support for the PuPr of Snowflake Structured Maps/Arrays/Objects. - Google Sheets connector now supports Boolean and numeric values in the same column. - MySQL: Properly handles changes in the column filtering parameter during replication. - MySQL: Fixes potential connection leakage when being disconnected from the binlog. - SQL Server: Fixes column ordering handling in the Journal Log table. ### Runtime Server 2025.6.12.21 - Error reporting now shows in banners instead of toast notifications. - Adds support for different ranges in the Status History dialog by selecting different start timestamps. - Introduces a Process Group column to the Parameter Context table to more efficiently see bound Process Groups.
## June 8, 2025 ### Runtime Extensions 2025.6.6.16 - Upgrades Snowflake JDBC Driver to 3.24.2. - Resolves an issue that prevented newer runtimes from installing the latest Microsoft Dataverse Connector. - Removes Microsoft SQL Server replication of logical databases. ### Runtime Gateway 2025.6.8.2 - Adds support for logging in to Openflow runtimes using role names with dashes. ### Runtime Server 2025.6.6.19 - Adds pre-configured version control support for custom flows. - Gracefully shuts down processors and controller services for stateless process groups. ## May 31, 2025 ### Runtime Extensions 2025.5.31.15 - Adds the kafka.max.offset attribute to records produced by ConsumeKafka. --- title: SnowflakeConnectionService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/snowflakeconnectionservice.md section: Loading & Unloading Data --- # SnowflakeConnectionService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides pooled database connections to Snowflake services ## Tags connection, database, jdbc, openflow, snowflake ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SnowflakeDatabaseDialectService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/snowflakedatabasedialectservice.md section: Loading & Unloading Data --- # SnowflakeDatabaseDialectService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Database Dialect Service supporting Snowflake. Supported Statement Types: ALTER, CREATE, SELECT, UPSERT (MERGE INTO) ## Tags Database, JDBC, Relational, SQL ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SnowflakeDetectDuplicate 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/snowflakedetectduplicate.md section: Loading & Unloading Data --- # SnowflakeDetectDuplicate 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowflake-processors-nar ## Description Checks if a FlowFile's hash (provided as a FlowFile attribute) is already in a Snowflake table, and routes the FlowFile to 'duplicate' if found, 'distinct' if not found, or 'failure' on errors. ## Tags database, detect, duplicates, hash, snowflake ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
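The routing that the SnowflakeDetectDuplicate description outlines can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processor's implementation: the function name `route_by_hash` is hypothetical, an in-memory set stands in for the Snowflake table, and treating a missing hash attribute as an error is an assumption made for the sketch.

```python
from typing import Optional, Set


def route_by_hash(flowfile_hash: Optional[str], known_hashes: Set[str]) -> str:
    """Return the relationship name a FlowFile would be routed to.

    `known_hashes` is a hypothetical stand-in for the hashes already
    stored in the Snowflake table.
    """
    if not flowfile_hash:
        # Assumption for this sketch: a missing or empty hash attribute is an error.
        return "failure"
    if flowfile_hash in known_hashes:
        return "duplicate"
    return "distinct"


known = {"a1b2", "c3d4"}
for candidate in ("a1b2", "ffff", None):
    print(candidate, "->", route_by_hash(candidate, known))
```

The sketch only classifies a hash; whether and when the processor records new hashes in the table is not covered here.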
--- title: SnowflakeSignJWTService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/snowflakesignjwtservice.md section: Loading & Unloading Data --- # SnowflakeSignJWTService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides OAuth2 access token using a JWT signed with a secret stored in Snowflake. The JWT is signed using the SYSTEM$SIGN_JWT_USING_SECRET function, which requires a valid Snowflake connection. ## Tags jwt, preview, snowflake ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SnowflakeTableSchemaRegistry source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/snowflaketableschemaregistry.md section: Loading & Unloading Data --- # SnowflakeTableSchemaRegistry This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Uses Snowflake tables as the source of schema, retrieved through the Snowpipe Streaming REST API. Requires a fully qualified table name as the schema name. ## Tags openflow, registry, schema, snowflake ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SplitAvro 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/splitavro.md section: Loading & Unloading Data --- # SplitAvro 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-avro-nar ## Description Splits a binary encoded Avro datafile into smaller files based on the configured Output Size. The Output Strategy determines if the smaller files will be Avro datafiles, or bare Avro records with metadata in the FlowFile attributes. The output will always be binary encoded. ## Tags avro, split ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: SplitContent 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/splitcontent.md section: Loading & Unloading Data --- # SplitContent 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Splits incoming FlowFiles by a specified byte sequence ## Tags binary, content, split ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
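To make the SplitContent behavior concrete, here is a minimal Python sketch of splitting content on a byte sequence. It is an illustration only: the function name `split_content` is hypothetical, and dropping the separator and any empty fragments is an assumption of this sketch rather than documented processor behavior.

```python
from typing import List


def split_content(content: bytes, separator: bytes) -> List[bytes]:
    """Split `content` on every occurrence of `separator`.

    Assumption for this sketch: the separator itself and empty
    fragments are discarded.
    """
    if not separator:
        raise ValueError("separator must not be empty")
    return [part for part in content.split(separator) if part]


fragments = split_content(b"header--body--footer", b"--")
print(fragments)
```

Each returned fragment corresponds to the content of one outgoing FlowFile in this simplified model.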
## See also - [org.apache.nifi.processors.standard.MergeContent](/user-guide/data-integration/openflow/processors/mergecontent) --- title: SplitExcel 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/splitexcel.md section: Loading & Unloading Data --- # SplitExcel 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-poi-nar ## Description This processor splits a multi sheet Microsoft Excel spreadsheet into multiple Microsoft Excel spreadsheets where each sheet from the original file is converted to an individual spreadsheet in its own flow file. Currently this processor is only capable of processing .xlsx (XSSF 2007 OOXML file format) Excel documents and not older .xls (HSSF '97(-2007) file format) documents. Please note all original cell styles are dropped and formulas are removed leaving only the calculated values. Even a single sheet Microsoft Excel spreadsheet is converted to its own flow file with all the original cell styles dropped and formulas removed. ## Tags split, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: SplitJson 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/splitjson.md section: Loading & Unloading Data --- # SplitJson 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Splits a JSON File into multiple, separate FlowFiles for an array element specified by a JsonPath expression. Each generated FlowFile is comprised of an element of the specified array and transferred to relationship 'split,' with the original file transferred to the 'original' relationship. If the specified JsonPath is not found or does not evaluate to an array element, the original file is routed to 'failure' and no files are generated. ## Tags json, jsonpath, split ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
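The routing described for SplitJson can be illustrated with a minimal Python sketch. It is not the processor's implementation: the function name `split_json` is hypothetical, and a plain list of keys stands in for a JsonPath expression such as `$.orders.items` (no JsonPath evaluator is used here). Each array element becomes one output, and a missing path or a non-array result yields `failure` with no outputs. Routing of the original file to 'original' is omitted.

```python
import json


def split_json(content: str, path: list[str]):
    """Return (relationship, outputs) for a JSON document and a key path.

    `path` is a simplified stand-in for a JsonPath expression (assumption).
    """
    try:
        node = json.loads(content)
        for key in path:
            node = node[key]
    except (ValueError, KeyError, TypeError):
        # Invalid JSON or path not found: nothing is generated.
        return "failure", []
    if not isinstance(node, list):
        # The path resolved, but not to an array.
        return "failure", []
    # One output per array element, serialized back to JSON text.
    return "split", [json.dumps(element) for element in node]


doc = '{"orders": {"items": [{"id": 1}, {"id": 2}, "x"]}}'
relationship, splits = split_json(doc, ["orders", "items"])
print(relationship, splits)
```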
## Relationships
## Writes attributes
--- title: SplitRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/splitrecord.md section: Loading & Unloading Data --- # SplitRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Splits up an input FlowFile that is in a record-oriented data format into multiple smaller FlowFiles ## Tags avro, csv, freeform, generic, json, log, logs, schema, split, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
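As a rough illustration of splitting record-oriented data into smaller units, the following Python sketch chunks a sequence of records into fixed-size groups. The function name `split_records` and the `records_per_split` parameter are hypothetical and chosen for this example; the processor's actual property names and its reader/writer handling are not shown on this page.

```python
from itertools import islice


def split_records(records, records_per_split):
    """Return lists of at most `records_per_split` records each, in order."""
    if records_per_split < 1:
        raise ValueError("records_per_split must be at least 1")
    iterator = iter(records)
    chunks = []
    while True:
        chunk = list(islice(iterator, records_per_split))
        if not chunk:
            return chunks
        chunks.append(chunk)


rows = [{"id": n} for n in range(5)]
chunks = split_records(rows, 2)
print([len(chunk) for chunk in chunks])
```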
## Relationships
## Writes attributes
--- title: SplitText 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/splittext.md section: Loading & Unloading Data --- # SplitText 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Splits a text file into multiple smaller text files on line boundaries, limited by a maximum number of lines or a total fragment size. Each output split file will contain no more than the configured number of lines or bytes. If both Line Split Count and Maximum Fragment Size are specified, the split occurs at whichever limit is reached first. If the first line of a fragment exceeds the Maximum Fragment Size, that line will be output in a single split file that exceeds the configured maximum size limit. This component also allows you to specify that each split should include header lines. Header lines can be identified either by specifying the number of lines that constitute a header or by using a header marker to match against the lines read. If a line matches the marker, it is treated as a header line. Keep in mind that upon the first failure of a header marker match, no more matches will be performed and the rest of the data will be parsed as regular lines for a given split. If no data remains after the header is computed, the resulting split will consist of only header lines. ## Tags split, text ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
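The interaction between the two limits can be sketched in Python. This is a simplified model, not the processor's code: the function name `split_text` is hypothetical, a value of `0` is assumed to disable a limit, sizes are measured in UTF-8 bytes, and header handling is left out. It shows the "whichever limit is reached first" rule and the case where a single oversized line is emitted alone.

```python
def split_text(text, line_split_count=0, max_fragment_size=0):
    """Split `text` on line boundaries; 0 disables a limit (assumption)."""
    fragments, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        line_bytes = len(line.encode("utf-8"))
        # Close the fragment when adding this line would break either limit.
        over_lines = line_split_count and len(current) >= line_split_count
        # `current` must be non-empty, so an oversized first line still
        # goes out alone in a fragment that exceeds the size limit.
        over_bytes = (
            max_fragment_size
            and current
            and size + line_bytes > max_fragment_size
        )
        if over_lines or over_bytes:
            fragments.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += line_bytes
    if current:
        fragments.append("".join(current))
    return fragments


sample = "a\nbb\nccc\ndddddddddd\ne\n"
fragments = split_text(sample, line_split_count=2, max_fragment_size=6)
print(fragments)
```

In this run the first fragment closes on the line limit, the second on the size limit, and the 11-byte line forms its own fragment even though it exceeds the 6-byte maximum.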
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.MergeContent](/user-guide/data-integration/openflow/processors/mergecontent) --- title: SplitXml 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/splitxml.md section: Loading & Unloading Data --- # SplitXml 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Splits an XML File into multiple separate FlowFiles, each comprising a child or descendant of the original root element ## Tags split, xml ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
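To make "a child or descendant of the original root element" concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The function name `split_xml` and its `depth` parameter are assumptions for illustration: depth 1 selects the root's children, depth 2 their children, and so on. How the processor itself selects the level is configured through its properties, which are not shown here.

```python
import xml.etree.ElementTree as ET


def split_xml(content: str, depth: int = 1) -> list[str]:
    """Return the serialized elements found `depth` levels below the root."""
    level = [ET.fromstring(content)]
    for _ in range(depth):
        # Descend one level: collect every child of every current node.
        level = [child for node in level for child in node]
    return [ET.tostring(node, encoding="unicode") for node in level]


doc = "<root><a><x>1</x></a><b>2</b></root>"
children = split_xml(doc)
grandchildren = split_xml(doc, depth=2)
print(children, grandchildren)
```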
## Relationships
## Writes attributes
--- title: StandardAnthropicLLMService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardanthropicllmservice.md section: Loading & Unloading Data --- # StandardAnthropicLLMService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A Controller Service that provides integration with Anthropic's Claude AI models through their Messages API. Supports configurable parameters including model selection, response generation settings (temperature, top_p, top_k), token limits, and retry behavior. ## Tags ai, anthropic, api, claude, language model, llm, openflow ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
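For orientation, the sketch below shows how the generation settings named in the description (temperature, top_p, top_k, and a token limit) might appear in a Messages API request body. The field names follow Anthropic's public Messages API; the helper `build_messages_request`, the placeholder model ID, and the mapping from this service's properties to those fields are assumptions, and no request is actually sent.

```python
import json


def build_messages_request(prompt, model, max_tokens=1024,
                           temperature=None, top_p=None, top_k=None):
    """Build a JSON request body; unset sampling options are omitted."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    optional = (("temperature", temperature), ("top_p", top_p), ("top_k", top_k))
    for name, value in optional:
        if value is not None:
            body[name] = value
    return json.dumps(body)


# "example-model-id" is a placeholder, not a real model name.
payload = build_messages_request("Summarize this row.", "example-model-id",
                                 temperature=0.2, top_k=40)
print(payload)
```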
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardAtlassianRequestRateManager source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardatlassianrequestratemanager.md section: Loading & Unloading Data --- # StandardAtlassianRequestRateManager This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides rate limiting coordination for Atlassian API calls across processors to prevent cascading rate limit issues. Throttles when limit is reached (HTTP 429). ## Tags api, atlassian, confluence, jira, limit, openflow, rate ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardAzureCredentialsControllerService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardazurecredentialscontrollerservice.md section: Loading & Unloading Data --- # StandardAzureCredentialsControllerService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides credentials to use with an Azure client. ## Tags azure, credentials, provider, security, session ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardConfluenceClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardconfluenceclientservice.md section: Loading & Unloading Data --- # StandardConfluenceClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides connection service to Confluence APIs ## Tags Preview, atlassian, confluence ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardDatabricksWorkspaceClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standarddatabricksworkspaceclientservice.md section: Loading & Unloading Data --- # StandardDatabricksWorkspaceClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Databricks client. ## Tags databricks, openflow ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardDropboxCredentialService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standarddropboxcredentialservice.md section: Loading & Unloading Data --- # StandardDropboxCredentialService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Defines credentials for Dropbox processors. ## Tags credentials, dropbox, provider ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardFileResourceService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardfileresourceservice.md section: Loading & Unloading Data --- # StandardFileResourceService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a file resource for other components. The file needs to be available locally to NiFi (e.g. local disk or mounted storage). NiFi needs to have read permission for the file. ## Tags file, resource ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: StandardHashiCorpVaultClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardhashicorpvaultclientservice.md section: Loading & Unloading Data --- # StandardHashiCorpVaultClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A controller service for interacting with HashiCorp Vault. ## Tags client, hashicorp, vault ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardHttpContextMap source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardhttpcontextmap.md section: Loading & Unloading Data --- # StandardHttpContextMap This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides the ability to store and retrieve HTTP requests and responses external to a Processor, so that multiple Processors can interact with the same HTTP request. ## Tags http, request, response ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
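The idea of holding a request/response pair outside any single processor can be sketched as a small keyed store. Everything in this Python example is hypothetical (the class `HttpContextMap`, its method names, and the `max_outstanding` limit); it only illustrates how one component could register an exchange under an identifier and another could later look it up and complete it.

```python
import threading


class HttpContextMap:
    """Thread-safe store of pending HTTP exchanges keyed by identifier."""

    def __init__(self, max_outstanding=100):
        self._max = max_outstanding
        self._lock = threading.Lock()
        self._pending = {}

    def register(self, context_id, request, response):
        """Store an exchange; return False if full or the ID is taken."""
        with self._lock:
            if context_id in self._pending or len(self._pending) >= self._max:
                return False
            self._pending[context_id] = (request, response)
            return True

    def lookup(self, context_id):
        """Return the stored (request, response) pair, or None."""
        with self._lock:
            return self._pending.get(context_id)

    def complete(self, context_id):
        """Remove the exchange once the response has been sent."""
        with self._lock:
            return self._pending.pop(context_id, None) is not None


contexts = HttpContextMap(max_outstanding=1)
first = contexts.register("req-1", {"path": "/orders"}, {"status": None})
second = contexts.register("req-2", {"path": "/items"}, {"status": None})
print(first, second, contexts.lookup("req-1"))
```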
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardHubSpotClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardhubspotclientservice.md section: Loading & Unloading Data --- # StandardHubSpotClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description HubSpot Controller Service that integrates with the HubSpot HTTP API. ## Tags Preview, hubSpot ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardJsonSchemaRegistry source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardjsonschemaregistry.md section: Loading & Unloading Data --- # StandardJsonSchemaRegistry This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a service for registering and accessing JSON schemas. You can register a schema as a dynamic property where 'name' represents the schema name and 'value' represents the textual representation of the actual schema, following the syntax and semantics of the JSON Schema format. Empty schemas and schemas consisting only of whitespace are not acceptable. The registry is heterogeneous: it can store schemas of different schema draft versions. By default, the registry is configured to store schemas of Draft 2020-12. When a schema is added, it is saved under the version that is currently set. ## Tags json, registry, schema ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
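The registration rules in the description can be modeled with a short Python sketch. The class `JsonSchemaRegistry`, its methods, and the version label strings are hypothetical; the sketch only mirrors the stated behavior: blank schemas are rejected, and each schema is stored together with whichever draft version is set at the time it is added.

```python
import json


class JsonSchemaRegistry:
    """Name-to-schema store that records the draft version in effect."""

    def __init__(self, schema_version="Draft 2020-12"):
        self.schema_version = schema_version
        self._schemas = {}

    def register(self, name, schema_text):
        # Empty or whitespace-only schema text is not acceptable.
        if not schema_text or not schema_text.strip():
            raise ValueError(f"schema '{name}' is empty")
        json.loads(schema_text)  # Raises ValueError if not valid JSON.
        self._schemas[name] = (self.schema_version, schema_text)

    def retrieve(self, name):
        """Return (version, schema_text) for a registered schema."""
        return self._schemas[name]


registry = JsonSchemaRegistry()
registry.register("order", '{"type": "object"}')
registry.schema_version = "Draft 7"  # Later additions use this version.
registry.register("legacy", '{"type": "string"}')
print(registry.retrieve("order")[0], registry.retrieve("legacy")[0])
```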
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardKustoIngestService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardkustoingestservice.md section: Loading & Unloading Data --- # StandardKustoIngestService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Sends batches of flowfile content or stream flowfile content to an Azure ADX cluster. ## Tags ADX, Azure, Data, Explorer, Kusto, azure, ingest ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardKustoQueryService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardkustoqueryservice.md section: Loading & Unloading Data --- # StandardKustoQueryService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Standard implementation of Kusto Query Service for Azure Data Explorer ## Tags ADX, Azure, Data, Explorer, Kusto ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardMilvusConnectionService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardmilvusconnectionservice.md section: Loading & Unloading Data --- # StandardMilvusConnectionService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides connection service to a Milvus instance ## Tags connection, database, milvus, openflow, vector ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardOauth2AccessTokenProvider source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardoauth2accesstokenprovider.md section: Loading & Unloading Data --- # StandardOauth2AccessTokenProvider This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides OAuth 2.0 access tokens that can be used as Bearer authorization header in HTTP requests. Can use either Resource Owner Password Credentials Grant or Client Credentials Grant. Client authentication can be done with either HTTP Basic authentication or in the request body. ## Tags access token, authorization, http, oauth2, provider ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
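The two client authentication styles mentioned above differ only in where the client credentials travel. The Python sketch below builds (but does not send) a Client Credentials Grant token request in both forms, following the general shape defined by OAuth 2.0. The helper name `build_token_request`, the sample client ID and secret, and the `auth_in_body` flag are illustrative assumptions, not this service's property names.

```python
import base64
from urllib.parse import urlencode


def build_token_request(client_id, client_secret, auth_in_body=False, scope=None):
    """Return (headers, body) for a client-credentials token request."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if auth_in_body:
        # Credentials are sent as form fields in the request body.
        form["client_id"] = client_id
        form["client_secret"] = client_secret
    else:
        # Credentials are sent in an HTTP Basic Authorization header.
        raw = f"{client_id}:{client_secret}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    return headers, urlencode(form)


basic_headers, basic_body = build_token_request("my-client", "s3cret")
body_headers, body_form = build_token_request("my-client", "s3cret", auth_in_body=True)
print(basic_body)
print(body_form)
```

The access token returned by the token endpoint would then be placed in an `Authorization: Bearer <token>` header on subsequent HTTP requests.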
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardOCRService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardocrservice.md section: Loading & Unloading Data --- # StandardOCRService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides integration to Openflow OCR Service ## Tags extract, image, ocr, openflow, tesseract, text ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardOpenAILLMService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardopenaillmservice.md section: Loading & Unloading Data --- # StandardOpenAILLMService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A Controller Service that provides integration with OpenAI's Chat Completion API. Supports configurable parameters including model selection, temperature, top_p, max tokens, and retry behavior. Handles API authentication, request retries with exponential backoff, and error handling. ## Tags ai, chat completion, chatgpt, large language model, llm, openai, openflow ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
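"Request retries with exponential backoff" generally means waiting longer after each failed attempt, typically doubling the delay. The Python sketch below shows that pattern in isolation. The function `call_with_backoff`, its parameters, the doubling factor, and the use of `ConnectionError` as the retryable failure are assumptions for illustration; the service's actual retry limits and delays are set through its properties.

```python
import time


def call_with_backoff(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `send()`; on ConnectionError wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the error.
            sleep(base_delay * 2 ** attempt)


# Demo with a fake request that fails twice, and a fake sleep that
# records the delays instead of waiting.
attempts = {"count": 0}
delays = []


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_backoff(flaky_request, sleep=delays.append)
print(result, delays)
```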
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardPGPPrivateKeyService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardpgpprivatekeyservice.md section: Loading & Unloading Data --- # StandardPGPPrivateKeyService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description PGP Private Key Service provides Private Keys loaded from files or properties ## Tags Encryption, GPG, Key, OpenPGP, PGP, Private, RFC 4880 ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardPGPPublicKeyService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardpgppublickeyservice.md section: Loading & Unloading Data --- # StandardPGPPublicKeyService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description PGP Public Key Service providing Public Keys loaded from files ## Tags Encryption, GPG, Key, OpenPGP, PGP, Private, RFC 4880 ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardPrivateKeyService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardprivatekeyservice.md section: Loading & Unloading Data --- # StandardPrivateKeyService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Private Key Service provides access to a Private Key loaded from configured sources ## Tags PEM, PKCS8 ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardProtobufReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardprotobufreader.md section: Loading & Unloading Data --- # StandardProtobufReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Parses Protocol Buffers messages from binary format into NiFi Records. Supports multiple schema access strategies including inline schema text, schema registry lookup, and schema reference readers. The Protobuf reader needs to know the Proto schema message name in order to deserialize the binary payload correctly. The name of this message can be determined statically, using the 'Message Name' property, or dynamically, using a Message Name Resolver service. ## Tags parser, protobuf, reader, record ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardProxyConfigurationService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardproxyconfigurationservice.md section: Loading & Unloading Data --- # StandardProxyConfigurationService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a set of configurations for different NiFi components to use a proxy server. ## Tags Proxy ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardRestrictedSSLContextService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardrestrictedsslcontextservice.md section: Loading & Unloading Data --- # StandardRestrictedSSLContextService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Restricted implementation of the SSLContextService. Provides the ability to configure keystore and/or truststore properties once and reuse that configuration throughout the application, but only allows a restricted set of TLS/SSL protocols to be chosen (no SSL protocols are supported). The set of protocols selectable will evolve over time as new protocols emerge and older protocols are deprecated. This service is recommended over StandardSSLContextService if a component doesn't expect to communicate with legacy systems since it is unlikely that legacy systems will support these protocols. ## Tags certificate, jks, keystore, p12, pkcs, pkcs12, secure, ssl, tls, truststore ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardS3EncryptionService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standards3encryptionservice.md section: Loading & Unloading Data --- # StandardS3EncryptionService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Adds configurable encryption to S3 Put and S3 Fetch operations. ## Tags aws, decrypt, decryption, encrypt, encryption, key, s3, service ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardSalesforceBulkJobsStateService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardsalesforcebulkjobsstateservice.md section: Loading & Unloading Data --- # StandardSalesforceBulkJobsStateService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Stores Salesforce Bulk Jobs state per object type at cluster scope ## Tags bulk, preview, salesforce, state ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardSalesforceClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardsalesforceclientservice.md section: Loading & Unloading Data --- # StandardSalesforceClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides connection service to Salesforce APIs ## Tags preview, salesforce ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardSalesforceDataCloudClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardsalesforcedatacloudclientservice.md section: Loading & Unloading Data --- # StandardSalesforceDataCloudClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides connection service to Salesforce Data Cloud APIs ## Tags daas, data cloud, preview, salesforce ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardSlackRateLimiterService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardslackratelimiterservice.md section: Loading & Unloading Data --- # StandardSlackRateLimiterService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides rate limiting coordination for Slack API calls across processors to prevent cascading rate limit issues ## Tags api, limit, openflow, rate, slack ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardSSLContextService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardsslcontextservice.md section: Loading & Unloading Data --- # StandardSSLContextService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Standard implementation of the SSLContextService. Provides the ability to configure keystore and/or truststore properties once and reuse that configuration throughout the application. This service can be used to communicate with both legacy and modern systems. If you only need to communicate with non-legacy systems, then the StandardRestrictedSSLContextService is recommended as it only allows a restricted set of TLS protocols to be chosen. ## Tags certificate, jks, keystore, p12, pkcs, pkcs12, secure, ssl, tls, truststore ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardTableStateService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardtablestateservice.md section: Loading & Unloading Data --- # StandardTableStateService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A controller service that provides and manages table state. The state is cached and refreshed only when one of the set-table-state methods is invoked. This caching approach requires that getting and setting state for a given table be done on the same node. Table processing can be partitioned between NiFi nodes, but the get and set state operations for a single table must be associated with a single NiFi node. ## Tags ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardVectaraClientService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardvectaraclientservice.md section: Loading & Unloading Data --- # StandardVectaraClientService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Vectara Controller Service to integrate with the Vectara HTTP API. ## Tags ai, llm, openflow, rag, vectara ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StandardWebClientServiceProvider source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/standardwebclientserviceprovider.md section: Loading & Unloading Data --- # StandardWebClientServiceProvider This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Web Client Service Provider with support for configuring standard HTTP connection properties ## Tags Client, HTTP, Web ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: StartAwsPollyJob 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/startawspollyjob.md section: Loading & Unloading Data --- # StartAwsPollyJob 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Trigger an AWS Polly job. It should be followed by the GetAwsPollyJobStatus processor in order to monitor job status. ## Tags AWS, Amazon, ML, Machine Learning, Polly ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.ml.polly.GetAwsPollyJobStatus](/user-guide/data-integration/openflow/processors/getawspollyjobstatus) --- title: StartAwsTextractJob 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/startawstextractjob.md section: Loading & Unloading Data --- # StartAwsTextractJob 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Trigger an AWS Textract job. It should be followed by the GetAwsTextractJobStatus processor in order to monitor job status. ## Tags AWS, Amazon, ML, Machine Learning, Textract ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.ml.textract.GetAwsTextractJobStatus](/user-guide/data-integration/openflow/processors/getawstextractjobstatus) --- title: StartAwsTranscribeJob 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/startawstranscribejob.md section: Loading & Unloading Data --- # StartAwsTranscribeJob 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Trigger an AWS Transcribe job. It should be followed by the GetAwsTranscribeJobStatus processor in order to monitor job status. ## Tags AWS, Amazon, ML, Machine Learning, Transcribe ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.ml.transcribe.GetAwsTranscribeJobStatus](/user-guide/data-integration/openflow/processors/getawstranscribejobstatus) --- title: StartAwsTranslateJob 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/startawstranslatejob.md section: Loading & Unloading Data --- # StartAwsTranslateJob 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Trigger an AWS Translate job. It should be followed by the GetAwsTranslateJobStatus processor in order to monitor job status. ## Tags AWS, Amazon, ML, Machine Learning, Translate ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.ml.translate.GetAwsTranslateJobStatus](/user-guide/data-integration/openflow/processors/getawstranslatejobstatus) --- title: StartGcpVisionAnnotateFilesOperation 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/startgcpvisionannotatefilesoperation.md section: Loading & Unloading Data --- # StartGcpVisionAnnotateFilesOperation 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-gcp-nar ## Description Trigger a Vision operation on file input. It should be followed by the GetGcpVisionAnnotateFilesOperationStatus processor in order to monitor operation status. ## Tags Cloud, Google, Machine Learning, Vision ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.gcp.vision.GetGcpVisionAnnotateFilesOperationStatus](/user-guide/data-integration/openflow/processors/getgcpvisionannotatefilesoperationstatus) --- title: StartGcpVisionAnnotateImagesOperation 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/startgcpvisionannotateimagesoperation.md section: Loading & Unloading Data --- # StartGcpVisionAnnotateImagesOperation 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-gcp-nar ## Description Trigger a Vision operation on image input. It should be followed by the GetGcpVisionAnnotateImagesOperationStatus processor in order to monitor operation status. ## Tags Cloud, Google, Machine Learning, Vision ## Input Requirement ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.gcp.vision.GetGcpVisionAnnotateImagesOperationStatus](/user-guide/data-integration/openflow/processors/getgcpvisionannotateimagesoperationstatus) --- title: StateManagedCdcSchemaRegistry source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/statemanagedcdcschemaregistry.md section: Loading & Unloading Data --- # StateManagedCdcSchemaRegistry This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Uses the in-built NiFi State Management to store the hashes of table schemas. This allows for a relatively high performance, low latency, low memory utilization mechanism for storing and comparing table schemas with no external dependencies. ## Tags CDC, Database, Schema, Snowflake ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SubmitQueryJob 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/submitqueryjob.md section: Loading & Unloading Data --- # SubmitQueryJob 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-salesforce-processors-nar ## Description Submits a Query Job to Salesforce using the Bulk API 2.0. In SIMPLE mode, per-object state (previousLast/currentLast and status) is stored in the configured controller service. In ADVANCED mode, a single 'last' timestamp is stored at processor scope to support incremental queries across objects. ## Tags bulk, job, preview, query, salesforce ## Input Requirement ALLOWED ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Relationships
## Writes attributes
## Use cases | Submits a Query Job to Salesforce using the Bulk API 2.0. | | --------------------------------------------------------- | ## See also - [com.snowflake.openflow.runtime.processors.salesforce.AbortQueryJob](/user-guide/data-integration/openflow/processors/abortqueryjob) - [com.snowflake.openflow.runtime.processors.salesforce.DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob) - [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult) - [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobStatus](/user-guide/data-integration/openflow/processors/getqueryjobstatus) --- title: SummarizeText 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/summarizetext.md section: Loading & Unloading Data --- # SummarizeText 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-llm-processors-nar ## Description This processor uses a Large Language Model (LLM) to summarize the content of a FlowFile. It sends the content to an LLM service and writes the summary back to the FlowFile or as an attribute. ## Tags ai, llm, openflow, summarization, text processing ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: Syslog5424Reader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/syslog5424reader.md section: Loading & Unloading Data --- # Syslog5424Reader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a mechanism for reading RFC 5424 compliant Syslog data, such as log files, and structuring the data so that it can be processed. ## Tags logfiles, logs, parse, reader, record, syslog, syslog 5424, text ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
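To illustrate what "structuring" RFC 5424 data can look like, the following is a minimal Python sketch that splits one syslog line into named fields. It is not the reader's implementation: the function name `parse_rfc5424`, the field names, and the regular expression are assumptions made for illustration, and the structured-data handling is deliberately simplified.

```python
import re
from typing import Optional

# Header layout from RFC 5424:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
_RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>[1-9]\d{0,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<appname>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<structured_data>-|(?:\[.*?(?<!\\)\])+)"
    r"(?: (?P<message>.*))?$",
    re.DOTALL,
)

# Header fields that may carry the RFC 5424 NILVALUE ("-").
_NILABLE = ("timestamp", "hostname", "appname", "procid", "msgid", "structured_data")


def parse_rfc5424(line: str) -> Optional[dict]:
    """Return a dict of fields, or None if the line does not match (hypothetical helper)."""
    match = _RFC5424.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # PRI encodes facility * 8 + severity.
    facility, severity = divmod(int(record.pop("pri")), 8)
    record["facility"] = facility
    record["severity"] = severity
    record["version"] = int(record["version"])
    for field in _NILABLE:
        if record[field] == "-":
            record[field] = None
    return record
```

Calling `parse_rfc5424` on a line such as `<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed` yields a record whose facility is 4 and severity is 2, with the NILVALUE fields mapped to `None`.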
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: SyslogReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/syslogreader.md section: Loading & Unloading Data --- # SyslogReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Attempts to parse the contents of a Syslog message in accordance with RFC5424 and RFC3164. In the case of RFC5424 formatted messages, structured data is not supported, and will be returned as part of the message. Note: Be mindful that RFC3164 is informational and a wide range of different implementations are present in the wild. ## Tags logfiles, logs, parse, reader, record, syslog, text ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
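Because a single reader has to cope with both formats, it can help to see one way the two can be told apart. The Python sketch below is a heuristic written for illustration only, not the reader's actual logic: it assumes that an RFC 5424 message carries a numeric version right after the priority, while an RFC 3164 message continues with a timestamp. The name `classify_syslog` is hypothetical.

```python
import re

_PRI = re.compile(r"^<(\d{1,3})>")
# RFC 5424 puts a version number (no leading zero) and a space after <PRI>.
_VERSION = re.compile(r"[1-9]\d{0,2} ")


def classify_syslog(line: str) -> dict:
    """Guess the syslog format of a line and decode its priority (hypothetical helper)."""
    match = _PRI.match(line)
    if match is None or int(match.group(1)) > 191:
        # No priority prefix, or a value above the maximum of 23 * 8 + 7.
        return {"format": "unknown"}
    # PRI encodes facility * 8 + severity in both RFCs.
    facility, severity = divmod(int(match.group(1)), 8)
    rest = line[match.end():]
    fmt = "rfc5424" if _VERSION.match(rest) else "rfc3164"
    return {"format": fmt, "facility": facility, "severity": severity}
```

Given how widely RFC 3164 implementations vary, a check like this can misclassify unusual senders, so treat the result as a hint rather than a guarantee.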
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: TagS3Object 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/tags3object.md section: Loading & Unloading Data --- # TagS3Object 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-aws-nar ## Description Adds or updates a tag on an Amazon S3 Object. ## Tags AWS, Amazon, Archive, S3, Tag ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.aws.s3.CopyS3Object](/user-guide/data-integration/openflow/processors/copys3object) - [org.apache.nifi.processors.aws.s3.DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object) - [org.apache.nifi.processors.aws.s3.FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object) - [org.apache.nifi.processors.aws.s3.GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata) - [org.apache.nifi.processors.aws.s3.GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags) - [org.apache.nifi.processors.aws.s3.ListS3](/user-guide/data-integration/openflow/processors/lists3) - [org.apache.nifi.processors.aws.s3.PutS3Object](/user-guide/data-integration/openflow/processors/puts3object) --- title: TailFile 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/tailfile.md section: Loading & Unloading Data --- # TailFile 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description "Tails" a file, or a list of files, ingesting data from the file as it is written to the file. The file is expected to be textual. Data is ingested only when a new line is encountered (carriage return or new-line character or combination). If the file to tail is periodically "rolled over", as is generally the case with log files, an optional Rolling Filename Pattern can be used to retrieve data from files that have rolled over, even if the rollover occurred while NiFi was not running (provided that the data still exists upon restart of NiFi). It is generally advisable to set the Run Schedule to a few seconds, rather than running with the default value of 0 secs, as this Processor will consume a lot of resources if scheduled very aggressively. At this time, this Processor does not support ingesting files that have been compressed when 'rolled over'. ## Tags file, log, source, tail, text ## Input Requirement FORBIDDEN ## Supports Sensitive Dynamic Properties false ## Properties
## State management
## Restrictions
## Relationships
## Writes attributes
--- title: TransformXml 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/transformxml.md section: Loading & Unloading Data --- # TransformXml 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Applies the provided XSLT file to the FlowFile XML payload. A new FlowFile is created with transformed content and is routed to the 'success' relationship. If the XSL transform fails, the original FlowFile is routed to the 'failure' relationship. ## Tags transform, xml, xslt ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: Troubleshoot Openflow source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/troubleshoot.md section: Loading & Unloading Data ---

# Troubleshoot Openflow

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)

This topic describes the steps to troubleshoot the Openflow components.

## Openflow BYOC troubleshooting

### BYOC custom ingress troubleshooting

For help with BYOC custom ingress, see [Custom ingress troubleshooting](#label-openflow-byoc-custom-ingress-troubleshooting).

### General BYOC troubleshooting

If any part of a deployment, connector, or runtime is causing problems, you can use a built-in tool to generate a diagnostic bundle. This bundle includes the information necessary to keep your Openflow BYOC deployment secure while allowing the [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support) team to troubleshoot the issue.

To share the diagnostic bundle with [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support), attach it to your support case.

1. From the AWS Console UI for EC2, right-click on the **openflow-agent-\{deployment-key\}** instance with your Deployment Key.
2. In the context menu, click the **Connect** button.
3. Switch from **EC2 Instance Connect** to **Connect using EC2 Instance Connect Endpoint**. Leave the default **EC2 Instance Connect Endpoint** in place.
4. Click the **Connect** button.

   A new browser tab or window will appear with a command-line interface.
5. Run `./diagnostics.sh` from this browser-based CLI. Follow a few simple prompts to confirm that you want to create the bundle, and then optionally create a shareable link.

   The diagnostic utility will upload the file to an S3 bucket created for the Deployment using the Deployment Key. For example, `s3://byoc-tf-state-{deployment-key}/diagnostics/openflow_20250131123456.tar.gz`

   With the pre-signed URL, you can safely share temporary access to the diagnostic bundle with the Snowflake team for up to 1 hour. Your S3 bucket and all of its contents remain private.

--- title: Troubleshooting the Openflow Connector for Amazon Kinesis Data Streams source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot.md section: Loading & Unloading Data ---

# Troubleshooting the Openflow Connector for Amazon Kinesis Data Streams

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/kinesis/about)
- [](/user-guide/data-integration/openflow/connectors/kinesis/setup)
- [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance)
- [](/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning)

This topic describes how to troubleshoot common issues with the Openflow Connector for Amazon Kinesis Data Streams.
## Common errors

### Error: UnknownHostException

**Error message**

```text
java.net.UnknownHostException: dynamodb.eu-west-1.amazonaws.com
```

**Cause**

If the runtime is using a Snowflake Deployment, the network rule is most likely misconfigured.

**Solution**

Make sure the required AWS domains are allowlisted in your network rule. For the list of required domains, see [](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list).

### Error: Connect timed out to DynamoDB

**Error message**

```text
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connect to https://dynamodb.us-east-1.amazonaws.com:443 failed: Connect timed out
```

**Cause**

This occurs when the AWS PrivateLink configuration uses `PRIVATE_HOST_PORT` for `dynamodb.us-east-1.amazonaws.com`. DNS resolves to the PrivateLink endpoint IP, but the TCP connection can't be established because Amazon DynamoDB doesn't support Private DNS for its PrivateLink endpoints.

**Solution**

Use `HOST_PORT` for DynamoDB instead of `PRIVATE_HOST_PORT`. Only checkpoint metadata flows through DynamoDB; stream records continue through the Kinesis PrivateLink endpoints. For the full PrivateLink configuration, see [](#label-kinesis-configure-aws-privatelink).

--- title: Troubleshooting the Openflow Connector for Oracle source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/troubleshoot.md section: Loading & Unloading Data ---

# Troubleshooting the Openflow Connector for Oracle

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

The Openflow Connector for Oracle is also subject to additional terms of service beyond the standard connector terms of service. For more information, see the [Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).

- [](/user-guide/data-integration/openflow/connectors/oracle/about)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector)
- [](/user-guide/data-integration/openflow/connectors/oracle/maintenance)

This topic describes how to troubleshoot common issues with the Openflow Connector for Oracle.

## A table was added to replication but doesn't appear in Snowflake

The table's fully qualified name (FQN) might be incorrectly specified in the connector configuration.

**Solution**

- Check the format of the FQN in `Oracle Ingestion Parameters`. It should be `..` (note the database prefix).
- Check the database name in `Oracle Source Parameters` %raa% `Oracle Connection URL`. While FQNs support specifying the name of the database, currently data must reside in the same database instance as the one used for this connection.
- Verify that you have provided the full database name including the domain name in the connector configuration. For example, use `MYDB.EXAMPLE.COM` instead of just `MYDB`.

To find the correct database name, run the following query on your Oracle database:

```sql
SELECT property_value
FROM database_properties
WHERE property_name = 'GLOBAL_DB_NAME';
```

In general, `property_value` is the same as the service name of the database.
However, the returned database name might include an appended domain name (for example, for service name `FOO`, the query might return `FOO.EXAMPLE.COM`). In that case, use the full name with the domain (double-quoted, because it contains dots).

## A table fails because the connector can't find a replication key

The connector reports that a table can't be replicated because no primary key, unique constraint, or unique index qualifies as a replication key. The connector evaluates candidates as described in [](#label-oracle-replication-key-selection).

**Solution**

1. Confirm the table has no primary key:

   ```sql
   SELECT constraint_name, status
   FROM all_constraints
   WHERE owner = 'YOUR_SCHEMA'
     AND table_name = 'YOUR_TABLE'
     AND constraint_type = 'P';
   ```

2. Find unique constraints that the connector skipped, and check why they didn't qualify (`STATUS != ENABLED`, `DEFERRED != IMMEDIATE`, or any column nullable):

   ```sql
   SELECT c.constraint_name, c.status, c.deferred, acc.column_name, atc.nullable
   FROM all_constraints c
   JOIN all_cons_columns acc
     ON acc.owner = c.owner AND acc.constraint_name = c.constraint_name
   JOIN all_tab_cols atc
     ON atc.owner = acc.owner AND atc.table_name = acc.table_name AND atc.column_name = acc.column_name
   WHERE c.owner = 'YOUR_SCHEMA'
     AND c.table_name = 'YOUR_TABLE'
     AND c.constraint_type = 'U';
   ```

3. Find unique indexes that the connector skipped, and check why they didn't qualify (`STATUS = UNUSABLE`, `INDEX_TYPE != NORMAL`, the index backs a constraint, or any column is nullable):

   ```sql
   SELECT i.index_name, i.uniqueness, i.status, i.index_type, ic.column_name, atc.nullable
   FROM all_indexes i
   JOIN all_ind_columns ic
     ON i.owner = ic.index_owner AND i.index_name = ic.index_name
   JOIN all_tab_cols atc
     ON atc.owner = ic.table_owner AND atc.table_name = ic.table_name AND atc.column_name = ic.column_name
   WHERE ic.table_owner = 'YOUR_SCHEMA'
     AND ic.table_name = 'YOUR_TABLE'
     AND i.uniqueness = 'UNIQUE';
   ```

Resolve the issue by one of the following:

- Add a primary key, or modify an existing constraint or index so it qualifies (enable it, change `DEFERRABLE INITIALLY DEFERRED` to `DEFERRABLE INITIALLY IMMEDIATE`, rebuild an unusable index, or add `NOT NULL` to the relevant columns).
- Specify a logical key for the table. For more information, see [](#label-oracle-logical-key).

After you make the change, restart replication for the affected table: see [](#label-of-oracle-restart-table-replication).

## The CDC processor doesn't start after editing Table Key Configuration JSON

The **Read Oracle CDC Stream** processor stays invalid or fails to start after you configure or update the **Table Key Configuration JSON** value on a `MultiDatabaseJsonTableKeyConfigService` controller service. The controller service itself fails to enable because the JSON value is malformed, which keeps the CDC processor invalid because it depends on the service.

**Solution**

1. Open the controller service properties and review the validation message on the **Table Key Configuration JSON** field.
2. Correct the JSON. For the expected format, see [](#label-oracle-logical-key).
3. Enable the controller service. Once it's enabled, start the CDC processor.
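
Because a malformed value prevents the controller service from enabling, it can help to confirm that the text parses as JSON before pasting it into the **Table Key Configuration JSON** field. The following Python sketch performs only that syntax check. It does not verify the structure the connector expects, and the function name `check_json` and the sample values are illustrative assumptions rather than the documented format.

```python
import json


def check_json(text):
    """Return (True, "ok") if text parses as JSON, else (False, reason).

    This is only a syntax check. It knows nothing about the structure
    the connector expects for Table Key Configuration JSON.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, "ok"


# Hypothetical sample values, not the connector's documented format.
good = '[{"logicalKey": ["ID"]}]'
bad = '[{"logicalKey": ["ID"],}]'  # trailing comma is invalid JSON

print(check_json(good))
print(check_json(bad))
```

If the check fails, the reported line and column point at the first position the parser could not accept, which is usually close to the actual mistake.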
## A logical-key column was dropped or renamed on the source

A table that uses a user-declared logical key is marked `FAILED` after a column listed in `logicalKey` is dropped or renamed on the source. The connector can't continue replicating the table because the configured key columns no longer match the live schema.

**Solution**

1. Update the **Table Key Configuration JSON** value on the `MultiDatabaseJsonTableKeyConfigService` controller service so that `logicalKey` uses the current column names. Disable and re-enable the service for the change to take effect.
2. Restart replication for the affected table: see [](#label-of-oracle-restart-table-replication).

## A logical-key configuration references a column that doesn't exist

A table stays in the `NEW` state and the connector log shows a message such as "`Logical key column '' does not exist in table schema`". The **Table Key Configuration JSON** value lists a column name that the connector can't find on the source table.

**Solution**

1. Confirm the column exists and check its name in `ALL_TAB_COLS`:

   ```sql
   SELECT column_name
   FROM all_tab_cols
   WHERE owner = 'YOUR_SCHEMA'
     AND table_name = 'YOUR_TABLE'
     AND user_generated = 'YES';
   ```

2. Correct the column name in the **Table Key Configuration JSON** value. Disable and re-enable the controller service for the change to take effect.

You don't need to remove and re-add the table: the connector retries schema initialization on the next poll, and replication resumes from `NEW`.

## A logical-key column contains duplicate values in the source

A column declared in the **Table Key Configuration JSON** as part of the logical key doesn't actually contain unique values in the source table. The connector doesn't validate data-level uniqueness of logical-key values, so this condition doesn't produce an error.

**Impact**

The connector's MERGE operation deduplicates rows by logical-key value using a last-write-wins strategy. When multiple source rows share the same logical-key value:

- During the snapshot, only one row per key value reaches the destination. The other rows are silently dropped.
- During incremental replication, change events for different source rows that share a key value overwrite each other in the destination.

This results in silent data loss with no error in the connector log.

**Solution**

1. Verify whether the logical-key columns contain duplicates:

   ```sql
   SELECT COUNT(*) AS total_rows, COUNT(DISTINCT ) AS distinct_keys FROM .
   ```
| Parameter | Description | Required |
| --- | --- | --- |
| Destination Database | The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. | Yes |
| Destination Schema | The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples:<br>- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`<br>- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively | Yes |
| Snowflake Authentication Strategy | When using:<br>- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured [runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.<br>- **BYOC**: Alternatively, BYOC can use KEY_PAIR as the value for authentication strategy. | Yes |
| Snowflake Account Identifier | When using:<br>- **Session Token Authentication Strategy**: Must be blank.<br>- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. | Yes |
| Snowflake Private Key | When using:<br>- **Session Token Authentication Strategy**: Must be blank.<br>- **KEY_PAIR**: Must be the RSA private key used for authentication.<br>The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined. | No |
| Snowflake Private Key File | When using:<br>- **Session Token Authentication Strategy**: The private key file must be blank.<br>- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the **Reference asset** checkbox. | No |
| Snowflake Private Key Password | When using:<br>- **Session Token Authentication Strategy**: Must be blank.<br>- **KEY_PAIR**: Provide the password associated with the Snowflake private key file. | No |
| Snowflake Role | When using:<br>- **Session Token Authentication Strategy**: Use your Snowflake role. You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.<br>- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user. | Yes |
| Snowflake Username | When using:<br>- **Session Token Authentication Strategy**: Must be blank.<br>- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance. | Yes |
| Oversized Value Strategy | Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are:<br>- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.<br>- **Set Null**: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value. | No |
| Snowflake Warehouse | Snowflake warehouse used to run queries. | Yes |
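
The case rule for **Destination Database** and **Destination Schema** can be sketched in Python as follows. This is a minimal illustration of the rule stated in the table (unquoted identifiers resolve to uppercase, double-quoted identifiers keep their exact case); the function name `destination_name` is a hypothetical helper, not part of any Snowflake API.

```python
def destination_name(identifier):
    """Map an identifier as written in a CREATE statement to the value
    to enter in the connector parameter.

    Unquoted identifiers are stored in uppercase; double-quoted
    identifiers keep their exact case.
    """
    quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if quoted:
        # Strip the surrounding quotes; "" inside is an escaped quote.
        return identifier[1:-1].replace('""', '"')
    return identifier.upper()


for written in ['SCHEMA_NAME', 'schema_name', '"schema_name"', '"SCHEMA_NAME"']:
    print(written, '->', destination_name(written))
```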
| Parameter | Description |
| --- | --- |
| Authorization Type | Choose between *OAUTH* or *BASIC_AUTH*. If *OAUTH* is chosen, then *OAuth Client ID, OAuth Client Secret, OAuth Refresh Token* and *OAuth Token Endpoint* must be defined. If *BASIC_AUTH* is chosen, then *Workday Username* and *Workday Password* must be defined. |
| OAuth Client ID | The client ID of an application registered in Workday. |
| OAuth Client Secret | The client secret related to the Client ID. |
| OAuth Refresh Token | The refresh token is obtained by a user during the app registration process. It is used together with the client ID and the client secret to get an access token. |
| OAuth Token Endpoint | The token endpoint is obtained by a user during the app registration process. |
| Workday Username | The username is used to log into a Workday account. Must be set only when *BASIC_AUTH* is chosen. |
| Workday Password | The password is associated with the Workday username. Must be set only when *BASIC_AUTH* is chosen. |
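
The dependency between **Authorization Type** and the other parameters can be checked before you start the connector. The following Python sketch assumes the parameter values are held in a plain dictionary keyed by the parameter names from the table; the `missing_parameters` helper is hypothetical and only mirrors the rules stated above.

```python
# Parameters that must be defined for each Authorization Type,
# taken from the table above.
REQUIRED = {
    "OAUTH": [
        "OAuth Client ID",
        "OAuth Client Secret",
        "OAuth Refresh Token",
        "OAuth Token Endpoint",
    ],
    "BASIC_AUTH": ["Workday Username", "Workday Password"],
}


def missing_parameters(params):
    """Return the names of required parameters that are unset or empty."""
    auth_type = params.get("Authorization Type")
    if auth_type not in REQUIRED:
        raise ValueError(f"Authorization Type must be one of {sorted(REQUIRED)}")
    return [name for name in REQUIRED[auth_type] if not params.get(name)]


# Hypothetical example values.
config = {"Authorization Type": "BASIC_AUTH", "Workday Username": "report_user"}
print(missing_parameters(config))
```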
| Parameter | Description |
| --- | --- |
| Destination Table | The destination table where report data pulled from Workday is stored. It is created by the connector if it does not exist. |
| Report URL | A RaaS API URL to a report created in Workday. |
| Run Schedule | Run schedule on which data is retrieved from Workday and saved in Snowflake. This value is a time duration specified by a number followed by a time unit. For example, **1 second** or **5 mins**. |
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Communications Timeout * | Communications Timeout | 30 secs | | Specifies how long to wait when communicating with the remote server before determining that there is a communications failure if data cannot be sent or received |
| SSL Context Service | SSL Context Service | | | If specified, indicates the SSL Context Service that is used to communicate with the remote server. If not specified, communications will not be encrypted |
| Server Hostname * | Server Hostname | | | The name of the server that is running the DistributedSetCacheServer service |
| Server Port * | Server Port | 4557 | | The port on the remote server that is to be used when communicating with the DistributedSetCacheServer service |
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Eviction Strategy * | Eviction Strategy | Least Frequently Used | - Least Frequently Used<br>- Least Recently Used<br>- First In, First Out | Determines which strategy should be used to evict values from the cache to make room for new entries |
| Maximum Cache Entries * | Maximum Cache Entries | 10000 | | The maximum number of cache entries that the cache can hold |
| Persistence Directory | Persistence Directory | | | If specified, the cache will be persisted in the given directory; if not specified, the cache will be in-memory only |
| Port * | Port | 4557 | | The port to listen on for incoming connections |
| SSL Context Service | SSL Context Service | | | If specified, this service will be used to create an SSL Context that will be used to secure communications; if not specified, communications will not be secure |
| Maximum Read Size | maximum-read-size | 1 MB | | The maximum number of network bytes to read for a single cache item |
| Parameter | Description |
| --- | --- |
| BigQuery Project Name | The unique identifier of the Google Cloud Project that contains BigQuery datasets and tables. Where to find: open BigQuery Studio (Google Cloud Console > BigQuery) and in the left Explorer pane hover over your project to see the Project ID. **Example:** `example-team-gcp` |
| GCP Service Account JSON | The entire content of the JSON key file for the Google Cloud Platform Service Account used for authentication. Ensure the service account has the necessary IAM permissions to perform BigQuery operations, such as the BigQuery Job User and BigQuery Data Viewer roles. Where to get it: Google Cloud Console > IAM & Admin > Service Accounts > select the service account > Keys tab > Add key > Create new key > JSON. This downloads a .json file; open it and paste the entire file content (including braces) into this field. |
| Parameter | Description |
| --- | --- |
| Snowflake Authentication Strategy | When using SPCS, use SNOWFLAKE_SESSION_TOKEN as the value for Authentication Strategy. When using BYOC, use KEY_PAIR as the value for Authentication Strategy. **Example:** `KEY_PAIR` |
| Snowflake Account Identifier | When using:<br>- Session Token Authentication Strategy: Must be blank.<br>- KEY_PAIR: Snowflake account name where data will be persisted. |
| Destination Database | The name of the destination database to replicate into. Mixed case is supported. |
| Snowflake Private Key File | When using:<br>- Session token authentication strategy: The private key file must be blank.<br>- KEY_PAIR: Upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the Reference asset checkbox. |
| Snowflake Private Key Password | When using:<br>- Session Token Authentication Strategy: Must be blank.<br>- KEY_PAIR: Provide the password associated with the Snowflake Private Key File. |
| Snowflake Role | When using:<br>- Session Token Authentication Strategy: Use your Snowflake Role. You can find your Snowflake Role in the Openflow UI, by navigating to View Details for your Runtime.<br>- KEY_PAIR Authentication Strategy: Use a valid role configured for your service user. |
| Snowflake Username | When using:<br>- Session Token Authentication Strategy: Must be blank.<br>- KEY_PAIR: Provide the user name used to connect to the Snowflake instance. |
| Snowflake Warehouse | The name of the warehouse to use by the connector. |
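
The blank-versus-required rules in this table can be expressed as a small pre-flight check. The Python sketch below assumes the parameter values are available in a dictionary keyed by parameter name; `check_auth` is a hypothetical helper, and treating **Snowflake Private Key Password** as optional for `KEY_PAIR` is an assumption (the key might not be encrypted).

```python
# Parameters the table says must be blank with the session token strategy.
BLANK_FOR_SESSION_TOKEN = [
    "Snowflake Account Identifier",
    "Snowflake Private Key File",
    "Snowflake Private Key Password",
    "Snowflake Username",
]

# Parameters the table says to provide with KEY_PAIR.
# Assumption: the key password is optional and therefore not listed.
REQUIRED_FOR_KEY_PAIR = [
    "Snowflake Account Identifier",
    "Snowflake Private Key File",
    "Snowflake Username",
]


def check_auth(params):
    """Return a list of problems; an empty list means no rule is violated."""
    strategy = params.get("Snowflake Authentication Strategy")
    if strategy == "SNOWFLAKE_SESSION_TOKEN":
        return [f"{name} must be blank"
                for name in BLANK_FOR_SESSION_TOKEN if params.get(name)]
    if strategy == "KEY_PAIR":
        return [f"{name} is required"
                for name in REQUIRED_FOR_KEY_PAIR if not params.get(name)]
    return [f"unexpected strategy: {strategy!r}"]


# Hypothetical example: a username left over from a KEY_PAIR setup.
print(check_auth({
    "Snowflake Authentication Strategy": "SNOWFLAKE_SESSION_TOKEN",
    "Snowflake Username": "svc_user",
}))
```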
| Parameter | Description |
| --- | --- |
| BigQuery Regions | Specifies a comma-separated list of the locations to query for BigQuery datasets. You can combine both regional and multi-regional locations in the same list. **Example:** `us,eu,us-west1` |
| Included Dataset Names | Comma-separated list of datasets to replicate (queried across all selected regions). **Example:** `sales_data,marketing_leads` |
| Included Dataset Names Regex | Regular expression for specifying dataset names to replicate (queried across all selected regions). Combined with the Included Dataset Names to include any matching dataset. Note: The regular expression must follow Google's RE2 syntax. **Example:** `^sales_.*` |
| Included Table Names | Comma-separated list of tables to replicate across datasets. **Example:** `transactions,customers` |
| Included Table Names Regex | Regular expression for specifying table names to replicate across datasets. Combined with the Included Table Names to include any matching table. Note: The regular expression must follow Google's RE2 syntax. **Example:** `^revenue_.*` |
| Included View Names | Comma-separated list of views to replicate across datasets. **Example:** `customer_summary,revenue_report` |
| Included View Names Regex | Regular expression for specifying view names to replicate across datasets. Combined with the Included View Names to include any matching view. Note: The regular expression must follow Google's RE2 syntax. **Example:** `^report_.*` |
| Incremental Sync Frequency | How often the connector runs incremental synchronization for each table. Runs do not overlap: if a cycle takes longer than the configured interval, the next run waits for the prior one to finish. Because BigQuery limits the maximum size of the window to 24h, the schedule must be more frequent than this value. **Example:** `10m` |
| View Sync Frequency | How often the connector runs synchronization for each view. Runs do not overlap: if a cycle takes longer than the configured interval, the next run waits for the prior one to finish. View ingestion does not support CDC, only truncate and load. **Example:** `1h` |
| Temporary Table Dataset | Dataset in which necessary temporary tables are created, such as CDC journal tables or temporary tables for view ingestion. Snowflake recommends having a separate dataset for temporary tables and not using the ingested dataset for this purpose. **Example:** `openflow_temp` |
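
The table says that each name list is combined with its regex so that any matching object is included. One way to read that rule is sketched below in Python. This is an illustration under assumptions, not the connector's implementation: `is_included` is a hypothetical helper, Python's `re` module is not RE2 (simple patterns such as the examples behave the same), and the sketch uses a search rather than a full match.

```python
import re


def is_included(name, names_csv="", pattern=""):
    """Return True if name is in the comma-separated list OR matches the regex.

    Assumption: an empty list and an empty pattern select nothing.
    """
    names = {item.strip() for item in names_csv.split(",") if item.strip()}
    if name in names:
        return True
    return bool(pattern) and re.search(pattern, name) is not None


# Example values taken from the table above.
for dataset in ["sales_data", "sales_2024", "marketing_leads", "hr_data"]:
    print(dataset, is_included(dataset, "sales_data,marketing_leads", "^sales_.*"))
```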
| Parameter | Description |
| --- | --- |
| Shop Domain | The `myshopify.com` domain for your store. **Example:** `mystore.myshopify.com` |
| Access Token | The Admin API access token generated from the custom app in your Shopify store. Stored securely as a sensitive parameter. |
| Shopify API Version | The Shopify Admin API version to use for requests. **Default:** `2026-04` |
| Objects to Sync | Comma-separated or newline-separated list of Shopify object types to replicate. Case-insensitive. Each value must correspond to a query endpoint in the Shopify Admin GraphQL API (for example, `orders` corresponds to the `orders` query, `products` to the `products` query). Types not found in the built-in catalog are skipped unless a custom definition is provided through the **Object Definitions Override** parameter or **Enable Introspection** is `true`. **Default:** `orders,products,customers,productVariants,inventoryItems,collections` |
| Objects to Track for Deletes | Comma-separated or newline-separated list of Shopify object types to monitor for deletions through the Events API. Each type is polled independently. Types not found in the registry are skipped. Leave empty to disable delete tracking entirely. **Example:** `products, customers, collections` |
| Sync Schedule | How frequently the connector polls Shopify for new or updated data. Uses NiFi scheduling syntax. **Example:** `15 min` |
| Deletes Schedule | How frequently the connector polls the Shopify Events API for deletion events. Uses NiFi scheduling syntax. Set this to a longer interval than the sync schedule to reduce API cost. **Example:** `30 min` |
| Object Definitions Override | Optional JSON array to add new object definitions or override existing ones in the built-in catalog. Each element fully replaces the catalog entry for that `apiType`. Use this parameter to customize which fields are extracted, define promoted columns, or register custom object types. For more information, see [Object definition overrides](/user-guide/data-integration/openflow/connectors/shopify/object-definitions#label-shopify-object-override). |
| Enable Introspection | When `true`, unknown object types are auto-discovered by querying the Shopify Admin GraphQL introspection endpoint. Discovered definitions are cached for 24 hours. **Default:** `false` |
| Ignore Deprecated Fields | When `true`, deprecated GraphQL fields are excluded from introspection-generated queries. Only applicable when **Enable Introspection** is `true`. **Default:** `true` |
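
To confirm that **Deletes Schedule** is longer than **Sync Schedule**, the two durations can be compared after converting them to seconds. The Python sketch below handles only simple timer durations such as `15 min`; the unit table is an assumed subset of what NiFi accepts, cron-style schedules are out of scope, and both helper names are hypothetical.

```python
import re

# Assumed subset of duration units; NiFi accepts more than these.
UNIT_SECONDS = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}


def to_seconds(duration):
    """Convert a duration such as '15 min' to a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", duration)
    if not match or match.group(2).lower() not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2).lower()]


def deletes_schedule_is_longer(sync_schedule, deletes_schedule):
    """Return True if the deletes interval is longer than the sync interval."""
    return to_seconds(deletes_schedule) > to_seconds(sync_schedule)


# Example values from the table above.
print(deletes_schedule_is_longer("15 min", "30 min"))
```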
| Parameter | Description |
| --- | --- |
| Snowflake Authentication Strategy | Authentication strategy for the connector to connect to Snowflake.<br>- `SNOWFLAKE_MANAGED` (default): Uses the Snowflake-managed token associated with the specified Snowflake runtime role. Snowflake recommends this option for both %ofsfspcs-plural% and %ofbyoc-plural%.<br>- `KEY_PAIR`: Uses a user-provided RSA key pair. Available only on %ofbyoc-plural%, for cross-account scenarios. |
| Snowflake Account Identifier | Snowflake account identifier, formatted as `-`. Required when the authentication strategy is `KEY_PAIR`. **Example:** `MYORG-MYACCOUNT` |
| Snowflake Username | The Snowflake user for authentication. Required when the authentication strategy is `KEY_PAIR`. |
| Snowflake Private Key | PEM-encoded private key content (PKCS8 format) for Snowflake key pair authentication. Required when the authentication strategy is `KEY_PAIR`. Stored securely as a sensitive parameter. Either this parameter or **Snowflake Private Key File** must be defined. |
| Snowflake Private Key File | Alternative to **Snowflake Private Key**. Upload the private key file by selecting the **Reference asset** checkbox, uploading the file as an asset, and selecting the asset as the value for the parameter. Either this parameter or **Snowflake Private Key** must be defined. |
| Snowflake Private Key Password | Password to decrypt the Snowflake private key, if the key is encrypted. Only applicable when the authentication strategy is `KEY_PAIR`. |
| Snowflake Role | The Snowflake role used for table creation, data ingestion, and access verification. |
| Destination Database | Name of the destination database in Snowflake. The database must already exist before starting the connector. |
| Destination Schema | Name of the destination schema in Snowflake. The schema must already exist before starting the connector. |
| Snowflake Warehouse | The Snowflake warehouse used for table management operations such as `CREATE TABLE` and `MERGE`. |
Parameter Description
Veeva Vault Base URL Base URL for the Veeva Vault environment. Must be a valid URL including the protocol. **Example:** `https://myvault.veevavault.com`
Veeva Vault Username Service account username for Veeva Vault authentication.
Veeva Vault Password Service account password for Veeva Vault authentication. Stored securely as a sensitive parameter.
Veeva Vault Ingestion Mode Determines how Direct Data files are consumed. Allowed values: - `SNAPSHOT_AND_INCREMENTAL` (default): Load the latest full archive first, then continue with incremental archives. - `SNAPSHOT`: Poll for the latest full archive only. - `INCREMENTAL`: Poll for incremental archives only.
Veeva Vault Incremental Start Time Optional starting timestamp for incremental polling. Only applicable when the ingestion mode is `INCREMENTAL`. If not set, incremental polling starts from the current time. Expected format: `yyyy-MM-dd'T'HH:mmZ`. **Example:** `2025-01-15T08:30Z`
Veeva Vault Include Audit Logs Whether to also ingest Direct Data audit log files. **Default:** `true`
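As a rough illustration of the **Veeva Vault Incremental Start Time** format, the following Python sketch renders a timestamp so that it matches the documented example. It assumes the trailing `Z` denotes UTC; the helper name `format_incremental_start` is hypothetical and not part of the connector.

```python
from datetime import datetime, timezone


def format_incremental_start(moment: datetime) -> str:
    """Render a timezone-aware datetime like 2025-01-15T08:30Z (assumes Z means UTC)."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    # Convert to UTC first so the literal Z suffix is accurate.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


print(format_incremental_start(datetime(2025, 1, 15, 8, 30, tzinfo=timezone.utc)))
# prints 2025-01-15T08:30Z
```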
Parameter Description
Snowflake Authentication Strategy Authentication strategy for the connector to connect to Snowflake. - `SNOWFLAKE_MANAGED` (default): Uses the Snowflake-managed token associated with the specified Snowflake runtime role. This is the recommended strategy for both %ofsfspcs-plural% and %ofbyoc-plural%. - `KEY_PAIR`: Uses a user-provided RSA key pair. Available only on %ofbyoc-plural%, for cross-account scenarios where the connector writes to a Snowflake account different from the one hosting the Openflow runtime.
Snowflake Account Snowflake account identifier, formatted as `<organization-name>-<account-name>`. Required when the authentication strategy is `KEY_PAIR`. **Example:** `MYORG-MYACCOUNT`
Snowflake Username The Snowflake user for authentication. Required when the authentication strategy is `KEY_PAIR`.
Snowflake Private Key PEM-encoded private key content for Snowflake key pair authentication. Required when the authentication strategy is `KEY_PAIR`. This value is stored securely as a sensitive parameter. You can also upload the private key file by selecting the **Reference asset** checkbox, uploading the file as an asset, and selecting the asset as the value for the parameter.
Snowflake Private Key Password Password to decrypt the Snowflake private key, if the key is encrypted. Only applicable when the authentication strategy is `KEY_PAIR`.
Snowflake Role The Snowflake role used for table creation, data ingestion, and access verification. When using `SNOWFLAKE_MANAGED`, this is the Snowflake role for Openflow runtimes. When using `KEY_PAIR`, this is the role assigned to the specified Snowflake user.
Snowflake Database Name of the destination database in Snowflake. The database must already exist before starting the connector.
Snowflake Schema Name of the destination schema in Snowflake. The schema must already exist before starting the connector.
Snowflake Warehouse The Snowflake warehouse used for table management operations such as `CREATE TABLE` and `MERGE`.
Snowflake Table Prefix Optional prefix applied to all destination table names in Snowflake. Use this to namespace tables when multiple connectors write to the same schema.
Snowflake Delete Strategy How to apply Veeva delete extracts in Snowflake. - `Hard Delete` (default): Permanently remove rows from the table. - `Soft Delete`: Set `__SNOWFLAKE_DELETED` to `TRUE` and `__SNOWFLAKE_DELETED_AT` to the current timestamp. The columns are added automatically if they don't exist.
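The difference between the two **Snowflake Delete Strategy** values can be sketched in plain Python. This is an in-memory model only, not the connector's implementation: the `id` key, the `apply_delete` helper, and the `FALSE`/empty defaults for rows that were not deleted are assumptions made for illustration.

```python
from datetime import datetime, timezone


def apply_delete(rows, deleted_ids, strategy="Hard Delete", now=None):
    """Model how a delete extract could affect rows under each strategy."""
    now = now or datetime.now(timezone.utc)
    if strategy == "Hard Delete":
        # Rows listed in the delete extract are removed permanently.
        return [row for row in rows if row["id"] not in deleted_ids]
    if strategy == "Soft Delete":
        result = []
        for row in rows:
            row = dict(row)
            # The marker columns are added if they do not exist yet
            # (default values for untouched rows are an assumption).
            row.setdefault("__SNOWFLAKE_DELETED", False)
            row.setdefault("__SNOWFLAKE_DELETED_AT", None)
            if row["id"] in deleted_ids:
                row["__SNOWFLAKE_DELETED"] = True
                row["__SNOWFLAKE_DELETED_AT"] = now
            result.append(row)
        return result
    raise ValueError(f"unknown strategy: {strategy}")
```

With `Soft Delete`, downstream queries would need to filter on `__SNOWFLAKE_DELETED` to exclude removed rows.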
Parameter Description
Column Removal Strategy Defines the strategy when a column should be removed from the destination table based on the latest received schema. - `Drop Column` (default): Drop the column from the Snowflake table. - `Rename Column`: Rename the column in the Snowflake table by appending the suffix defined in the **Removed Column Name Suffix** parameter. - `Ignore Column`: Leave the column as-is in the Snowflake table.
Removed Column Name Suffix Suffix appended to the column name when the **Column Removal Strategy** is set to `Rename Column`. **Default:** `__deleted`
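The three **Column Removal Strategy** values can be compared with a small Python sketch that operates on a list of column names. The `apply_column_removal` helper is hypothetical; it only models the resulting column list and does not issue any SQL.

```python
def apply_column_removal(columns, removed, strategy="Drop Column", suffix="__deleted"):
    """Return the destination column names after one source column disappears."""
    if strategy == "Drop Column":
        return [name for name in columns if name != removed]
    if strategy == "Rename Column":
        # The suffix corresponds to the Removed Column Name Suffix parameter.
        return [name + suffix if name == removed else name for name in columns]
    if strategy == "Ignore Column":
        return list(columns)
    raise ValueError(f"unknown strategy: {strategy}")


print(apply_column_removal(["ID", "NAME", "LEGACY"], "LEGACY", "Rename Column"))
# prints ['ID', 'NAME', 'LEGACY__deleted']
```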
| Order | Task | Description | Persona |
| --- | --- | --- | --- |
| 1 | Review [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake) | Setup for %sapsnowflake%. | SAP® administrator |
| 1 | Review [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc) | Setup for %sapbdc%. | SAP® administrator |
| 2 | Review [](/user-guide/data-integration/zero-copy/sap-sql/setup) | Set up the Zerocopy Connector on the Snowflake side. | Snowflake account administrator |
| 3 | Review [](/user-guide/data-integration/zero-copy/sap-sql/security) | Review security requirements and privileges for Snowflake management. | Snowflake account administrator |
| 4 | [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products) | Explore data products shared from %sapbdc% to Snowflake. | Snowflake account administrator and data engineer |
| 5 | [](/user-guide/data-integration/zero-copy/sap-sql/publish-data) | Publish Snowflake data back to %sapbdc%. | Snowflake account administrator and data engineer |
Property Description
file-encoding File Encoding for signing
hash-algorithm Hash Algorithm for signing
private-key-id PGP Private Key Identifier formatted as uppercase hexadecimal string of 16 characters used for signing
private-key-service PGP Private Key Service for generating content signatures
signing-strategy Strategy for writing files to success after signing
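The `private-key-id` property expects an uppercase hexadecimal string of 16 characters. A minimal sketch of producing that form from a numeric key identifier follows; the `format_key_id` helper is hypothetical and assumes the identifier is an unsigned 64-bit value.

```python
def format_key_id(key_id: int) -> str:
    """Format a 64-bit key identifier as 16 uppercase hexadecimal characters."""
    if not 0 <= key_id < 2**64:
        raise ValueError("key_id must fit in an unsigned 64-bit integer")
    # 016X pads with leading zeros to 16 digits and uses uppercase A-F.
    return f"{key_id:016X}"


print(format_key_id(0x1A2B3C4D5E6F7081))  # prints 1A2B3C4D5E6F7081
print(format_key_id(255))                 # prints 00000000000000FF
```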
Name Description
failure Content signing failed
success Content signing succeeded
Name Description
pgp.compression.algorithm Compression Algorithm
pgp.compression.algorithm.id Compression Algorithm Identifier
pgp.file.encoding File Encoding
pgp.signature.algorithm Signature Algorithm including key and hash algorithm names
pgp.signature.hash.algorithm.id Signature Hash Algorithm Identifier
pgp.signature.key.algorithm.id Signature Key Algorithm Identifier
pgp.signature.key.id Signature Public Key Identifier
pgp.signature.type.id Signature Type Identifier
pgp.signature.version Signature Version Number
Display Name API Name Default Value Allowable Values Description
CSV Format * CSV Format default - Custom Format - RFC 4180 - Microsoft Excel - Tab-Delimited - MySQL Format - Informix Unload - Informix Unload Escape Disabled - Default Format - RFC4180 Specifies which "format" the CSV data is in, or specifies if custom formatting should be used.
Character Set * Character Set UTF-8 The Character Encoding that is used to decode the CSV file.
Comment Marker Comment Marker The character that is used to denote the start of a comment. Any line that begins with this character will be ignored.
Escape Character * Escape Character The character that is used to escape characters that would otherwise have a specific meaning to the CSV Parser. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Escape Character at runtime, then it will be skipped and the default Escape Character will be used. Setting it to an empty string means no escape character should be used.
Quote Character * Quote Character " The character that is used to quote values so that escape characters do not have to be used. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Quote Character at runtime, then it will be skipped and the default Quote Character will be used.
Quote Mode * Quote Mode MINIMAL - Quote All Values - Quote Minimal - Quote Non-Numeric Values - Do Not Quote Values Specifies how fields should be quoted when they are written
Trim Fields * Trim Fields true - true - false Whether or not white space should be removed from the beginning and end of fields
Value Separator * Value Separator , The character that is used to separate values/fields in a CSV Record. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Value Separator at runtime, then it will be skipped and the default Value Separator will be used.
CSV File * csv-file Path to a CSV File in which the key value pairs can be looked up.
Ignore Duplicates * ignore-duplicates true - true - false Ignore duplicate keys for records in the CSV file.
Lookup Key Column * lookup-key-column The field in the CSV file that will serve as the lookup key. This is the field that will be matched against the property specified in the lookup processor.
Lookup Value Column * lookup-value-column Lookup value column.
Required Permission Explanation
read filesystem Provides operator the ability to read from any file that NiFi has access to.
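To make the interaction of the lookup key column, lookup value column, and duplicate handling concrete, here is a minimal Python sketch of a CSV-backed key/value lookup. It is not the service's implementation. The `build_lookup` helper is hypothetical, and keeping the first occurrence of a duplicate key is an assumption, because the description above does not say which record wins.

```python
import csv
import io


def build_lookup(csv_text, key_column, value_column, ignore_duplicates=True,
                 separator=",", quote='"', trim=True):
    """Build a key -> value mapping from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=separator, quotechar=quote)
    table = {}
    for row in reader:
        key, value = row[key_column], row[value_column]
        if trim:
            # Mirrors Trim Fields: drop surrounding white space.
            key, value = key.strip(), value.strip()
        if key in table:
            if ignore_duplicates:
                continue  # assumption: the first occurrence is kept
            raise ValueError(f"duplicate lookup key: {key}")
        table[key] = value
    return table


sample = "id,name\n1, Ada \n1,Bob\n2,Cy\n"
print(build_lookup(sample, "id", "name"))
# prints {'1': 'Ada', '2': 'Cy'}
```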
Display Name API Name Default Value Allowable Values Description
Cache Expiration Cache Expiration Time interval to clear all cache entries. If the Cache Size is zero then this property is ignored.
Cache Size * dbrecord-lookup-cache-size 0 Specifies how many lookup values/records should be cached. The cache is shared for all tables and keeps a map of lookup values to records. Setting this property to zero means no caching will be done and the table will be queried for each lookup value in each record. If the lookup table changes often or the most recent data must be retrieved, do not use the cache.
Clear Cache on Enabled * dbrecord-lookup-clear-cache-on-enabled true - true - false Whether to clear the cache when this service is enabled. If the Cache Size is zero then this property is ignored. Clearing the cache when the service is enabled ensures that the service will first go to the database to get the most recent data.
Database Connection Pooling Service * dbrecord-lookup-dbcp-service The Controller Service that is used to obtain a connection to the database
Lookup Key Column * dbrecord-lookup-key-column The column in the table that will serve as the lookup key. This is the column that will be matched against the property specified in the lookup processor. Note that this may be case-sensitive depending on the database.
Table Name * dbrecord-lookup-table-name The name of the database table to be queried. Note that this may be case-sensitive depending on the database.
Lookup Value Column * lookup-value-column The column whose value will be returned when the Lookup value is matched
Display Name API Name Default Value Allowable Values Description
TTL * redis-cache-ttl 0 secs Indicates how long the data should exist in Redis. Setting '0 secs' would mean the data would exist forever
Redis Connection Pool * redis-connection-pool
Display Name API Name Default Value Allowable Values Description
Module Directory Module Directory Comma-separated list of paths to files and/or directories which contain modules required by the script.
Script Body Script Body Body of script to execute. Only one of Script File or Script Body may be used
Script Engine * Script Engine Groovy - Groovy Language Engine for executing scripts
Script File Script File Path to script file to execute. Only one of Script File or Script Body may be used
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Display Name API Name Default Value Allowable Values Description
Access Token * access-token Bot OAuth Token used for authenticating and authorizing the Slack request sent by NiFi.
API URL * api-url [https://slack.com/api](https://slack.com/api) Slack Web API URL for posting text messages to channels. It only needs to be changed if Slack changes its API URL.
Channel ID * channel-id Slack channel, private group, or IM channel to send the message to. Use Channel ID instead of the name.
Input Character Set * input-character-set UTF-8 Specifies the character set of the records used to generate the Slack message.
Record Writer * record-sink-record-writer Specifies the Controller Service to use for writing out the records.
Web Service Client Provider * web-service-client-provider Controller service to provide HTTP client for communicating with Slack API
Display Name API Name Default Value Allowable Values Description
Domain domain The domain used for authentication. Optional; in most cases a username and password are sufficient.
Enable DFS * enable-dfs false - true - false Enables accessing Distributed File System (DFS) and following DFS links during SMB operations.
Hostname * hostname The network host of the SMB file server.
Password password The password used for authentication.
Port * port 445 Port to use for connection.
Share * share The network share from which files should be listed. This is the "first folder" after the hostname: [smb://hostname:port/[share]/dir1/dir2](smb://hostname:port/[share]/dir1/dir2)
SMB Dialect * smb-dialect AUTO - AUTO - SMB 2.0.2 - SMB 2.1 - SMB 3.0 - SMB 3.0.2 - SMB 3.1.1 By default, the client and the server negotiate the SMB dialect to the highest common version supported by both ends. In rare cases, client-server communication may fail with the automatically negotiated dialect. In those situations, use this property to set the dialect explicitly (for example, to downgrade to a lower version).
Timeout * timeout 5 sec Timeout for read and write operations.
Use Encryption * use-encryption false - true - false Turns on/off encrypted communication between the client and the server. The property's behavior is SMB dialect dependent: SMB 2.x does not support encryption and the property has no effect. In case of SMB 3.x, it is a hint/request to the server to turn encryption on if the server also supports it.
Username username Guest The username used for authentication.
| Property | Value |
| --- | --- |
| SASL Mechanism | `AWS_MSK_IAM` |
| Security Protocol | `#{Kafka Security Protocol}` |
| Bootstrap Servers | `#{Kafka Bootstrap Servers}` |
Display Name API Name Default Value Allowable Values Description
Account * Account Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name]
Authentication Strategy * Authentication Strategy PASSWORD - Password - Key Pair - Snowflake Session Token Strategy for authenticating Snowflake connections
Connection Strategy * Connection Strategy STANDARD - Standard - Private Connectivity Strategy for connecting to Snowflake services
Connection Timeout * Connection Timeout 30 seconds Maximum amount of time to wait for a connection from a reusable pool
Database Name Database Name Default Snowflake Database for connections
Idle Timeout * Idle Timeout 10 minutes Maximum amount of time for a connection to remain idle in a reusable pool
Maximum Connections * Maximum Connections 10 Maximum number of connections created and managed in a reusable pool
Maximum Lifetime * Maximum Lifetime 30 minutes Maximum lifetime for each connection in a reusable pool
Password * Password Snowflake Password for authenticating connections
Private Key Service * Private Key Service RSA Private Key Service for authenticating connections
Role Role Default Snowflake Role for connections
Schema Schema Default Snowflake Schema for connections
User * User Snowflake User for authenticating connections
Warehouse Warehouse Default Snowflake Warehouse for connections
Property Description
Content Hash The name of the FlowFile attribute that holds the pre-computed hash. Supports Expression Language.
Document Source Identifier Specifies the document source identifier (doc ID). Supports Expression Language.
Document Source Name Specifies the document source system name. Supports Expression Language.
Snowflake Connection Service The DBCPService that provides the connection to Snowflake.
Snowflake Table Name The Snowflake table name that stores the file hashes. The table name is case-insensitive. The database and schema must be configured beforehand in the Snowflake Connection Service.
Name Description
distinct FlowFiles that do not match an existing document are routed here (new hash inserted).
duplicate FlowFiles that match an existing document (same hash) are routed here.
failure FlowFiles that encounter an error or exception during processing are routed here.
Name Description
snowflake.detect.duplicate A 'true' or 'false' attribute indicating if the FlowFile was detected as a duplicate.
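The routing described by the relationships and the `snowflake.detect.duplicate` attribute can be sketched as follows. An in-memory set stands in for the Snowflake hash table, SHA-256 is only an assumed hash algorithm for the pre-computed value, and `route_by_hash` is a hypothetical helper rather than the processor's code.

```python
import hashlib


def route_by_hash(seen_hashes: set, content_hash: str):
    """Return (relationship, attributes) for a pre-computed content hash."""
    if content_hash in seen_hashes:
        return "duplicate", {"snowflake.detect.duplicate": "true"}
    # A new hash is recorded so that later copies are detected.
    seen_hashes.add(content_hash)
    return "distinct", {"snowflake.detect.duplicate": "false"}


seen = set()
digest = hashlib.sha256(b"example document").hexdigest()
print(route_by_hash(seen, digest)[0])  # prints distinct
print(route_by_hash(seen, digest)[0])  # prints duplicate
```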
Display Name API Name Default Value Allowable Values Description
Audience * Audience The audience claim (aud) for the JWT.
Connection Pooling Service * Connection Pooling Service The Connection Pooling Service that is used to obtain a connection to the database
JWT Expiration Time * JWT Expiration Time 5 minutes Expiration time used to set the corresponding claim of the JWT.
Snowflake Secret Name * Snowflake Secret Name Name of the JWT Key Pair secret in Snowflake that will be used to sign the JWT.
Subject * Subject The subject claim (sub) for the JWT.
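For orientation, the claims that these properties control might be assembled like this before signing. The sketch only builds the claim set; signing with the key pair secret is out of scope. The `build_claims` helper and the inclusion of an `iat` claim are assumptions.

```python
import time


def build_claims(audience, subject, expiration_seconds=300, now=None):
    """Assemble JWT claims; 300 seconds matches the 5 minute default above."""
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "aud": audience,                        # Audience property
        "sub": subject,                         # Subject property
        "iat": issued_at,                       # assumption: issued-at is included
        "exp": issued_at + expiration_seconds,  # JWT Expiration Time property
    }


print(build_claims("https://example.invalid", "service-user", now=1_700_000_000))
```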
Display Name API Name Default Value Allowable Values Description
Account * Account Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name]
Private Key Service * Private Key Service RSA Private Key Service for authenticating connections
User * User Snowflake User for authenticating connections
Web Client Service Provider * Web Client Service Provider Web Client Service Provider to make connections
Property Description
Output Size The number of Avro records to include per split file. In cases where the incoming file has fewer records than the Output Size, or when the total number of records does not divide evenly by the Output Size, it is possible to get a split file with fewer records.
Output Strategy Determines the format of the output. Either Avro Datafile, or bare record. Bare record output is only intended for use with systems that already require it, and shouldn't be needed for normal use.
Split Strategy The strategy for splitting the incoming datafile. The Record strategy will read the incoming datafile by de-serializing each record.
Transfer Metadata Whether or not to transfer metadata from the parent datafile to the children. If the Output Strategy is Bare Record, then the metadata will be stored as FlowFile attributes, otherwise it will be in the Datafile header.
Name Description
failure If a FlowFile fails processing for any reason (for example, the FlowFile is not valid Avro), it will be routed to this relationship
original The original FlowFile that was split. If the FlowFile fails processing, nothing will be sent to this relationship
split All new files split from the original FlowFile will be routed to this relationship
Name Description
fragment.identifier All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count The number of split FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
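The effect of **Output Size** and the fragment attributes can be illustrated with a short Python sketch that chunks a list of records. It is a simplified model, not the processor's implementation; the `split_records` helper is hypothetical, and starting `fragment.index` at 1 is an assumption.

```python
import uuid


def split_records(records, output_size):
    """Chunk records and attach fragment attributes to each chunk."""
    if output_size < 1:
        raise ValueError("output_size must be at least 1")
    chunks = [records[i:i + output_size] for i in range(0, len(records), output_size)]
    identifier = str(uuid.uuid4())  # shared by all splits of one parent
    return [
        {
            "records": chunk,
            "attributes": {
                "fragment.identifier": identifier,
                "fragment.index": index,  # assumption: numbering starts at 1
                "fragment.count": len(chunks),
            },
        }
        for index, chunk in enumerate(chunks, start=1)
    ]


splits = split_records(list(range(7)), 3)
print([len(s["records"]) for s in splits])  # prints [3, 3, 1]
```

Seven records with an output size of three yield a final split with a single record, which is the uneven case the **Output Size** description mentions.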
Property Description
Byte Sequence A representation of bytes to look for and upon which to split the source file into separate files
Byte Sequence Format Specifies how the <Byte Sequence> property should be interpreted
Byte Sequence Location If <Keep Byte Sequence> is set to true, specifies whether the byte sequence should be added to the end of the first split or the beginning of the second; if <Keep Byte Sequence> is false, this property is ignored.
Keep Byte Sequence Determines whether or not the Byte Sequence should be included with each Split
Name Description
original The original file
splits All Splits will be routed to the splits relationship
Name Description
fragment.identifier All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count The number of split FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
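How **Keep Byte Sequence** and **Byte Sequence Location** interact is easier to see on a small input. The sketch below is a hedged model only: the `split_bytes` helper, the `"Trailing"` and `"Leading"` value names, and the dropping of empty splits are assumptions that the descriptions above do not confirm.

```python
def split_bytes(data: bytes, sequence: bytes, keep=False, location="Trailing"):
    """Split data on a byte sequence, optionally keeping the sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    parts = data.split(sequence)
    result = []
    for i, part in enumerate(parts):
        if keep and location == "Trailing" and i < len(parts) - 1:
            part = part + sequence  # sequence ends the earlier split
        elif keep and location == "Leading" and i > 0:
            part = sequence + part  # sequence begins the later split
        if part:  # assumption: empty splits are not emitted
            result.append(part)
    return result


print(split_bytes(b"a|b|c", b"|"))                                 # prints [b'a', b'b', b'c']
print(split_bytes(b"a|b|c", b"|", keep=True))                      # prints [b'a|', b'b|', b'c']
print(split_bytes(b"a|b|c", b"|", keep=True, location="Leading"))  # prints [b'a', b'|b', b'|c']
```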
Property Description
Password The password for a password protected Excel spreadsheet
Protection Type Specifies whether an Excel spreadsheet is protected by a password or not.
Name Description
failure If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship.
original The original FlowFile that was split into segments. If the FlowFile fails processing, nothing will be sent to this relationship
split The individual Excel 'segments' of the original Excel FlowFile will be routed to this relationship.
Name Description
fragment.identifier All split Excel FlowFiles produced from the same parent Excel FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the split Excel FlowFiles that were created from a single parent Excel FlowFile
fragment.count The number of split Excel FlowFiles generated from the parent Excel FlowFile
segment.original.filename The filename of the parent Excel FlowFile
sheetname The name of the Excel sheet from the original spreadsheet.
total.rows The number of rows in the Excel sheet from the original spreadsheet.
Property Description
JsonPath Expression A JsonPath expression that indicates the array element to split into JSON/scalar fragments.
Max String Length The maximum allowed length of a string value when parsing the JSON document
Null Value Representation Indicates the desired representation of JSON Path expressions resulting in a null value.
Name Description
failure If a FlowFile fails processing for any reason (for example, the FlowFile is not valid JSON or the specified path does not exist), it will be routed to this relationship
original The original FlowFile that was split into segments. If the FlowFile fails processing, nothing will be sent to this relationship
split All segments of the original FlowFile will be routed to this relationship
Name Description
fragment.identifier All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count The number of split FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
Property Description
Record Reader Specifies the Controller Service to use for reading incoming data
Record Writer Specifies the Controller Service to use for writing out the records
Records Per Split Specifies how many records should be written to each 'split' or 'segment' FlowFile
Name Description
failure If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship.
original Upon successfully splitting an input FlowFile, the original FlowFile will be sent to this relationship.
splits The individual 'segments' of the original FlowFile will be routed to this relationship.
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer for the FlowFiles routed to the 'splits' Relationship.
record.count The number of records in the FlowFile. This is added to FlowFiles that are routed to the 'splits' Relationship.
fragment.identifier All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count The number of split FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
Property Description
Header Line Count The number of lines that should be considered part of the header; the header lines will be duplicated to all split files
Header Line Marker Characters The first character(s) on the line of the datafile which signifies a header line. This value is ignored when Header Line Count is non-zero. The first line not containing the Header Line Marker Characters and all subsequent lines are considered non-header
Line Split Count The number of lines that will be added to each split file, excluding header lines. A value of zero requires Maximum Fragment Size to be set, and line count will not be considered in determining splits.
Maximum Fragment Size The maximum size of each split file, including header lines. NOTE: in the case where a single line exceeds this property (including headers, if applicable), that line will be output in a split of its own which exceeds this Maximum Fragment Size setting.
Remove Trailing Newlines Whether to remove newlines at the end of each split file. This should be false if you intend to merge the split files later. If this is set to 'true' and a FlowFile is generated that contains only 'empty lines' (i.e., consists only of `\r` and `\n` characters), the FlowFile will not be emitted. Note, however, that if header lines are specified, the resultant FlowFile will never be empty as it will consist of the header lines, so a FlowFile may be emitted that contains only the header lines.
Name Description
failure If a file cannot be split for some reason, the original file will be routed to this destination and nothing will be routed elsewhere
original The original input file will be routed to this destination when it has been successfully split into 1 or more files
splits The split files will be routed to this destination when an input file is successfully split into 1 or more split files
Name Description
text.line.count The number of lines of text from the original FlowFile that were copied to this FlowFile
fragment.size The number of bytes from the original FlowFile that were copied to this FlowFile, including header, if applicable, which is duplicated in each split FlowFile
fragment.identifier All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count The number of split FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
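The relationship between **Header Line Count** and **Line Split Count** can be sketched as below. This simplified model covers only the line-count case (a **Line Split Count** of at least 1) and ignores **Maximum Fragment Size** and **Header Line Marker Characters**; the `split_text` helper is hypothetical.

```python
def split_text(lines, line_split_count, header_line_count=0):
    """Split lines into groups, duplicating the header into every group."""
    if line_split_count < 1:
        raise ValueError("this sketch requires line_split_count >= 1")
    header = lines[:header_line_count]
    body = lines[header_line_count:]
    # Header lines do not count toward line_split_count.
    return [
        header + body[i:i + line_split_count]
        for i in range(0, len(body), line_split_count)
    ]


print(split_text(["col_a,col_b", "1,2", "3,4", "5,6"], 2, header_line_count=1))
# prints [['col_a,col_b', '1,2', '3,4'], ['col_a,col_b', '5,6']]
```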
Property Description
Split Depth Indicates the XML-nesting depth to start splitting XML fragments. A depth of 1 means split the root's children, whereas a depth of 2 means split the root's children's children and so forth.
Name Description
failure If a FlowFile fails processing for any reason (for example, the FlowFile is not valid XML), it will be routed to this relationship
original The original FlowFile that was split into segments. If the FlowFile fails processing, nothing will be sent to this relationship
split All segments of the original FlowFile will be routed to this relationship
Name Description
fragment.identifier All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count The number of split FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile
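The meaning of **Split Depth** can be illustrated with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch under the reading given above (depth 1 yields the root's children, depth 2 the grandchildren); the `split_xml` helper is hypothetical and does not reproduce the processor's namespace or attribute handling.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_text, depth=1):
    """Return the serialized elements found at the given nesting depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    level = [ET.fromstring(xml_text)]
    for _ in range(depth):
        # Descend one level: collect the children of every current element.
        level = [child for element in level for child in element]
    return [ET.tostring(element, encoding="unicode") for element in level]


document = "<root><a><x/><y/></a><b><z/></b></root>"
print(len(split_xml(document, depth=1)))  # prints 2
print(len(split_xml(document, depth=2)))  # prints 3
```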
Display Name API Name Default Value Allowable Values Description
Anthropic API Key * Anthropic API Key The API Key for authenticating to Anthropic
Backoff Base Delay (ms) * Backoff Base Delay (ms) 1000 The base delay in milliseconds for exponential backoff between retries
Max Response Tokens * Max Response Tokens 1000 The maximum number of tokens to generate in the response.
Max Retries * Max Retries 3 The maximum number of retry attempts for API calls
Model Name * Model Name claude-3-5-sonnet-latest The name of the Anthropic model
Temperature Temperature The temperature to use for generating the response.
Top K Top K The top K value to use for generating the response. Only sample from the top K options for each subsequent token. Recommended for advanced use cases only. You usually only need to use temperature.
Top P Top P The top_p value for nucleus sampling. It controls the diversity of the generated responses.
User ID User ID The user id to set in the request metadata
Web Client Service * Web Client Service The Web Client Service to use for communicating with the LLM provider.
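**Backoff Base Delay (ms)** and **Max Retries** together describe a retry loop with exponential backoff. The sketch below assumes the delay doubles on each retry with no jitter, which the table does not state; `call_with_retries` is a hypothetical helper, not the service's code.

```python
import time


def call_with_retries(operation, base_delay_ms=1000, max_retries=3, sleep=time.sleep):
    """Call operation, retrying up to max_retries times with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Assumed schedule: base, 2 * base, 4 * base, ...
            sleep(base_delay_ms * 2**attempt / 1000)
```

With the default values, a call that keeps failing would wait 1, 2, and 4 seconds before the final attempt under this assumed schedule.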
Display Name API Name Default Value Allowable Values Description
Credential Configuration Strategy * Credential Configuration Strategy default-credential - Default Credential - Managed Identity
Managed Identity Client ID Managed Identity Client ID Client ID of the managed identity. The property is required when User Assigned Managed Identity is used for authentication. It must be empty in case of System Assigned Managed Identity.
Display Name API Name Default Value Allowable Values Description
API Token * API Token Token used for API authentication
Environment URL * Environment URL URL to the Atlassian Confluence environment, for example [https://domain.atlassian.net](https://domain.atlassian.net)
Request Rate Manager * Request Rate Manager Controller service for keeping track of rate limits for Atlassian APIs
User Email * User Email Confluence user email
Web Client Service * Web Client Service The Web Client Service to use for communicating with Confluence
Display Name API Name Default Value Allowable Values Description
Authentication Method * Authentication Method OAUTH_M2M - OAuth M2M - PAT Method to authenticate with Databricks
OAuth Client ID * OAuth Client ID Databricks OAuth Client ID, also known as an application ID
OAuth Client Secret * OAuth Client Secret Databricks Service Principal's OAuth Client Secret.
Personal Access Token * Personal Access Token Databricks Personal Access Token
Workspace ID * Workspace ID Databricks Workspace ID
Proxy Configuration Service proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Display Name API Name Default Value Allowable Values Description
Access Token * Access Token Access Token of the user's Dropbox app. See Additional Details for more information about Access Token generation.
App Key * App Key App Key of the user's Dropbox app. See Additional Details for more information.
App Secret * App Secret App Secret of the user's Dropbox app. See Additional Details for more information.
Refresh Token * Refresh Token Refresh Token of the user's Dropbox app. See Additional Details for more information about Refresh Token generation.
Display Name API Name Default Value Allowable Values Description
File Path * file-path ${absolute.path}/${filename} Path to a file that can be accessed locally.
Required Permission Explanation
read filesystem Provides operator the ability to read from any file that NiFi has access to.
Display Name API Name Default Value Allowable Values Description
Configuration Strategy * configuration-strategy direct-properties - Direct Properties - Properties Files Specifies the source of the configuration properties.
Vault Authentication * vault.authentication TOKEN - TOKEN - APPID - APPROLE - AWS_EC2 - AZURE - CERT - CUBBYHOLE - KUBERNETES Vault authentication method, as described in the Spring Vault Environment Configuration documentation ([https://docs.spring.io/spring-vault/docs/2.3.x/reference/html/#vault.core.environment-vault-configuration](https://docs.spring.io/spring-vault/docs/2.3.x/reference/html/#vault.core.environment-vault-configuration)).
Connection Timeout * vault.connection.timeout 5 sec The connection timeout for the HashiCorp Vault client
Vault Properties Files * vault.properties.files A comma-separated list of files containing HashiCorp Vault configuration properties, as described in the Spring Vault Environment Configuration documentation ([https://docs.spring.io/spring-vault/docs/2.3.x/reference/html/#vault.core.environment-vault-configuration](https://docs.spring.io/spring-vault/docs/2.3.x/reference/html/#vault.core.environment-vault-configuration)). All of the Spring property keys and authentication-specific property keys are supported.
Read Timeout * vault.read.timeout 15 sec The read timeout for the HashiCorp Vault client
SSL Context Service vault.ssl.context.service The SSL Context Service used to provide client certificate information for TLS/SSL connections to the HashiCorp Vault server.
Vault URI * vault.uri The URI of the HashiCorp Vault server (e.g., [http://localhost:8200](http://localhost:8200)). Required if not specified in the Bootstrap HashiCorp Vault Configuration File.
Display Name API Name Default Value Allowable Values Description
Maximum Outstanding Requests * Maximum Outstanding Requests 5000 The maximum number of HTTP requests that can be outstanding at any one time. Any attempt to register an additional HTTP Request will cause an error
Request Expiration * Request Expiration 1 min Specifies how long an HTTP Request should be left unanswered before being evicted from the cache and being responded to with a Service Unavailable status code
Display Name API Name Default Value Allowable Values Description
HubSpot Access Token * HubSpot Access Token HubSpot Access Token
Web Client Service Provider * Web Client Service Provider The Web Client Service to use for communicating with HubSpot
Display Name API Name Default Value Allowable Values Description
JSON Schema Version * JSON Schema Version DRAFT_2020_12 - Draft 4 - Draft 6 - Draft 7 - Draft 2019-09 - Draft 2020-12 The JSON schema specification
Display Name API Name Default Value Allowable Values Description
Application Client ID * Application Client ID Azure Data Explorer Application Client Identifier for Authentication
Application Key * Application Key Azure Data Explorer Application Key for Authentication
Application Tenant ID * Application Tenant ID Azure Data Explorer Application Tenant Identifier for Authentication
Authentication Strategy * Authentication Strategy MANAGED_IDENTITY - Application Credentials - Managed Identity - Azure CLI (Dev Only) Authentication method for access to Azure Data Explorer
Cluster URI * Cluster URI Azure Data Explorer Cluster URI
Display Name API Name Default Value Allowable Values Description
Application Client ID * Application Client ID Azure Data Explorer Application Client Identifier for Authentication
Application Key * Application Key Azure Data Explorer Application Key for Authentication
Application Tenant ID * Application Tenant ID Azure Data Explorer Application Tenant Identifier for Authentication
Authentication Strategy * Authentication Strategy MANAGED_IDENTITY - Application Credentials - Managed Identity - Azure CLI (Dev Only) Authentication method for access to Azure Data Explorer
Cluster URI * Cluster URI Azure Data Explorer Cluster URI
Display Name API Name Default Value Allowable Values Description
API Key * API Key Milvus API Key for authenticating connections
Authentication Strategy * Authentication Strategy PASSWORD - Password - API Key Strategy for authenticating Milvus connections
Connection Timeout * Connection Timeout 30 seconds Maximum amount of time to wait for a connection from a reusable pool
Idle Timeout * Idle Timeout 10 minutes Maximum amount of time for a connection to remain idle in a reusable pool
Password * Password Milvus password for authenticating connections
SSL Context Service SSL Context Service The SSL Context Service used to provide client certificate information for TLS/SSL connections.
Service URI * Service URI The URI to use to communicate with Milvus
User * User Milvus username for authenticating connections
Display Name API Name Default Value Allowable Values Description
Audience Audience Audience for the access token request defined in RFC 8693 Section 2.1
Authorization Server URL * Authorization Server URL The URL of the authorization server that issues access tokens.
Client Authentication Strategy * Client Authentication Strategy REQUEST_BODY - REQUEST_BODY - BASIC_AUTHENTICATION Strategy for authenticating the client against the OAuth2 token provider service.
Client ID Client ID
Client secret * Client secret
Grant Type * Grant Type password - User Password - Client Credentials - Refresh Token The OAuth2 Grant Type to be used when acquiring an access token.
HTTP Protocols * HTTP Protocols H2_HTTP_1_1 - http/1.1 - h2 http/1.1 - h2 HTTP Protocols supported for Application Layer Protocol Negotiation with TLS
Password * Password Password for the username on the service that is being accessed.
Refresh Token * Refresh Token Refresh Token supports retrieving a new Access Token when configured
Refresh Window * Refresh Window 0 s The service will attempt to refresh tokens expiring within the refresh window, subtracting the configured duration from the token expiration.
Resource Resource Resource URI for the access token request defined in RFC 8707 Section 2
SSL Context Service SSL Context Service
Scope Scope Space-delimited, case-sensitive list of scopes of the access request (as per the OAuth 2.0 specification)
Username * Username Username on the service that is being accessed.
Proxy Configuration Service proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
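The Refresh Window property above can be read as a simple time comparison: a token is refreshed once the current time reaches the token expiration minus the configured window. The following is a minimal sketch of that rule only; the function name `should_refresh` and its parameters are hypothetical and are not part of the service's API.

```python
from datetime import datetime, timedelta, timezone


def should_refresh(expires_at, refresh_window, now=None):
    """Return True when the token expires within the refresh window.

    Hypothetical helper: subtracts the window from the expiration time,
    as the Refresh Window description states, and compares it to 'now'.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at - refresh_window


# With the default window of 0 s, the token is refreshed only at expiration.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(seconds=60)
print(should_refresh(expires, timedelta(seconds=0), now=issued))
print(should_refresh(expires, timedelta(seconds=120), now=issued))
```

A larger window trades a few extra token requests for a lower chance of sending a request with a token that expires in flight.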
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Communications Timeout * | Communications Timeout | 60 secs | | The amount of time to wait for a response from the OCR Service. |
| Custom Service URL * | Custom Service URL | | | The custom URL of the Openflow Tesseract OCR Service. |
| OCR Languages * | OCR Languages | ENGLISH | | The languages to use when performing OCR if none are provided by the caller. This is a comma-separated list of the following valid values: ENGLISH, KOREAN, KOREAN_VERT, HEBREW. |
| Service Location Strategy * | Service Location Strategy | Default | Default, Custom | Determines how Service Locations are configured within this Controller for the Openflow Tesseract OCR Service. |
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Backoff Base Delay (ms) * | Backoff Base Delay (ms) | 1000 | | The base delay in milliseconds for exponential backoff between retries |
| Max Response Tokens | Max Response Tokens | | | The maximum number of tokens to generate in the response. |
| Max Retries * | Max Retries | 3 | | The maximum number of retry attempts for API calls |
| Model Name * | Model Name | gpt-4o-mini | | The name of the OpenAI model. |
| OpenAI API Key * | OpenAI API Key | | | The API Key for authenticating to OpenAI. |
| Seed | Seed | | | The seed to use for generating the response |
| Temperature | Temperature | | | The temperature to use for generating the response. |
| Top P | Top P | | | The top_p value for nucleus sampling. It controls the diversity of the generated responses. |
| User | User | | | Your end user, sent to OpenAI for monitoring and detection of abuse |
| Web Client Service * | Web Client Service | | | The Web Client Service to use for communicating with the LLM provider. |
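The Backoff Base Delay (ms) and Max Retries properties above describe retrying with exponential backoff. As an illustration of how the two values interact, here is a minimal sketch that assumes the delay doubles after each failed attempt and uses no jitter; the source does not state the exact schedule, and `call_with_backoff` is a hypothetical name.

```python
import time


def call_with_backoff(operation, max_retries=3, base_delay_ms=1000, sleep=time.sleep):
    """Call operation(), retrying failed calls with exponentially growing delays.

    Assumption: the delay before retry k (0-based) is base_delay_ms * 2**k.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay_ms * (2 ** attempt) / 1000.0)
            attempt += 1


# Usage: an operation that fails twice and then succeeds.
state = {"calls": 0}

def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

waits = []
print(call_with_backoff(flaky_request, sleep=waits.append), waits)
```

Under these assumptions, the default values (1000 ms, 3 retries) give waits of 1 s, 2 s, and 4 s before the call finally fails.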
Display Name API Name Default Value Allowable Values Description
Key Password * key-password Password used for decrypting Private Keys
Keyring keyring PGP Keyring or Secret Key encoded in ASCII Armor
Keyring File keyring-file File path to PGP Keyring or Secret Key encoded in binary or ASCII Armor
Display Name API Name Default Value Allowable Values Description
Keyring keyring PGP Keyring or Public Key encoded in ASCII Armor
Keyring File keyring-file File path to PGP Keyring or Public Key encoded in binary or ASCII Armor
Display Name API Name Default Value Allowable Values Description
Key key Private Key structured using PKCS8 and encoded as PEM
Key File key-file File path to Private Key structured using PKCS8 and encoded as PEM
Key Password key-password Password used for decrypting Private Keys
Display Name API Name Default Value Allowable Values Description
Message Name * Message Name Fully qualified name of the Protocol Buffers message, including its package (e.g., mypackage.MyMessage).
Message Name Resolution Strategy * Message Name Resolution Strategy MESSAGE_NAME_PROPERTY - Message Name Property - Message Name Resolver Strategy for determining the Protocol Buffers message name for processing
Message Name Resolver * Message Name Resolver Service that dynamically resolves Protocol Buffer message names from FlowFile content or attributes
Schema Access Strategy * Schema Access Strategy schema-name - Use 'Schema Name' Property - Use 'Schema Text' Property - Schema Reference Reader Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Branch Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Name Schema Name $\{schema.name\} Specifies the name of the schema to lookup in the Schema Registry property
Schema Reference Reader * Schema Reference Reader Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
Schema Registry Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text * Schema Text $\{proto.schema\} The text of a Proto 3 formatted Schema
Schema Version Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Proxy Server Host | proxy-server-host | | | Proxy server hostname or IP address. |
| Proxy Server Port | proxy-server-port | | | Proxy server port number. |
| Proxy Type * | proxy-type | DIRECT | DIRECT, HTTP, SOCKS | Proxy type. |
| Proxy User Name | proxy-user-name | | | The name of the proxy client for user authentication. |
| Proxy User Password | proxy-user-password | | | The password of the proxy client for user authentication. |
| SOCKS Version * | socks-version | SOCKS5 | SOCKS4, SOCKS5 | SOCKS Protocol Version |
Display Name API Name Default Value Allowable Values Description
Keystore Filename Keystore Filename The fully-qualified filename of the Keystore
Keystore Password Keystore Password The password for the Keystore
Keystore Type Keystore Type - BCFKS - PKCS12 - JKS The Type of the Keystore
TLS Protocol SSL Protocol TLS - TLS - TLSv1.3 - TLSv1.2 TLS Protocol Version for encrypted connections. Supported versions depend on the specific version of Java used.
Truststore Filename Truststore Filename The fully-qualified filename of the Truststore
Truststore Password Truststore Password The password for the Truststore
Truststore Type Truststore Type - BCFKS - PKCS12 - JKS The Type of the Truststore
Key Password key-password The password for the key. If this is not specified, but the Keystore Filename, Password, and Type are specified, then the Keystore Password will be assumed to be the same as the Key Password.
Display Name API Name Default Value Allowable Values Description
Encryption Strategy * Encryption Strategy NONE - None - Server-side S3 - Server-side KMS - Server-side Customer Key - Client-side KMS - Client-side Customer Key Strategy to use for S3 data encryption and decryption.
KMS Region KMS Region us-west-2 - AWS GovCloud (US) - AWS GovCloud (US-East) - US East (N. Virginia) - US East (Ohio) - US West (N. California) - US West (Oregon) - EU (Ireland) - EU (London) - EU (Paris) - EU (Frankfurt) - EU (Zurich) - EU (Stockholm) - EU (Milan) - EU (Spain) - Asia Pacific (Hong Kong) - Asia Pacific (Taipei) - Asia Pacific (Mumbai) - Asia Pacific (Hyderabad) - Asia Pacific (Singapore) - Asia Pacific (Sydney) - Asia Pacific (Jakarta) - Asia Pacific (Melbourne) - Asia Pacific (Malaysia) - Asia Pacific (Thailand) - Asia Pacific (Tokyo) - Asia Pacific (Seoul) - Asia Pacific (Osaka) - South America (Sao Paulo) - China (Beijing) - China (Ningxia) - Canada (Central) - Canada West (Calgary) - Middle East (UAE) - Middle East (Bahrain) - Africa (Cape Town) - US ISO East - US ISOB East (Ohio) - US ISO West - US ISOF East1 (California) - US ISOF South1 (Alpine) - Israel (Tel Aviv) - Mexico (Central) - EU ISOE West The Region of the AWS Key Management Service. Only used in case of Client-side KMS.
Key ID or Key Material Key ID or Key Material For None and Server-side S3: not used. For Server-side KMS and Client-side KMS: the KMS Key ID must be configured. For Server-side Customer Key and Client-side Customer Key: the Key Material must be specified in Base64 encoded form. In case of Server-side Customer Key, the key must be an AES-256 key. In case of Client-side Customer Key, it can be an AES-256, AES-192 or AES-128 key.
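For the customer-key strategies, the Key Material must be supplied in Base64-encoded form. The sketch below shows one way to generate random AES key material of a permitted size and to check the size of an existing value; the helper names are hypothetical, and how the key is stored or rotated is outside the scope of this example.

```python
import base64
import os


def generate_key_material(bits=256):
    """Return random AES key material as a Base64 string (hypothetical helper)."""
    if bits not in (128, 192, 256):
        raise ValueError("AES keys must be 128, 192, or 256 bits")
    return base64.b64encode(os.urandom(bits // 8)).decode("ascii")


def key_size_bits(key_material):
    """Decode Base64 key material and return its size in bits."""
    return len(base64.b64decode(key_material, validate=True)) * 8


material = generate_key_material(256)
print(key_size_bits(material))
```

Per the description above, Server-side Customer Key requires a 256-bit key, while Client-side Customer Key also accepts 192-bit and 128-bit keys.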
Display Name API Name Default Value Allowable Values Description
API Version * API Version 63.0 The version number of the Salesforce REST API appended to the URL after the services/data path. See Salesforce documentation for supported versions.
OAuth2 Access Token Provider * OAuth2 Access Token Provider Service providing OAuth2 Access Tokens for authenticating using the HTTP Authorization Header
Salesforce Instance * Salesforce Instance The hostname of the Salesforce instance including the domain such as MyDomainName.my.salesforce.com
Web Client Service * Web Client Service The Web Client Service to use for communicating with Salesforce
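The API Version and Salesforce Instance properties combine into the base URL for REST calls: the version follows the services/data path. A minimal sketch, assuming HTTPS and the `v` prefix used by Salesforce REST API paths; the function name is hypothetical and this is not necessarily how the service builds the URL internally.

```python
def salesforce_rest_base_url(instance, api_version="63.0"):
    """Build the REST base URL from the instance hostname and API version.

    Assumptions: HTTPS scheme and a 'v' prefix before the version number.
    """
    return f"https://{instance}/services/data/v{api_version}"


print(salesforce_rest_base_url("MyDomainName.my.salesforce.com"))
```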
Display Name API Name Default Value Allowable Values Description
Data Cloud Instance Data Cloud Instance The hostname of the Salesforce instance including the domain such as MyDomainName.my.salesforce.com
Data Cloud Token Provider * Data Cloud Token Provider Service providing OAuth2 Access Tokens for authenticating using the HTTP Authorization Header
Web Client Service * Web Client Service The Web Client Service to use for communicating with Salesforce
Display Name API Name Default Value Allowable Values Description
Enable Rate Limiting * Enable Rate Limiting true - true - false Enable or disable rate limiting functionality
Display Name API Name Default Value Allowable Values Description
Keystore Filename Keystore Filename The fully-qualified filename of the Keystore
Keystore Password Keystore Password The password for the Keystore
Keystore Type Keystore Type - BCFKS - PKCS12 - JKS The Type of the Keystore
TLS Protocol SSL Protocol TLS - SSL - TLS - TLSv1.3 - TLSv1.2 - TLSv1.1 - TLSv1 SSL or TLS Protocol Version for encrypted connections. Supported versions include insecure legacy options and depend on the specific version of Java used.
Truststore Filename Truststore Filename The fully-qualified filename of the Truststore
Truststore Password Truststore Password The password for the Truststore
Truststore Type Truststore Type - BCFKS - PKCS12 - JKS The Type of the Truststore
Key Password key-password The password for the key. If this is not specified, but the Keystore Filename, Password, and Type are specified, then the Keystore Password will be assumed to be the same as the Key Password.
Display Name API Name Default Value Allowable Values Description
API Key * API Key Vectara API Key
Customer ID * Customer ID Vectara Customer ID
Display Name API Name Default Value Allowable Values Description
Connect Timeout * Connect Timeout 10 secs Maximum amount of time to wait before failing during initial socket connection
HTTP Protocol Version * HTTP Protocol Version HTTP_2 - HTTP_1_1 - HTTP_2 Preferred HTTP protocol version for requests
Read Timeout * Read Timeout 10 secs Maximum amount of time to wait before failing while reading socket responses
Redirect Handling Strategy * Redirect Handling Strategy FOLLOWED - FOLLOWED - IGNORED Handling strategy for responding to HTTP 301 or 302 redirects received with a Location header
SSL Context Service SSL Context Service SSL Context Service overrides system default TLS settings for HTTPS communication
Write Timeout * Write Timeout 10 secs Maximum amount of time to wait before failing while writing socket requests
Proxy Configuration Service proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
Property Description
AWS Credentials Provider service The Controller Service used to obtain the AWS credentials provider
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
JSON Payload JSON request for AWS Machine Learning services. The Processor will use FlowFile content for the request when this property is not specified.
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Name Description
failure FlowFiles are routed to failure relationship
original Upon successful completion, the original FlowFile will be routed to this relationship.
success FlowFiles are routed to success relationship
Name Description
awsTaskId The task ID that can be used to poll for Job completion in GetAwsPollyJobStatus
Property Description
AWS Credentials Provider service The Controller Service used to obtain the AWS credentials provider
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
JSON Payload JSON request for AWS Machine Learning services. The Processor will use FlowFile content for the request when this property is not specified.
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Textract Type Supported values: "Document Analysis", "Document Text Detection", "Expense Analysis"
Name Description
failure FlowFiles are routed to failure relationship
original Upon successful completion, the original FlowFile will be routed to this relationship.
success FlowFiles are routed to success relationship
Name Description
awsTaskId The task ID that can be used to poll for Job completion in GetAwsTextractJobStatus
awsTextractType The selected Textract type, which can be used in GetAwsTextractJobStatus
Property Description
AWS Credentials Provider service The Controller Service used to obtain the AWS credentials provider
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
JSON Payload JSON request for AWS Machine Learning services. The Processor will use FlowFile content for the request when this property is not specified.
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Name Description
failure FlowFiles are routed to failure relationship
original Upon successful completion, the original FlowFile will be routed to this relationship.
success FlowFiles are routed to success relationship
Name Description
awsTaskId The task ID that can be used to poll for Job completion in GetAwsTranscribeJobStatus
Property Description
AWS Credentials Provider service The Controller Service used to obtain the AWS credentials provider
Communications Timeout
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
JSON Payload JSON request for AWS Machine Learning services. The Processor will use FlowFile content for the request when this property is not specified.
Region
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Name Description
failure FlowFiles are routed to failure relationship
original Upon successful completion, the original FlowFile will be routed to this relationship.
success FlowFiles are routed to success relationship
Name Description
awsTaskId The task ID that can be used to poll for Job completion in GetAwsTranslateJobStatus
Property Description
gcp-credentials-provider-service The Controller Service used to obtain Google Cloud Platform credentials.
json-payload JSON request for the Google Cloud Vision service. The Processor will use FlowFile content for the request when this property is not specified.
output-bucket Name of the GCS bucket where the output of the Vision job will be persisted. The value of this property applies when the JSON Payload property is configured. The JSON Payload property value can use Expression Language to reference the value of $\{output-bucket\}
vision-feature-type Type of GCP Vision Feature. The value of this property applies when the JSON Payload property is configured. The JSON Payload property value can use Expression Language to reference the value of $\{vision-feature-type\}
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
Name Description
operationKey A unique identifier of the operation returned by the Vision server.
Property Description
gcp-credentials-provider-service The Controller Service used to obtain Google Cloud Platform credentials.
json-payload JSON request for the Google Cloud Vision service. The Processor will use FlowFile content for the request when this property is not specified.
output-bucket Name of the GCS bucket where the output of the Vision job will be persisted. The value of this property applies when the JSON Payload property is configured. The JSON Payload property value can use Expression Language to reference the value of $\{output-bucket\}
vision-feature-type Type of GCP Vision Feature. The value of this property applies when the JSON Payload property is configured. The JSON Payload property value can use Expression Language to reference the value of $\{vision-feature-type\}
Name Description
failure FlowFiles are routed to failure relationship
success FlowFiles are routed to success relationship
Name Description
operationKey A unique identifier of the operation returned by the Vision server.
Property Description
Column Delimiter The column delimiter used for CSV job data.
Configuration Mode The configuration mode for this processor. In advanced mode, the SOQL query must be provided and the processor's state stores only the timestamp of the last query job submission, regardless of the object queried. In simple mode, the object name and the fields to be queried must be provided and the processor's state stores the timestamp of the last query job submission for each object queried.
Incremental Offload Whether the processor should perform incremental offload. If true, the processor will only fetch the records that have been modified since the last query job submission by using a WHERE clause on the SystemModstamp field.
Line Ending The line ending used for CSV job data, marking the end of a data row.
Object Fields Comma separated list of the name of the fields to be queried for the specified object.
Object Name The name of the object to be queried.
Operation The type of query to submit.
Query The query to be performed. To perform incremental retrieval (i.e., to retrieve only the elements added, modified, or deleted since the last submission of the query), this processor exposes two attributes: $\{nowTs\} and $\{lastJobTimestamp\}. These placeholders can be used in a query such as SELECT Id FROM Account WHERE SystemModstamp > $\{lastJobTimestamp\} AND SystemModstamp <= $\{nowTs\}.
Result Format The format to be used for the results. Currently the only supported value is CSV.
Salesforce Bulk Job State Service Controller Service to store Bulk Jobs state per object type (used in SIMPLE mode). In ADVANCED mode, the processor stores a single 'last' timestamp in processor state.
Salesforce Client Salesforce Client to interact with the APIs
Scopes Description
CLUSTER In case the placeholders for incremental retrieval are used in the query field, the timestamp of the last Query Job submission time minus 30 seconds will be stored in the state.
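To make the incremental-retrieval placeholders concrete, the sketch below substitutes `${lastJobTimestamp}` and `${nowTs}` into a query and computes the timestamp that would be stored afterwards (submission time minus 30 seconds, as the state description says). It uses Python's `string.Template` purely for illustration; the processor itself resolves these attributes, and the timestamp format and helper names here are assumptions.

```python
from datetime import datetime, timedelta, timezone
from string import Template

# Assumption: timestamps are rendered as ISO 8601 UTC datetime literals.
SOQL_DATETIME = "%Y-%m-%dT%H:%M:%SZ"
STATE_OVERLAP = timedelta(seconds=30)


def render_query(query, last_job_ts, now_ts):
    """Fill the two placeholders in a SOQL query (illustrative only)."""
    return Template(query).substitute(
        lastJobTimestamp=last_job_ts.strftime(SOQL_DATETIME),
        nowTs=now_ts.strftime(SOQL_DATETIME),
    )


def next_stored_timestamp(submission_time):
    """Timestamp kept in state: submission time minus 30 seconds."""
    return submission_time - STATE_OVERLAP


query = ("SELECT Id FROM Account WHERE SystemModstamp > ${lastJobTimestamp} "
         "AND SystemModstamp <= ${nowTs}")
last = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 1, 0, 0, tzinfo=timezone.utc)
print(render_query(query, last, now))
print(next_stored_timestamp(now))
```

Because the stored lower bound is 30 seconds earlier than the submission time, consecutive query windows overlap slightly, so downstream consumers may see a record more than once.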
Name Description
comms.failure An incoming FlowFile is routed to this relationship if the Query Job could not be submitted but the operation might be retried
failure An incoming FlowFile is routed to this relationship if the Query Job could not be submitted
in.progress An incoming FlowFile is routed to this relationship when a previous job for the same object is still IN_PROGRESS
success When a Query Job is successfully submitted, a FlowFile is routed to this relationship
Name Description
jobId The unique ID for this job.
operationType The type of query.
objectType The object type being queried.
createdById The ID of the user who created the job.
createdDate The UTC date and time when the job was created.
systemModstamp The UTC date and time when the API last updated the job information.
jobState The current state of processing for the job.
concurrencyMode How the request is processed.
contentType The format to be used for the results.
apiVersion The API version that the job was created in.
lineEnding The line ending used for CSV job data, marking the end of a data row.
columnDelimiter The column delimiter used for CSV job data.
nowTs Upper limit of the time range used in the WHERE clause to construct the Query Job.
lastJobTimestamp Lower limit of the time range used in the WHERE clause to construct the Query Job.
Property Description
Content The content to be summarized. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}
LLM Provider Service The provider service for sending evaluation prompts to the LLM
Max File Size The maximum size of a FlowFile that can be summarized. If the FlowFile is larger than this, it will be routed to 'failure'.
Output Strategy Determines response output destination
Results Attribute The name of the attribute to write the response to.
Name Description
failure FlowFiles that cannot be processed are routed to this relationship
success FlowFiles that are successfully processed are routed to this relationship
Display Name API Name Default Value Allowable Values Description
Character Set * Character Set UTF-8 Specifies the character set of the Syslog messages
Raw message * syslog-5424-reader-raw-message false - true - false If true, the record will have a _raw field containing the raw message
Display Name API Name Default Value Allowable Values Description
Character Set * Character Set UTF-8 Specifies the character set of the Syslog messages
Raw message * syslog-5424-reader-raw-message false - true - false If true, the record will have a _raw field containing the raw message
Property Description
AWS Credentials Provider service The Controller Service used to obtain the AWS credentials provider
Append Tag If set to true, the tag will be appended to the existing set of tags on the S3 object. Any existing tags with the same key as the new tag will be updated with the specified value. If set to false, the existing tags will be removed and the new tag will be set on the S3 object.
Bucket The S3 Bucket to interact with
Communications Timeout The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out.
Custom Signer Class Name Fully qualified class name of the custom signer class. The signer must implement the com.amazonaws.auth.Signer interface.
Custom Signer Module Location Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any).
Endpoint Override URL Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
Object Key The S3 Object Key to use. This is analogous to a filename for traditional file systems.
Region The AWS Region to connect to.
SSL Context Service Specifies an optional SSL Context Service that, if provided, will be used to create connections
Signer Override The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation.
Tag Key The key of the tag that will be set on the S3 Object
Tag Value The value of the tag that will be set on the S3 Object
Version The Version of the Object to tag
proxy-configuration-service Specifies the Proxy Configuration Controller Service to proxy network requests.
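The Append Tag property above selects between merging a tag into the object's existing tag set and replacing the tag set entirely. The following sketch models only that rule with plain dictionaries; `apply_tag` is a hypothetical helper and makes no calls to S3.

```python
def apply_tag(existing_tags, key, value, append=True):
    """Return the tag set that results from tagging an object.

    append=True:  keep existing tags; a tag with the same key is updated.
    append=False: existing tags are removed and only the new tag remains.
    """
    if append:
        merged = dict(existing_tags)  # copy so the input is not mutated
        merged[key] = value
        return merged
    return {key: value}


current = {"env": "dev", "team": "data"}
print(apply_tag(current, "env", "prod", append=True))
print(apply_tag(current, "env", "prod", append=False))
```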
Name Description
failure If the Processor is unable to process a given FlowFile, it will be routed to this Relationship.
success FlowFiles are routed to this Relationship after they have been successfully processed.
Name Description
s3.tag.___ The tags associated with the S3 object will be written as part of the FlowFile attributes
s3.exception The class name of the exception thrown during processor execution
s3.additionalDetails The S3 supplied detail from the failed operation
s3.statusCode The HTTP error code (if available) from the failed operation
s3.errorCode The S3 moniker of the failed operation
s3.errorMessage The S3 exception message from the failed operation
Property Description
File Location Specifies where the state is located, either local or cluster, so that state can be stored appropriately to ensure that all data is consumed without duplication when NiFi restarts
File to Tail Path of the file to tail in case of single file mode. If using multifile mode, regular expression to find files to tail in the base directory. In case recursivity is set to true, the regular expression will be used to match the path starting from the base directory (see additional details for examples).
Initial Start Position When the Processor first begins to tail data, this property specifies where the Processor should begin reading data. Once data has been ingested from a file, the Processor will continue from the last point from which it has received data.
Line Start Pattern A Regular Expression to match against the start of a log line. If specified, any line that matches the expression, and any following lines, will be buffered until another line matches the Expression. In doing this, we can avoid splitting apart multi-line messages in the file. This assumes that the data is in UTF-8 format.
Max Buffer Size When using the Line Start Pattern, there may be situations in which the data in the file being tailed never matches the Regular Expression. This would result in the processor buffering all data from the tailed file, which can quickly exhaust the heap. To avoid this, the Processor will buffer only up to this amount of data before flushing the buffer, even if it means ingesting partial data from the file.
Post-Rollover Tail Period When a file is rolled over, the processor will continue tailing the rolled over file until it has not been modified for this amount of time. This allows for another process to roll over a file, and then flush out any buffered data. Note that when this value is set, and the tailed file rolls over, the new file will not be tailed until the old file has not been modified for the configured amount of time. Additionally, when using this capability, in order to avoid data duplication, this period must be set longer than the Processor's Run Schedule, and the Processor must not be stopped after the file being tailed has been rolled over and before the data has been fully consumed. Otherwise, the data may be duplicated, as the entire file may be written out as the contents of a single FlowFile.
Rolling Filename Pattern If the file to tail "rolls over" as would be the case with log files, this filename pattern will be used to identify files that have rolled over so that if NiFi is restarted, and the file has rolled over, it will be able to pick up where it left off. This pattern supports wildcard characters * and ?, it also supports the notation $\{filename\} to specify a pattern based on the name of the file (without extension), and will assume that the files that have rolled over live in the same directory as the file being tailed. The same glob pattern will be used for all files.
pre-allocated-buffer-size Sets the amount of memory that is pre-allocated for each tailed file.
reread-on-nul If this option is set to 'true', when a NUL character is read, the processor will yield and try to read the same part again later. (Note: Yielding may delay the processing of other files tailed by this processor, not just the one with the NUL character.) The purpose of this flag is to allow users to handle cases where reading a file may return temporary NUL values. NFS for example may send file contents out of order. In this case the missing parts are temporarily replaced by NUL values. CAUTION! If the file contains legitimate NUL values, setting this flag causes this processor to get stuck indefinitely. For this reason users should refrain from using this feature if they can help it and try to avoid having the target file on a file system where reads are unreliable.
tail-base-directory Base directory used to look for files to tail. This property is required when using Multiple files mode.
tail-mode Mode to use: single file mode tails only one file; multiple files mode looks for a list of files. In Multiple files mode, the Base directory is required.
tailfile-lookup-frequency Only used in Multiple files mode. Specifies the minimum duration the processor waits before listing the files to tail again.
tailfile-maximum-age Only used in Multiple files mode. Specifies the minimum time that must have elapsed since a file's last modification before the processor assumes that no new messages will be appended to it. Do not set this value too low; otherwise, data may be duplicated when new messages are appended at a lower frequency.
tailfile-recursive-lookup When using Multiple files mode, this property defines if files must be listed recursively or not in the base directory.

| Scopes | Description |
| --- | --- |
| LOCAL | Stores state about where in the tailed file it left off so that on restart it does not have to duplicate data. State is stored either locally or in the cluster, depending on the <File Location> property. |
| CLUSTER | Stores state about where in the tailed file it left off so that on restart it does not have to duplicate data. State is stored either locally or in the cluster, depending on the <File Location> property. |

| Required Permission | Explanation |
| --- | --- |
| read filesystem | Provides operator the ability to read from any file that NiFi has access to. |

| Name | Description |
| --- | --- |
| success | All FlowFiles are routed to this Relationship. |

| Name | Description |
| --- | --- |
| tailfile.original.path | Path of the original file the FlowFile comes from. |


| Property | Description |
| --- | --- |
| XSLT file name | Provides the name (including full path) of the XSLT file to apply to the FlowFile XML content. One of the 'XSLT file name' and 'XSLT Lookup' properties must be defined. |
| cache-size | Maximum number of stylesheets to cache. Zero disables the cache. |
| cache-ttl-after-last-access | The cache TTL (time-to-live) or how long to keep stylesheets in the cache after last access. |
| indent-output | Whether or not to indent the output. |
| secure-processing | Whether or not to mitigate various XML-related attacks like XXE (XML External Entity) attacks. |
| xslt-controller | Controller lookup used to store XSLT definitions. One of the 'XSLT file name' and 'XSLT Lookup' properties must be defined. WARNING: note that the lookup controller service should not be used to store large XSLT files. |
| xslt-controller-key | Key used to retrieve the XSLT definition from the XSLT lookup controller. This property must be set when using the XSLT controller property. |

| Name | Description |
| --- | --- |
| failure | If a FlowFile fails processing for any reason (for example, the FlowFile is not valid XML), it will be routed to this relationship. |
| success | The FlowFile with transformed content will be routed to this relationship. |

;
```

If `total_rows` differs from `distinct_keys`, the columns aren't suitable as a logical key.

2. Correct the key configuration (choose columns that are unique, or add a primary key to the table) and run a full reload for the affected table to reconcile the destination.

To avoid this issue, verify uniqueness before declaring a logical key. On large tables, consider running the verification query during a low-traffic window or sampling with a `WHERE` clause.

## No changes in incremental load

The incremental load isn't capturing or applying changes from the source database.

**Solution**

Run the verification for the **Read Oracle CDC Stream** processor:

1. In your Openflow runtime, double-click the **Oracle** flow.
2. Double-click the process group named **Incremental Load**.
3. Find the **Read Oracle CDC Stream** processor.
   1. If it is running, right-click and select **Stop**. The processor must be stopped before you can verify its configuration.
4. Right-click **Read Oracle CDC Stream** again, then select **Configure**.
5. Select the **Properties** tab.
6. Select the **Verification** checkmark icon in the upper-right corner.
7. In the popup window that appears, select **Verify** in the lower-right corner.

The results of the verification procedure appear below. The procedure validates database connectivity and checks the status of the components required for incremental load to work.

If any of the verification steps fail, view the error message, fix the issue, and run the verification again. The following sections describe specific issues and solutions.

## Capture Status not ENABLED

The capture process status is `DISABLED` or `ABORTED`. A `DISABLED` status means the capture process was stopped manually (with `DBMS_XSTREAM_ADM.STOP_OUTBOUND`) or the database was restarted. An `ABORTED` status means the capture encountered an error, usually because redo logs needed for the capture process have been deleted.
You can confirm this by checking the System Change Number (SCN) position or querying the capture status.

**Solution**

Start the outbound server:

```sql
BEGIN
  DBMS_XSTREAM_ADM.START_OUTBOUND('XOUT1');
END;
/
```

## UNKNOWN status of LogMiner session

The LogMiner status is `UNKNOWN`, which means that archived logs that LogMiner depended on were deleted. You can confirm this by querying `V$ARCHIVED_LOG` and checking for rows where the `DELETED` column has the value `YES`.

**Solution**

Recreate the XStream outbound server. For more information, see [](#label-recreate-xstream-outbound-server).

## WAITING FOR REDO status of XStream capture

The XStream capture status shows `WAITING FOR REDO: FILE NA, THREAD 1, SEQUENCE 47, SCN 0x0000000000190ac4`. This means LogMiner is waiting for an archived log file that isn't available because it was deleted. You can confirm this by querying `V$ARCHIVED_LOG` and checking for rows where the `DELETED` column has the value `YES`.

**Solution**

Recreate the XStream outbound server. For more information, see [](#label-recreate-xstream-outbound-server).

## XStream capture rules are incorrect

XStream isn't configured to capture changes from the expected schemas or tables.

**Solution**

Verify the capture rules by running the following query:

```sql
SELECT STREAMS_NAME, SCHEMA_NAME, OBJECT_NAME, RULE_TYPE
FROM DBA_XSTREAM_RULES
WHERE STREAMS_NAME = 'XOUT1';
```

You can also query the capture status and error message directly:

```sql
SELECT CLIENT_NAME, STATUS, ERROR_MESSAGE
FROM ALL_CAPTURE;
```

This query returns:

- `CLIENT_NAME`: The name of the XStream client (outbound server).
- `STATUS`: The current status of the capture process (for example, `ENABLED`, `DISABLED`, `ABORTED`).
- `ERROR_MESSAGE`: Any error message associated with the capture process.

## Error ORA-21560: argument last_position is null, invalid, or out of range

The connector attempted to connect to an SCN position for which redo logs are no longer available.

**Solution**

Confirm the issue by running the following query. The SCN for `Last SCN processed by XStream` must be higher than the lowest SCN for which redo logs exist.

```sql
SELECT min(FIRST_CHANGE#) as SCN, 'Lowest SCN for which redo logs still exist' AS DESCRIPTION
FROM V$ARCHIVED_LOG
WHERE DELETED = 'NO'
UNION ALL
SELECT PROCESSED_LOW_SCN, 'Last SCN processed by XStream'
FROM DBA_XSTREAM_OUTBOUND_PROGRESS
WHERE SERVER_NAME = 'XOUT1'
ORDER BY SCN;
```

To recover from this error, recreate the XStream outbound server. For more information, see [](#label-recreate-xstream-outbound-server).

## Error ORA-26701: Streams process XOUT1 does not exist

The XStream outbound server can't be found on the database instance.

**Solution**

Verify the following:

- The database name in `Oracle Source Parameters` » `XStream Out Server URL` points to the database instance with the XStream outbound server, not a different PDB.
- The XStream outbound server has been created on this instance and has the same name.

## Error ORA-01722: invalid number when creating the outbound server

Executing `DBMS_XSTREAM_ADM.CREATE_OUTBOUND` fails with:

```sql
ORA-01722: invalid number
ORA-06512: at "SYS.DBMS_LOGREP_UTIL", line 582
ORA-06512: at "SYS.DBMS_LOGREP_UTIL", line 636
ORA-06512: at "SYS.DBMS_XSTREAM_ADM_UTL", line 440
ORA-06512: at "SYS.DBMS_XSTREAM_UTL_IVK", line 2094
ORA-06512: at "SYS.DBMS_XSTREAM_UTL_IVK", line 2302
ORA-06512: at "SYS.DBMS_XSTREAM_ADM", line 44
ORA-06512: at line 8
```

This error is misleading. The outbound server already exists.

**Solution**

No action is needed. Use the existing outbound server.

## Problems occur with the XStream outbound server

Multiple issues, such as deleted redo logs or corrupted LogMiner state, can be resolved by recreating the XStream outbound server.

**Solution**

1. Drop the existing outbound server:

   ```sql
   BEGIN
     DBMS_XSTREAM_ADM.DROP_OUTBOUND('XOUT1');
   END;
   /
   ```

2. Create the outbound server again.
For more information, see [](#label-create-xstream-outbound-server).

---
title: Troubleshooting the Openflow Connector for Salesforce Bulk API
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/troubleshoot.md
section: Loading & Unloading Data
---

# Troubleshooting the Openflow Connector for Salesforce Bulk API

This feature is not available in the People's Republic of China.

Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.

This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).

- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector)

This topic describes how to troubleshoot the Openflow Connector for Salesforce Bulk API.

## Monitoring

To track the amount of data being synced from Salesforce to Snowflake, query the event table.
The following example query retrieves relevant logs from the last 30 minutes:

```sql
SELECT
  timestamp,
  Deployment_ID,
  Runtime_Key,
  parsed_log:level as log_level,
  parsed_log:loggerName as logger,
  parsed_log:formattedMessage as message,
  parsed_log
FROM (
  SELECT
    timestamp,
    resource_attributes:"openflow.dataplane.id" as Deployment_ID,
    resource_attributes:"k8s.namespace.name" as Runtime_Key,
    TRY_PARSE_JSON(value) as parsed_log
  FROM OPENFLOW.TELEMETRY.EVENTS
  WHERE true
    AND timestamp > dateadd('minutes', -30, sysdate())
    AND record_type = 'LOG'
    AND resource_attributes:"k8s.namespace.name" like 'runtime-%'
  ORDER BY timestamp DESC
)
WHERE true
  AND logger = 'org.apache.nifi.processors.standard.LogMessage'
  AND message LIKE '%SALESFORCE_BULK_API%';
```

## Troubleshooting

Use the following information to troubleshoot issues with the connector.

### Authentication and OAuth errors

The connector uses the OAuth 2.0 JWT Bearer Flow to authenticate with Salesforce. Authentication errors typically occur during initial setup and can be diagnosed using the [Verification feature](#salesforce-verify-connection) on the controller service before starting the connector.

#### `invalid_grant` error

The `invalid_grant` error indicates that Salesforce rejected the OAuth token request. Common causes include:

- **Wrong OAuth flow type.** The external client app in Salesforce does not have the **Enable JWT Bearer Flow** checkbox selected. The connector requires this specific flow. Other OAuth flows (such as Authorization Code Flow) are not supported. See [](#salesforce-create-external-client-app).
- **Mismatched private key and certificate.** The private key configured in the connector (the **Connected App Key** parameter) does not match the public certificate uploaded to the external client app in Salesforce.
- **Wrong Consumer Key.** The **OAuth2 Client ID** parameter does not match the **Consumer Key** of the external client app where the certificate was uploaded.
- **Mixed credentials from multiple apps.** If you have created multiple external client apps or experimented with different configurations, the Client ID, certificate, and private key might belong to different apps. All three must come from the same external client app.
- **Deprecated Connected App.** Salesforce has deprecated Connected Apps in favor of External Client Apps. If you are using a Connected App, Snowflake recommends creating a new external client app instead.
- **Incorrect token endpoint URL.** The **OAuth2 Token Endpoint URL** parameter must point to the correct Salesforce instance. For example: `https://myCompany.my.salesforce.com/services/oauth2/token`.
- **Incorrect audience.** The **OAuth2 Audience** parameter must be set to `https://login.salesforce.com` for production environments or `https://test.salesforce.com` for sandboxes and test environments.

#### Permission errors

If the JWT token is successfully generated but the user lacks permissions, you see a permission or authorization error. This means the JWT Bearer Flow is working, but the Salesforce user (the OAuth2 Subject) is not authorized to use the external client app.

To resolve this issue:

1. In Salesforce, go to the **Policies** tab of the external client app.
2. Verify that **Permitted Users** is set to **Admin approved users are pre-authorized**.
3. Verify that the profiles or permission sets assigned in the **App Policies** section include the user specified in the **OAuth2 Subject** parameter of the connector.

For more details, see [](#salesforce-approve-client-app).

### Check the connector state

You can examine the connector state to ensure that data is being replicated as expected. The connector maintains a state of current and past operations to ensure no Salesforce changes are missed and to retry bulk job queries if failures occur.

To view the state:

1. Right-click on the canvas and select **Controller services**.
2. Locate the controller service named **Salesforce Bulk Jobs State**.
3. In the **Salesforce Bulk Jobs State** menu, click **View state**.

The state is a set of key/value pairs where the key is the Salesforce Object type. For example, the state for the `Account` object might look like the following example:

```json
{"previousLast":"2025-09-30T09:41:23.484406926Z","currentLast":"2025-09-30T09:41:23.484406926Z","status":"COMPLETED"}
```

The `status` can be one of the following:

- `IN_PROGRESS`
- `COMPLETED`
- `FAILED`
- `ABORTED`

If the status is `IN_PROGRESS`, a FlowFile is still being processed for that object type. Do not delete FlowFiles manually. This can cause a job to remain in the `IN_PROGRESS` status indefinitely because the state cannot be manually updated. If this occurs, you must perform a full reload for that object type.

### Force a full load for a given object type

To force the connector to perform a full refresh for one or more object types:

1. Stop all processors in the flow.
2. Ensure that no in-flight FlowFiles are being processed.
3. Right-click on the canvas and select **Disable all controller services**.
4. Go to **Controller services** and open the state of the controller service named **Salesforce Bulk Jobs State**.
5. Perform one of the following actions:
   - Select **Clear state** to clear the entire state. This forces a full load for **all** configured Object types fetched by the connector.
   - Select the trash icon next to a specific Object type to clear the state for a specific object type only. This forces a full load of that specific object type during the next execution of the connector.
6. In the canvas, right-click, select **Enable all controller services**, and then start all processors.

### If an object type remains in status IN_PROGRESS

If the state for a given object type is stuck in `IN_PROGRESS` and there are no in-flight FlowFiles for that object type, a FlowFile may have been manually deleted before it could update the status.
In this case, you must perform a full load for that object type to ensure the connector captures all events.

If the state is stuck in `IN_PROGRESS` but no FlowFiles were manually deleted, contact [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support).

---
title: UDPEventRecordSink
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/udpeventrecordsink.md
section: Loading & Unloading Data
---

# UDPEventRecordSink

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description

Format and send Records as UDP Datagram Packets to a configurable destination

## Tags

UDP, event, record, sink

## Properties

In the list below, required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

---
title: UnpackContent 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/unpackcontent.md
section: Loading & Unloading Data
---

# UnpackContent 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Unpacks the content of FlowFiles that have been packaged with one of several different Packaging Formats, emitting one to many FlowFiles for each input FlowFile. Supported formats are TAR, ZIP, and FlowFile Stream packages.

## Tags

Unpack, archive, flowfile-stream, flowfile-stream-v3, tar, un-merge, zip

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## Use cases

| Unpack Zip containing filenames with special characters, created on Windows with filename charset 'Cp437' or 'IBM437'. |
| --- |

## See also

- [org.apache.nifi.processors.standard.MergeContent](/user-guide/data-integration/openflow/processors/mergecontent)

---
title: UpdateAttribute 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updateattribute.md
section: Loading & Unloading Data
---

# UpdateAttribute 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-update-attribute-nar

## Description

Updates the Attributes for a FlowFile by using the Attribute Expression Language and/or deletes the attributes based on a regular expression

## Tags

Attribute Expression Language, attributes, delete, modification, state, update

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## State management
## Relationships
## Writes attributes
## Use cases

| Add a new FlowFile attribute |
| --- |
| Overwrite a FlowFile attribute with a new value |
| Rename a file |

---
title: UpdateBoxFileMetadataInstance 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updateboxfilemetadatainstance.md
section: Loading & Unloading Data
---

# UpdateBoxFileMetadataInstance 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-box-nar

## Description

Updates metadata template values for a Box file using the record in the given FlowFile. This record represents the desired end state of the template after the update. The processor will calculate the necessary changes (add/replace/remove) to transform the current metadata to the desired state. The input record should be a flat key-value object.

## Tags

box, metadata, storage, templates, update

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## See also

- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)
- [org.apache.nifi.processors.box.ListBoxFileMetadataTemplates](/user-guide/data-integration/openflow/processors/listboxfilemetadatatemplates)

---
title: UpdateBulkJobState 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatebulkjobstate.md
section: Loading & Unloading Data
---

# UpdateBulkJobState 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-salesforce-processors-nar

## Description

Updates the status of a Salesforce Bulk Job in the shared state service for a specific object type

## Tags

bulk, preview, salesforce, state

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships

---
title: UpdateByQueryElasticsearch 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatebyqueryelasticsearch.md
section: Loading & Unloading Data
---

# UpdateByQueryElasticsearch 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-elasticsearch-restapi-nar

## Description

Update documents in an Elasticsearch index using a query. The query can be loaded from a FlowFile body or from the Query parameter. The loaded Query can contain any JSON accepted by Elasticsearch's `_update_by_query` API, for example a "query" object to identify what documents are to be updated, plus a "script" to define the updates to perform.

## Tags

elastic, elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, query, update

## Input Requirement

ALLOWED

## Supports Sensitive Dynamic Properties

false

## Properties
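
The query format mentioned in the Description can be sketched as follows. This is a minimal illustration, not the processor's own code: the helper name `build_update_by_query`, the field `customer_id`, and the `status` value are hypothetical, and the script assumes Elasticsearch's default Painless scripting language. The resulting JSON string is the kind of value that could be supplied as FlowFile content or through the Query parameter.

```python
import json


def build_update_by_query(field: str, match_value: str, new_status: str) -> str:
    """Return a JSON body for Elasticsearch's _update_by_query API.

    All field names and values here are hypothetical examples.
    """
    body = {
        # "query" identifies which documents are to be updated.
        "query": {"term": {field: match_value}},
        # "script" defines the update applied to each matching document.
        "script": {
            "lang": "painless",
            "source": "ctx._source.status = params.new_status",
            "params": {"new_status": new_status},
        },
    }
    return json.dumps(body)


query_json = build_update_by_query("customer_id", "C-1001", "archived")
print(query_json)
```

Passing the new value through `params` rather than concatenating it into the script source keeps the script text constant across invocations.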
## Relationships
## Writes attributes

---
title: UpdateCounter 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatecounter.md
section: Loading & Unloading Data
---

# UpdateCounter 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

This processor allows users to set specific counters and key points in their flow. It is useful for debugging and basic counting functions.

## Tags

counter, debug, instrumentation

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships

---
title: UpdateDatabaseTable 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatedatabasetable.md
section: Loading & Unloading Data
---

# UpdateDatabaseTable 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

This processor uses a JDBC connection and incoming records to generate any database table changes needed to support the incoming records. It expects a 'flat' record layout, meaning none of the top-level record fields has nested fields that are intended to become columns themselves.

## Tags

alter, database, jdbc, metadata, table, update

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes

---
title: UpdateRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updaterecord.md
section: Loading & Unloading Data
---

# UpdateRecord 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

org.apache.nifi | nifi-standard-nar

## Description

Updates the contents of a FlowFile that contains Record-oriented data (i.e., data that can be read via a RecordReader and written by a RecordWriter). This Processor requires that at least one user-defined Property be added. The name of the Property should indicate a RecordPath that determines the field that should be updated. The value of the Property is either a replacement value (optionally making use of the Expression Language) or is itself a RecordPath that extracts a value from the Record. Whether the Property value is determined to be a RecordPath or a literal value depends on the configuration of the <Replacement Value Strategy> Property.

## Tags

avro, csv, freeform, generic, json, log, logs, record, schema, text, update

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
## Relationships
## Writes attributes
## Use cases

| Combine multiple fields into a single field. |
| --- |
| Change the value of a record field to an explicit value. |
| Copy the value of one record field to another record field. |
| Enrich data by injecting the value of an attribute into each Record. |
| Change the format of a record field's value. |

## See also

- [org.apache.nifi.processors.standard.ConvertRecord](/user-guide/data-integration/openflow/processors/convertrecord)

---
title: UpdateSnowflakeDatabase 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatesnowflakedatabase.md
section: Loading & Unloading Data
---

# UpdateSnowflakeDatabase 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-snowflake-processors-nar

## Description

Updates the definition of a Snowflake table based on the schema provided in the incoming FlowFile. The schema is expected to be in JSON with the following format, regardless of whether it is provided via FlowFile content or specified as a property:

\{ "columns": [ \{ "name": "<column name>", "type": "<column type>", "nullable": <true/false>, "precision": <precision, only for numeric type>, "scale": <scale, only for numeric type> \}, ... ], "primaryKeys": ["<name of first primary key column>", "<name of second primary key column>", ...] \}

## Tags

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
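
The schema format in the Description can be assembled programmatically. The following is a minimal sketch under stated assumptions: the helpers `column` and `build_schema` are hypothetical, the type names `NUMBER` and `VARCHAR` are placeholder values for `<column type>` (the accepted type names are not listed here), and the check that every primary key names a declared column is an added sanity check, not documented processor behavior.

```python
import json


def column(name, col_type, nullable=True, precision=None, scale=None):
    """Build one entry of the "columns" array (hypothetical helper)."""
    entry = {"name": name, "type": col_type, "nullable": nullable}
    # "precision" and "scale" apply to numeric types only, so they are
    # omitted unless explicitly provided.
    if precision is not None:
        entry["precision"] = precision
    if scale is not None:
        entry["scale"] = scale
    return entry


def build_schema(columns, primary_keys):
    """Serialize the schema document in the format shown in the Description."""
    declared = {c["name"] for c in columns}
    missing = [key for key in primary_keys if key not in declared]
    if missing:
        # Assumed sanity check: a primary key should refer to a declared column.
        raise ValueError(f"primary key columns not in schema: {missing}")
    return json.dumps({"columns": columns, "primaryKeys": primary_keys})


schema_json = build_schema(
    [
        column("ID", "NUMBER", nullable=False, precision=38, scale=0),
        column("NAME", "VARCHAR"),
    ],
    ["ID"],
)
print(schema_json)
```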
## Relationships
## Writes attributes

---
title: UpdateSnowflakeIcebergDatabase 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatesnowflakeicebergdatabase.md
section: Loading & Unloading Data
---

# UpdateSnowflakeIcebergDatabase 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle

com.snowflake.openflow.runtime | runtime-snowflake-processors-nar

## Description

Updates the definition of a Snowflake Iceberg table. A target schema can be inferred from a RecordReader or defined explicitly using the format below:

\{ "columns": [ \{ "name": "<column name>", "type": "<iceberg data type>" \}, ... ] \}

where <iceberg data type> can be one of:

- primitive iceberg type ("string", "int", "boolean",...)
- decimal with given precision and scale ("decimal(P,S)")
- \{"type": "list", "element": <iceberg data type>\}
- \{"type": "map", "key": <iceberg data type>, "value": <iceberg data type>\}
- \{"type": "struct", "fields":[<list of struct fields>] \}

## Tags

iceberg

## Input Requirement

REQUIRED

## Supports Sensitive Dynamic Properties

false

## Properties
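
Because the type grammar in the Description is recursive, nested types are easiest to compose with small helpers. The sketch below is illustrative only: the helper names (`decimal`, `list_of`, `map_of`, `build_iceberg_schema`) and the column names are hypothetical. The `struct` form is left out because the layout of its field entries is not spelled out here.

```python
import json


def decimal(precision: int, scale: int) -> str:
    """Return the "decimal(P,S)" type string."""
    return f"decimal({precision},{scale})"


def list_of(element):
    """Return a list type whose elements have the given Iceberg type."""
    return {"type": "list", "element": element}


def map_of(key, value):
    """Return a map type with the given key and value Iceberg types."""
    return {"type": "map", "key": key, "value": value}


def build_iceberg_schema(columns: dict) -> str:
    """Serialize {column name: type} pairs into the explicit schema format."""
    return json.dumps(
        {"columns": [{"name": name, "type": col_type} for name, col_type in columns.items()]}
    )


schema_json = build_iceberg_schema(
    {
        "id": "int",
        "price": decimal(10, 2),
        "tags": list_of("string"),
        # Types nest: a map from string to a list of booleans.
        "attributes": map_of("string", list_of("boolean")),
    }
)
print(schema_json)
```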
## Relationships
## Writes attributes

---
title: UpdateSnowflakeSchema 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatesnowflakeschema.md
section: Loading & Unloading Data
---

# UpdateSnowflakeSchema 2025.10.9.21

This feature is not available in the People's Republic of China.

Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowflake-processors-nar ## Description Creates a Snowflake database schema if it does not exist. ## Tags create, ddl, preview, schema, snowflake ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: UpdateSnowflakeStream 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatesnowflakestream.md section: Loading & Unloading Data --- # UpdateSnowflakeStream 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowflake-processors-nar ## Description Manages Snowflake streams by creating, dropping, or replacing them based on the configured operation. Streams in Snowflake capture data changes for tables and can be used to track DML changes over time. ## Tags cdc, create, drop, preview, replace, snowflake, stream, table ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
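To make the three operations named in the Description above concrete, the following Python sketch maps each one to a standard Snowflake stream statement. This is an assumption about what each operation corresponds to in SQL, not the processor's actual implementation. The function name, operation labels, and object names are hypothetical.

```python
def stream_statement(operation, stream, table=None):
    """Return a Snowflake SQL statement for a stream operation.

    The operation labels ("create", "replace", "drop") are illustrative
    and may not match the processor's property values.
    """
    if operation == "create":
        return f"CREATE STREAM IF NOT EXISTS {stream} ON TABLE {table};"
    if operation == "replace":
        return f"CREATE OR REPLACE STREAM {stream} ON TABLE {table};"
    if operation == "drop":
        return f"DROP STREAM IF EXISTS {stream};"
    raise ValueError(f"unsupported operation: {operation}")


# Hypothetical stream and table names.
for op in ("create", "replace", "drop"):
    print(stream_statement(op, "orders_stream", "orders"))
```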
--- title: UpdateSnowflakeTable 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatesnowflaketable.md section: Loading & Unloading Data --- # UpdateSnowflakeTable 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowflake-processors-nar ## Description Updates the definition of a Snowflake table based on the schema provided in the incoming FlowFile. The schema is expected to be in JSON with the following format, regardless of whether it is provided via FlowFile content or specified as a property: \{ "columns": [ \{ "name": "<column name>", "type": "<column type>", "nullable": <true/false>, "precision": <only for numeric type>, "scale": <only for numeric type> \}, ... ], "primaryKeys": ["<name of first primary key column>", "<name of second primary key column>", ...] \} This processor supports table-only operations: creating, altering, and dropping tables. ## Tags alter, columns, create, ddl, drop, preview, snowflake, table, update ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
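The schema format in the Description above can be sketched in Python as follows. The helper names, column names, and the type strings `NUMBER` and `VARCHAR` are assumptions for illustration. The primary key check is a convenience of this sketch, not documented processor behavior.

```python
import json


def make_column(name, col_type, nullable=True, precision=None, scale=None):
    """Build one column entry; precision and scale apply to numeric types only."""
    col = {"name": name, "type": col_type, "nullable": nullable}
    if precision is not None:
        col["precision"] = precision
    if scale is not None:
        col["scale"] = scale
    return col


def make_schema(columns, primary_keys=()):
    """Build the schema document and check that every key names a defined column."""
    names = {c["name"] for c in columns}
    missing = [k for k in primary_keys if k not in names]
    if missing:
        raise ValueError(f"primary key columns not defined: {missing}")
    return {"columns": list(columns), "primaryKeys": list(primary_keys)}


# Hypothetical table definition.
schema = make_schema(
    [
        make_column("ORDER_ID", "NUMBER", nullable=False, precision=38, scale=0),
        make_column("CUSTOMER_NAME", "VARCHAR"),
    ],
    primary_keys=["ORDER_ID"],
)
print(json.dumps(schema, indent=2))
```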
--- title: UpdateSnowflakeView 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatesnowflakeview.md section: Loading & Unloading Data --- # UpdateSnowflakeView 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-snowflake-processors-nar ## Description Creates or replaces Snowflake views based on column mappings provided in the incoming FlowFile. The processor checks if the view exists and only recreates it if the definition has changed. The FlowFile content should contain JSON with column mappings, optional join configuration, and optional flatten configuration: \{ "columns": [ \{ "source_field": "customer_data:id", "destination_column": "customer_id", "type": "VARCHAR" \}, \{ "source_field": "f.value:order_amount", "destination_column": "order_amount", "type": "NUMBER" \}, \{ "expression": "SUM(f.value:order_amount::NUMBER)", "destination_column": "total_amount" \}, \{ "expression": "COUNT(*)", "destination_column": "order_count" \} ], "from": \{ "table": "raw_data", "alias": "rd", "joins": [ \{ "type": "INNER", "table": "customers", "alias": "c", "on": "customer_data:id::VARCHAR = c.customer_id" \} ] \}, "flatten": [ \{ "input": "rd.orders", "alias": "f", "path": null \} ], "where": "active = true AND status ='VALID'", "group_by": ["customer_id", "region"], "order_by": ["order_amount DESC", "customer_id ASC"] \} Column configuration supports: - source_field: Simple field/column reference (supports JSON notation like "data:field" or table aliases like "t.column") - expression: Complex SQL expression (e.g., "SUM(amount)", "COUNT(*)") - destination_column: The output column name in the view (optional - auto-generated if not provided) - type: Snowflake data type for automatic type casting (VARCHAR, NUMBER, BOOLEAN, DATE, TIMESTAMP, etc.) Use either source_field OR expression, not both. When type is specified, automatic type casting is applied. When type is omitted, the expression is used as-is without casting. 
Flatten configuration supports: - input: The nested field/column to flatten (required) - alias: Alias for the flattened data (required) - path: Optional path within the nested structure The "from" section is required and specifies the source table and optional joins. Optional SQL clauses can be included: - where: WHERE clause condition (e.g., "active = true AND status ='VALID'") - group_by: GROUP BY clause as an array of column names (e.g., ["customer_id", "region"]) - order_by: ORDER BY clause as an array of column/expression with direction (e.g., ["order_amount DESC", "customer_id ASC"]) ## Tags flatten, view ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
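The configuration rules in the Description above (a required "from" section, exactly one of `source_field` or `expression` per column, and required `input` and `alias` for each flatten entry) can be checked before a FlowFile is sent. The following Python sketch is a hypothetical pre-flight check written for illustration; the function name and error messages are not part of the processor.

```python
def validate_view_config(config):
    """Return a list of rule violations for a view configuration dict."""
    errors = []
    if "from" not in config:
        errors.append('"from" section is required')
    for i, col in enumerate(config.get("columns", [])):
        has_source = "source_field" in col
        has_expression = "expression" in col
        # Exactly one of the two keys must be present.
        if has_source == has_expression:
            errors.append(f"columns[{i}]: use exactly one of source_field or expression")
    for i, flat in enumerate(config.get("flatten", [])):
        for key in ("input", "alias"):
            if not flat.get(key):
                errors.append(f"flatten[{i}]: {key!r} is required")
    return errors


good = {
    "columns": [
        {"source_field": "customer_data:id", "destination_column": "customer_id", "type": "VARCHAR"},
        {"expression": "COUNT(*)", "destination_column": "order_count"},
    ],
    "from": {"table": "raw_data", "alias": "rd"},
    "flatten": [{"input": "rd.orders", "alias": "f", "path": None}],
}
bad = {
    "columns": [{"source_field": "a", "expression": "b"}],
    "flatten": [{"input": "rd.orders"}],
}
print(validate_view_config(good))
print(validate_view_config(bad))
```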
--- title: UpdateTableState 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/updatetablestate.md section: Loading & Unloading Data --- # UpdateTableState 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar ## Description Updates the state of a table in the Table State Service. ## Tags snowflake, state, table ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: UpsertMilvus 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/upsertmilvus.md section: Loading & Unloading Data --- # UpsertMilvus 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-milvus-processors-nar ## Description Upserts vectors into a Milvus database for a given collection. ## Tags chatbot, embeddings, gen ai, genai, generative ai, insert, llm, metadata, milvus, openflow, publish, text, upsert, vector ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## See also - [com.snowflake.openflow.runtime.processors.milvus.DeleteMilvus](/user-guide/data-integration/openflow/processors/deletemilvus) --- title: UpsertPinecone 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/upsertpinecone.md section: Loading & Unloading Data --- # UpsertPinecone 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-pinecone-nar ## Description Publishes vectors, including metadata, and optionally text, to a Pinecone index. ## Tags chatbot, embeddings, gen ai, genai, generative ai, llm, metadata, openflow, pinecone, publish, text, upsert, vector ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Use Cases Involving Other Components | Create embeddings for raw text data, or text that exists in a Record field such as JSON, using OpenAI's embeddings model and publish the vectors to Pinecone. | | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Add embeddings for a document to a Pinecone index, replacing any embeddings that already exist for the document. | ## See also - [com.snowflake.openflow.runtime.processors.openai.CreateOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createopenaiembeddings) - [com.snowflake.openflow.runtime.processors.pinecone.DeletePinecone](/user-guide/data-integration/openflow/processors/deletepinecone) --- title: UpsertSFDCObjects 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/upsertsfdcobjects.md section: Loading & Unloading Data --- # UpsertSFDCObjects 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-salesforce-processors-nar ## Description Upserts the records from the incoming FlowFile into Salesforce. ## Tags insert, objects, preview, salesforce, sfdc, update, upsert ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [com.snowflake.openflow.runtime.processors.salesforce.DeleteQueryJob](/user-guide/data-integration/openflow/processors/deletequeryjob) - [com.snowflake.openflow.runtime.processors.salesforce.DescribeSFDCObject](/user-guide/data-integration/openflow/processors/describesfdcobject) - [com.snowflake.openflow.runtime.processors.salesforce.GetQueryJobResult](/user-guide/data-integration/openflow/processors/getqueryjobresult) - [com.snowflake.openflow.runtime.processors.salesforce.SubmitQueryJob](/user-guide/data-integration/openflow/processors/submitqueryjob) --- title: Use the Openflow Connector for Google BigQuery source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-big-query/use.md section: Loading & Unloading Data --- # Use the Openflow Connector for Google BigQuery This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/google-big-query/about) - [](/user-guide/data-integration/openflow/connectors/google-big-query/setup) This topic describes tasks you may need to perform after installing and configuring the connector. ## Remove and re-add a table for replication To remove a table from replication: 1. Verify the table's state in the Table State Store. 2. If the state is `INCREMENTAL_IN_PROGRESS`, stop the **Trigger BigQuery Cdc On Incremental** processor. Wait for the state to change to `INCREMENTAL_REPLICATION`. 3. Remove the table from the **Included Table Names** or **Included Table Names Regex** parameters in the BigQuery Ingestion Parameters context. To re-add a table for replication: 1. Drop the destination table in Snowflake. 2. Add the table back to the **Included Table Names** or **Included Table Names Regex** parameters. You can also use this approach to recover from a failed table replication. --- title: Use the Openflow Connector for Veeva Vault source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/veeva-vault/use.md section: Loading & Unloading Data --- # Use the Openflow Connector for Veeva Vault This feature is not available in the People's Republic of China. Snowflake connectors are supported in every region where Snowflake Openflow is available. [Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)). [Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions. This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/). - [](/user-guide/data-integration/openflow/connectors/veeva-vault/about) - [](/user-guide/data-integration/openflow/connectors/veeva-vault/setup) This topic describes tasks you may need to perform after installing and configuring the Openflow Connector for Veeva Vault. ## Force a full reload The connector maintains an internal state to track which Direct Data files have already been processed. In some situations, you may need to force the connector to perform a fresh full snapshot, for example, after resolving a data issue or after the connector has been stopped for an extended period. To force a full reload: 1. Stop all processors in the flow by right-clicking on the connector process group and selecting **Stop**. 2.
Ensure that no in-flight FlowFiles are being processed. You can verify this by checking that all queues in the flow are empty. 3. Right-click on the canvas and select **Disable all controller services**. 4. Go to **Controller services** and locate the controller service named **Veeva Vault Client Service**. 5. In the processor **List Veeva Vault Files**, select **View state** and then select **Clear state**. 6. Right-click on the canvas and select **Enable all controller services**, then start all processors to resume the connector. The connector treats a cleared state as a fresh start and processes the latest full Direct Data file on the next execution. ## Monitoring To track the amount of data being synced from Veeva Vault to Snowflake, query the event table. The following example query retrieves relevant logs from the last 30 minutes:

```sql
SELECT
    timestamp,
    Deployment_ID,
    Runtime_Key,
    parsed_log:level as log_level,
    parsed_log:loggerName as logger,
    parsed_log:formattedMessage as message,
    parsed_log
FROM (
    SELECT
        timestamp,
        resource_attributes:"openflow.dataplane.id" as Deployment_ID,
        resource_attributes:"k8s.namespace.name" as Runtime_Key,
        TRY_PARSE_JSON(value) as parsed_log
    FROM OPENFLOW.TELEMETRY.EVENTS
    WHERE true
        AND timestamp > dateadd('minutes', -30, sysdate())
        AND record_type = 'LOG'
        AND resource_attributes:"k8s.namespace.name" like 'runtime-%'
    ORDER BY timestamp DESC
)
WHERE true
    AND logger LIKE '%veeva%';
```

For more information about monitoring Openflow flows, see [](/user-guide/data-integration/openflow/monitor). ## Troubleshooting Use the following information to troubleshoot common issues with the connector. ### Direct Data is not enabled If the connector fails with an error indicating that Direct Data files can't be retrieved, verify that Direct Data is enabled on your Vault. Contact your Veeva Vault administrator to enable this feature in the Vault admin settings.
### Authentication failures If the connector reports authentication failures: - Verify that the **Veeva Vault Username** and **Veeva Vault Password** parameters are correct. - Verify that the service account has API access enabled and has the required permissions. - Check whether the account is locked or disabled in Veeva Vault. You can verify the credentials by right-clicking on the canvas, selecting **Controller services**, locating the **Veeva Vault Client Service**, and selecting **Verify Properties**. This triggers a connectivity and authentication check against the configured Vault. ### Session expiration during long operations The connector automatically handles session expiration by re-authenticating when it receives an `INVALID_SESSION_ID` error from the Vault API. No manual intervention is required. If you see repeated session expiration messages in the event table, verify that no other process is using the same service account credentials, as concurrent sessions may cause the previous session to be invalidated. ### No new data appearing in Snowflake If the connector is running but no new data appears in Snowflake: - Check that new Direct Data files are being published by Veeva Vault. The connector can only process archives that exist. - Verify that the **Veeva Vault Ingestion Mode** is set correctly. If set to `INCREMENTAL` and no incremental archives exist after the configured start time, no data is processed. - Check the event table for error messages that may indicate failures in the download, unpacking, or loading stages. ### Snowflake permission errors If the connector reports permission errors when creating tables or loading data: - Verify that the Snowflake role configured in the connector has `USAGE` on the destination database and schema, `CREATE TABLE` on the schema, and `USAGE` and `OPERATE` on the warehouse. - For Openflow BYOC deployments, verify that the service user has the correct role assigned and that the RSA public key is configured correctly.
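The privileges listed under "Snowflake permission errors" correspond to standard Snowflake `GRANT` statements. As a minimal sketch, the following Python function generates them for review before you run them. The function name and the role, database, schema, and warehouse names are hypothetical placeholders, not values defined by the connector.

```python
def grant_statements(role, database, schema, warehouse):
    """Return GRANT statements for the privileges the connector role needs."""
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
        f"GRANT CREATE TABLE ON SCHEMA {database}.{schema} TO ROLE {role};",
        f"GRANT USAGE, OPERATE ON WAREHOUSE {warehouse} TO ROLE {role};",
    ]


# Hypothetical object names; substitute the ones configured in your connector.
for statement in grant_statements("VEEVA_ROLE", "VEEVA_DB", "RAW", "VEEVA_WH"):
    print(statement)
```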
--- title: Validate your BYOC deployment source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/byoc-validate-vpc-config.md section: Loading & Unloading Data --- # Validate your BYOC deployment This feature is not available in the People's Republic of China. Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions). - [](/user-guide/data-integration/openflow/setup-openflow-byoc) This topic describes how to use BYOC Pre-flight Validation to verify your AWS network configuration for Openflow deployments. ## About BYOC Pre-flight Validation BYOC Pre-flight Validation is a script that verifies your AWS network environment is ready for an Openflow deployment. It checks that required networking, connectivity, and access settings are in place. Use this tool to identify and resolve network or access misconfigurations before deployment. This helps prevent failures and ensures a smoother rollout by providing specific feedback and actionable guidance for any issues found. There are two versions of this script:
`byoc-validator.sh`:
Verifies that your AWS environment is ready for a new Openflow deployment.
`byo-vpc-validator.sh`:
Verifies that your existing VPC is configured correctly for Openflow.
## What does BYOC Pre-flight Validation review?

BYOC Pre-flight Validation performs a pre-deployment review that verifies your existing AWS setup, identifies issues, and explains what needs to be corrected. BYOC Pre-flight Validation checks the following:

- Prerequisites (applies only to existing VPCs)
  - VPC components such as subnets, gateways, and routing
  - Network sizing and placement across availability zones
  - Required resource tags
- Network Connectivity
  - Access to Openflow services and endpoints
  - Image registry access for required containers
  - Connectivity to core AWS services
- Permissions
  - Security group rules
  - Required IAM permissions
  - Encryption key access when needed

## When to use BYOC Pre-flight Validation

Use BYOC Pre-flight Validation:

- Before your initial Openflow deployment
- After AWS networking changes that might impact connectivity
- During troubleshooting to confirm your setup
- When migrating Openflow to a new VPC or AWS account

## Download the CloudFormation template for BYOC Pre-flight Validation

Follow these steps to set up BYOC Pre-flight Validation in your AWS environment:

1. Create a new BYOC deployment in the Openflow Control Plane.
2. Download the CloudFormation template for BYOC Pre-flight Validation: click **Download Validator** in the confirmation dialog that appears after you create the deployment.
3. Apply the BYOC Pre-flight Validation CloudFormation template in AWS.
4. Access the EC2 instance where BYOC Pre-flight Validation is installed.

## Configure the CloudFormation template for BYOC Pre-flight Validation

The CloudFormation template for the BYOC validator includes defaults for all parameters, and those defaults should not be changed. The CloudFormation template for the BYO-VPC validator includes defaults for most parameters, and those defaults should not be changed.
However, the following parameters do not have defaults and must be provided, using the inputs you plan to use for the actual deployment:
`InfraVPC`
Select an existing VPC.
`PrivateSubnet1`
The first private subnet for Openflow runtimes.
`PrivateSubnet2`
The second private subnet for the EKS control plane.
`PrivateSecurityGroup`
Security group for the agent instance, EC2 Instance Connect endpoint, and EKS cluster.
`EBSKMSKeyArn`
Optional KMS key ARN for encrypted EBS volumes.
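If you apply the template with the AWS CLI rather than the console, the parameters above can be supplied as a JSON parameters file, which is a list of `ParameterKey` and `ParameterValue` pairs. The following Python sketch builds such a list and rejects missing values. The resource IDs are made-up placeholders, and treating `EBSKMSKeyArn` as the only optional entry is an assumption based on its description.

```python
import json

REQUIRED = ("InfraVPC", "PrivateSubnet1", "PrivateSubnet2", "PrivateSecurityGroup")
OPTIONAL = ("EBSKMSKeyArn",)


def to_cfn_parameters(values):
    """Convert a dict of template inputs into CloudFormation CLI parameter entries."""
    missing = [key for key in REQUIRED if not values.get(key)]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return [
        {"ParameterKey": key, "ParameterValue": values[key]}
        for key in REQUIRED + OPTIONAL
        if values.get(key)  # skip the optional key when it is not set
    ]


# Placeholder IDs for illustration only.
parameters = to_cfn_parameters(
    {
        "InfraVPC": "vpc-0123456789abcdef0",
        "PrivateSubnet1": "subnet-0aaaaaaaaaaaaaaa1",
        "PrivateSubnet2": "subnet-0bbbbbbbbbbbbbbb2",
        "PrivateSecurityGroup": "sg-0ccccccccccccccc3",
    }
)
print(json.dumps(parameters, indent=2))
```

Use the same values you plan to use for the actual deployment, so the validation reflects the real target environment.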
## Run BYOC Pre-flight Validation and view results

Follow these steps to run BYOC Pre-flight Validation:

1. Connect to the EC2 instance where BYOC Pre-flight Validation is installed.
2. Run the BYOC Pre-flight Validation script from the home directory:

   ```bash
   /home/ec2-user/byoc-validator.sh
   ```

   You can run BYOC Pre-flight Validation as many times as needed.
3. Review the output file in the `home` directory: Each run produces a new, timestamped results file, for example: `/home/ec2-user/byoc-validation-results-YYYYMMDDHHMMSS.txt`
4. Open and inspect the results: Use a tool of your preference to read the output and review pass/fail messages.

Follow these steps to run BYOC Pre-flight Validation for an existing VPC:

1. Connect to the EC2 instance where BYOC Pre-flight Validation is installed.
2. Run the BYOC Pre-flight Validation script in the home directory:

   ```bash
   /home/ec2-user/byo-vpc-validator.sh
   ```

   You can run BYOC Pre-flight Validation as many times as needed.
3. Review the output file in the `home` directory: Each run produces a new, timestamped results file, for example: `/home/ec2-user/byo-vpc-validation-results-YYYYMMDDHHMMSS.txt`
4. Open and inspect the results: Use a tool of your preference to read the output and review pass/fail messages.

## Example output

The following example shows a successful validation output:

```text
2026-01-15 11:43:37,599 - INFO - Starting BYO-VPC validation suite...
2026-01-15 11:43:37,599 - INFO - ============================================================
...
2026-01-15 11:43:37,599 - INFO - Starting Prerequisites validation...
2026-01-15 11:43:37,704 - INFO - Running validation rule: internet_gateway
2026-01-15 11:43:38,538 - INFO - ✅ internet_gateway: Internet Gateway validation passed
...
2026-01-15 11:43:39,769 - INFO - Prerequisites Summary: 4/4 rules passed
2026-01-15 11:43:39,769 - INFO - --------------------------------------------------
2026-01-15 11:43:39,769 - INFO - Starting Network validation...
2026-01-15 11:43:39,780 - INFO - Running validation rule: snowflake_authentication
2026-01-15 11:43:41,130 - INFO - ✅ snowflake_authentication: Snowflake OAuth authentication successful
...
2026-01-15 11:43:55,920 - INFO - Network Summary: 7/7 rules passed
2026-01-15 11:43:55,920 - INFO - --------------------------------------------------
2026-01-15 11:43:55,920 - INFO - Starting Permissions validation...
2026-01-15 11:43:55,946 - INFO - Running validation rule: private_security_group
2026-01-15 11:43:56,766 - INFO - ✅ private_security_group: Private security group validation passed
...
2026-01-15 11:43:57,560 - INFO - Permissions Summary: 2/2 rules passed
2026-01-15 11:43:57,560 - INFO - ============================================================
2026-01-15 11:43:57,560 - INFO - 🎉 Openflow compatibility checker completed successfully!
```

The output highlights each check with a status icon:

- ✅ - The requirement is met.
- ❌ - The requirement is not met, and action is needed.

## AWS permissions required

The CloudFormation template creates an IAM role with the necessary permissions for the EC2 instance where BYOC Pre-flight Validation is installed. If your organization uses custom IAM controls, ensure the instance role includes the following permissions:

- Required to access the Snowflake OAuth secret created by the template:
  - `secretsmanager:GetSecretValue`
- Required to inspect network resources:
  - `ec2:DescribeInternetGateways`
  - `ec2:DescribeSubnets`
  - `ec2:DescribeRouteTables`
  - `ec2:DescribeNATGateways`
  - `ec2:DescribeSecurityGroups`
- Required only when validating an optional EBS KMS key:
  - `kms:DescribeKey`
  - `kms:GetKeyPolicy`

The Secrets Manager permission is scoped to the BYOC Pre-flight Validation secret created by the template. The EC2 and KMS actions can be scoped to `*` (read-only metadata).

## Cleanup

After validation is complete, you can delete BYOC Pre-flight Validation to avoid ongoing AWS costs.
To delete BYOC Pre-flight Validation, delete the CloudFormation stack used to create it. This automatically removes the EC2 instance, the IAM role, and the Secrets Manager secret. --- title: ValidateCsv 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/validatecsv.md section: Loading & Unloading Data --- # ValidateCsv 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Validates the contents of FlowFiles or a FlowFile attribute value against a user-specified CSV schema. Take a look at the additional documentation of this processor for some schema examples. ## Tags csv, schema, validation ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: ValidateJson 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/validatejson.md section: Loading & Unloading Data --- # ValidateJson 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Validates the contents of FlowFiles against a configurable JSON Schema. See json-schema.org for specification standards. This Processor does not support input containing multiple JSON objects, such as newline-delimited JSON. If the input FlowFile contains newline-delimited JSON, only the first line will be validated. ## Tags JSON, schema, validation ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Restrictions
## Relationships
## Writes attributes
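Because the Description above notes that only the first line of newline-delimited JSON is validated, one possible workaround is to split such input into individual documents before validation. The following Python sketch shows the splitting step on its own, outside of any flow. The function name is hypothetical, and how the resulting documents are routed to the processor is left open.

```python
import json


def split_ndjson(text):
    """Parse newline-delimited JSON into a list of documents, skipping blank lines."""
    documents = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            documents.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc.msg}") from exc
    return documents


# Each resulting document can then be validated on its own.
documents = split_ndjson('{"id": 1}\n\n{"id": 2}\n')
print(documents)
```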
--- title: ValidateRecord 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/validaterecord.md section: Loading & Unloading Data --- # ValidateRecord 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Validates the Records of an incoming FlowFile against a given schema. All records that adhere to the schema are routed to the "valid" relationship while records that do not adhere to the schema are routed to the "invalid" relationship. It is therefore possible for a single incoming FlowFile to be split into two individual FlowFiles if some records are valid according to the schema and others are not. Any FlowFile that is routed to the "invalid" relationship will emit a ROUTE Provenance Event with the Details field populated to explain why records were invalid. In addition, to gain further explanation of why records were invalid, DEBUG-level logging can be enabled for the "org.apache.nifi.processors.standard.ValidateRecord" logger. ## Tags record, schema, validate ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
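The routing described in the Description above, where one input is split into a "valid" group and an "invalid" group with an explanation for each rejected record, can be sketched in plain Python. This is a conceptual illustration only: the schema here is a hypothetical mapping of field names to Python types, not a NiFi record schema, and the function is not the processor's implementation.

```python
def partition_records(records, schema):
    """Split records into (valid, invalid); invalid entries carry their reasons."""
    valid, invalid = [], []
    for record in records:
        problems = [
            f"{field}: expected {expected.__name__}"
            for field, expected in schema.items()
            if not isinstance(record.get(field), expected)
        ]
        if problems:
            invalid.append((record, problems))
        else:
            valid.append(record)
    return valid, invalid


# Hypothetical schema and records.
schema = {"id": int, "name": str}
records = [
    {"id": 1, "name": "a"},
    {"id": "2", "name": "b"},  # wrong type for id
    {"name": "c"},             # id is missing
]
valid, invalid = partition_records(records, schema)
print(valid)
print(invalid)
```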
--- title: ValidateXml 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/validatexml.md section: Loading & Unloading Data --- # ValidateXml 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Validates XML contained in a FlowFile. By default, the XML is contained in the FlowFile content. If the 'XML Source Attribute' property is set, the XML to be validated is contained in the specified attribute. It is not recommended to use attributes to hold large XML documents; doing so could adversely affect system performance. Full schema validation is performed if the processor is configured with the XSD schema details. Otherwise, the only validation performed is to ensure the XML syntax is correct and well-formed, e.g. all opening tags are properly closed. ## Tags schema, validation, xml, xsd ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Restrictions
## Relationships
## Writes attributes
--- title: VerifyContentMAC 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/verifycontentmac.md section: Loading & Unloading Data --- # VerifyContentMAC 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-cipher-nar ## Description Calculates a Message Authentication Code using the provided Secret Key and compares it with the provided MAC property ## Tags Authentication, HMAC, MAC, Signing ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
--- title: VerifyContentPGP 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/verifycontentpgp.md section: Loading & Unloading Data --- # VerifyContentPGP 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-pgp-nar ## Description Verify signatures using OpenPGP Public Keys ## Tags Encryption, GPG, OpenPGP, PGP, RFC 4880, Signing ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also

- [org.apache.nifi.processors.pgp.DecryptContentPGP](/user-guide/data-integration/openflow/processors/decryptcontentpgp)
- [org.apache.nifi.processors.pgp.EncryptContentPGP](/user-guide/data-integration/openflow/processors/encryptcontentpgp)
- [org.apache.nifi.processors.pgp.SignContentPGP](/user-guide/data-integration/openflow/processors/signcontentpgp)

--- title: Version control for custom flows source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/version-control-custom-flows.md section: Loading & Unloading Data ---

# Version control for custom flows

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).

- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)

Openflow supports Registry Clients, including the GitHub Registry Client, which allows you to use a Git repository to store and version your custom flow definitions. This enables standard software development lifecycle (SDLC) practices, such as branching, pull requests, code review, and environment promotion.

A common workflow is:

- Maintain a `main` branch representing your production flow definitions.
- Create feature branches for new development.
- Develop and commit changes on the Openflow canvas.
- Open pull requests, review with Flow Diff, and merge.

## Prerequisites

- A GitHub repository for storing flow definitions.
- A GitHub Personal Access Token with `repository` access.
- An Openflow Runtime with access to the Openflow canvas.
- Appropriate Snowflake role privileges on the Runtime Integration object.

## Step 1: Create a GitHub Registry Client

1. Create a repository in GitHub to store your flow definitions.
2. Generate a Personal Access Token (PAT) in GitHub with repository access permissions.
3. On the Openflow canvas, navigate to **Controller Settings** and create a new Registry Client.
4. Select **GitHub Registry Client** as the type.
5. Configure the Registry Client with:
   - Your GitHub repository URL.
   - The GitHub repository owner.
   - Your Personal Access Token for authentication.

## Step 2: Create and version a new flow

1. On the Openflow canvas, create a new Process Group for your flow.
2. Build your flow: add processors, configure connections, and set up your data pipeline.
3. Right-click the Process Group and select **Start Version Control**.
4. Choose the GitHub Registry Client you configured in [Step 1](#label-openflow-git-create-registry-client).
5. Provide a flow name and an initial commit message.

After you save, the flow definition is committed to your GitHub repository. You can verify by checking the repository in GitHub.

## Step 3: Use branches to manage changes

### Create a development branch

In your GitHub repository, create a new branch (for example, `dev` or a feature branch like `feature/add-new-table`).

### Import and develop on the branch

1. On the Openflow canvas, import the flow from the GitHub Registry into a new Process Group by dragging the **Import from Registry** icon from the toolbar to the canvas.
2. When importing, select the target branch (for example, `dev`) to work against.
3. Make your changes to the flow inside the Process Group.
4. Commit your changes in Openflow. This pushes the updated flow definition to the selected branch in GitHub.

### Review and merge via pull request

1. In GitHub, open a pull request from your development branch to `main`.
2. Review the changes. Use the Snowflake Flow Diff GitHub Action (see [Step 4](#label-openflow-git-flow-diff)) for human-readable diffs.
3. Merge the pull request after it's approved.
4. Back on the Openflow canvas, update the `main` Process Group to pull the latest version from the `main` branch.
## Step 4: Set up Snowflake Flow Diff (GitHub Action)

Snowflake Flow Diff is a GitHub Action that makes flow changes human-readable by rendering a visual diff of your pipeline changes directly in pull request conversations.

### Set up the workflow file

1. In your GitHub repository, create the file `.github/workflows/flowdiff.yml`.
2. Copy the workflow configuration from the [Snowflake Flow Diff repository](https://github.com/Snowflake-Labs/snowflake-flow-diff) (see the Usage section in the README).
3. Commit and push the workflow file.

### Review flow changes

1. When a pull request is opened, the Flow Diff action runs automatically.
2. Navigate to the **Conversations** tab on the pull request and wait for the Flow Diff analysis to appear.
3. The analysis shows a visual, human-readable comparison of flow changes instead of raw JSON diffs.

## Manage parameters across environments

Openflow uses Parameters to manage environment-specific values (for example, connection strings, credentials, table names) across different Runtimes. Keep the following concepts in mind:

- Parameters are grouped into a Parameter Context, which has a one-to-one mapping with a Process Group.
- Parameter Context inheritance allows you to define shared parameters in a parent context and override specific values in child contexts. This is useful for promoting flows across dev, staging, and production environments.
- Parameter Contexts can integrate with Secrets Managers to securely handle sensitive credentials without storing them in the flow definition.

## Recommended SDLC workflow

1. **Development environment**: Developers create feature branches, build or modify flows, and commit changes on the Openflow canvas against their feature branch.
2. **Code review**: Open a pull request in GitHub. Use Snowflake Flow Diff for readable reviews.
3. **Merge to main**: After approval, merge the pull request into the `main` branch.
4. **Promote to production**: In your production Runtime, update the Process Group to pull the latest version from `main`.
5. **Parameterize**: Use Parameter Contexts to handle environment-specific configuration without modifying the flow definition itself.

--- title: VolatileSchemaCache source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/volatileschemacache.md section: Loading & Unloading Data ---

# VolatileSchemaCache

This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Provides a Schema Cache that evicts elements based on a Least-Recently-Used algorithm. This cache is not persisted, so any restart of NiFi will result in the cache being cleared. Additionally, the cache will be cleared any time that the Controller Service is stopped and restarted. ## Tags cache, record, schema ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
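As a rough illustration of the Least-Recently-Used eviction mentioned in the Description above, the following Python sketch models a small in-memory cache. It is not the VolatileSchemaCache implementation; the class name `LruSchemaCache`, its `max_size` argument, and its methods are hypothetical names chosen for this example.

```python
from collections import OrderedDict


class LruSchemaCache:
    """Illustrative in-memory LRU cache (hypothetical, not NiFi code)."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        # Entries are kept in usage order: least recently used first.
        self._entries = OrderedDict()

    def put(self, schema_id, schema):
        self._entries[schema_id] = schema
        self._entries.move_to_end(schema_id)  # mark as most recently used
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used

    def get(self, schema_id):
        if schema_id not in self._entries:
            return None
        self._entries.move_to_end(schema_id)  # a read also counts as a use
        return self._entries[schema_id]

    def clear(self):
        # Models the non-persistent cache being emptied on a restart.
        self._entries.clear()
```

Because nothing is written to disk in this sketch, calling `clear()` stands in for the behavior described above, where a restart empties the cache.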
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: Wait 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/wait.md section: Loading & Unloading Data --- # Wait 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle org.apache.nifi | nifi-standard-nar ## Description Routes incoming FlowFiles to the 'wait' relationship until a matching release signal is stored in the distributed cache from a corresponding Notify processor. When a matching release signal is identified, a waiting FlowFile is routed to the 'success' relationship. The release signal entry is then removed from the cache. The attributes of the FlowFile that produced the release signal are copied to the waiting FlowFile if the Attribute Cache Regex property of the corresponding Notify processor is set properly. If there are multiple release signals in the cache identified by the Release Signal Identifier, and the Notify processor is configured to copy the FlowFile attributes to the cache, then the FlowFile passing the Wait processor receives the union of the attributes of the FlowFiles that produced the release signals in the cache (identified by Release Signal Identifier). Waiting FlowFiles will be routed to 'expired' if they exceed the Expiration Duration. If you need to wait for more than one signal, specify the desired number of signals via the 'Target Signal Count' property. This is particularly useful with processors that split a source FlowFile into multiple fragments, such as SplitText. In order to wait for all fragments to be processed, connect the 'original' relationship to a Wait processor, and the 'splits' relationship to a corresponding Notify processor. Configure the Notify and Wait processors to use the '$\{fragment.identifier\}' as the value of 'Release Signal Identifier', and specify '$\{fragment.count\}' as the value of 'Target Signal Count' in the Wait processor. It is recommended to use a prioritizer (for instance First In First Out) when using the 'wait' relationship as a loop. ## Tags cache, distributed, hold, map, release, signal, wait ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
## Writes attributes
## See also - [org.apache.nifi.processors.standard.Notify](/user-guide/data-integration/openflow/processors/notify) --- title: WaitForTableState 2025.10.9.21 source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/waitfortablestate.md section: Loading & Unloading Data --- # WaitForTableState 2025.10.9.21 This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/processors/index)
## Bundle com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar ## Description Blocks incoming FlowFiles until the corresponding table state is not equal to the accepted state. Blocked FlowFiles stay in the upstream queue. When the table is in a terminated state, or the table is removed from the state, all FlowFiles are routed to the 'failure' relationship. ## Tags cdc, event, jdbc, mysql, postgresql, sql ## Input Requirement REQUIRED ## Supports Sensitive Dynamic Properties false ## Properties
## Relationships
--- title: WindowsEventLogReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/windowseventlogreader.md section: Loading & Unloading Data --- # WindowsEventLogReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Reads Windows Event Log data as XML content having been generated by ConsumeWindowsEventLog, ParseEvtx, etc. (see Additional Details) and creates Record object(s). If the root tag of the input XML is 'Events', the child content is expected to be a series of 'Event' tags, each of which will constitute a single record. If the root tag is 'Event', the content is expected to be a single 'Event' and thus a single record. No other root tags are valid. Only events of type 'System' are currently supported. ## Tags event, log, parser, reader, record, windows, xml ## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: XMLFileLookupService source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/xmlfilelookupservice.md section: Loading & Unloading Data --- # XMLFileLookupService This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description A reloadable XML file-based lookup service. This service uses Apache Commons Configuration. Example XML configuration file and how to access specific configuration can be found at [http://commons.apache.org/proper/commons-configuration/userguide/howto_hierarchical.html](http://commons.apache.org/proper/commons-configuration/userguide/howto_hierarchical.html). External entity processing is disabled. ## Tags cache, enrich, join, key, lookup, reloadable, value, xml ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted ## Restrictions
## System Resource Considerations This component does not specify system resource considerations. --- title: XMLReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/xmlreader.md section: Loading & Unloading Data --- # XMLReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Reads XML content and creates Record objects. Records are expected in the second level of XML data, embedded in an enclosing root tag. ## Tags parser, reader, record, xml ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
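To make the "second level" wording in the Description above concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is only an illustration of the idea, not how XMLReader is implemented: the function name `read_records` is hypothetical, every field value is kept as text, and attributes, nested fields, and schemas are ignored.

```python
import xml.etree.ElementTree as ET


def read_records(xml_text):
    """Treat each direct child of the root tag as one record (simplified)."""
    root = ET.fromstring(xml_text)
    records = []
    for record_element in root:  # direct children of the root = second level
        # Each child element of a record becomes a field: tag -> text.
        records.append({field.tag: field.text for field in record_element})
    return records


sample = (
    "<people>"
    "<person><name>Ada</name><age>36</age></person>"
    "<person><name>Alan</name><age>41</age></person>"
    "</people>"
)
records = read_records(sample)
```

In this sample, `people` is the enclosing root tag and each `person` element yields one record.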
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: XMLRecordSetWriter source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/xmlrecordsetwriter.md section: Loading & Unloading Data --- # XMLRecordSetWriter This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Writes a RecordSet to XML. The records are wrapped by a root tag. ## Tags record, recordset, resultset, row, serialize, writer, xml ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
## State management This component does not store state. ## Restricted This component is not restricted. ## System Resource Considerations This component does not specify system resource considerations. --- title: YamlTreeReader source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/yamltreereader.md section: Loading & Unloading Data --- # YamlTreeReader This feature is not available in the People's Republic of China. Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions). Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics** - [](/user-guide/data-integration/openflow/about) - [](/user-guide/data-integration/openflow/controllers/index)
## Description Parses YAML into individual Record objects. While the reader expects each record to be well-formed YAML, the content of a FlowFile may consist of many records, each as a well-formed YAML array or YAML object. If an array is encountered, each element in that array will be treated as a separate record. If the schema that is configured contains a field that is not present in the YAML, a null value will be used. If the YAML contains a field that is not present in the schema, that field will be skipped. Please note this controller service does not support resolving the use of YAML aliases. Any alias present will be treated as a string. See the Usage of the Controller Service for more information and examples. ## Tags parser, reader, record, tree, yaml ## Properties In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
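The schema rules in the Description above (array elements become separate records, missing fields become null, extra fields are skipped) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the YamlTreeReader implementation: the function name `to_records` is hypothetical, the schema is reduced to a list of field names, and the input is assumed to be YAML that has already been parsed into Python dicts and lists.

```python
def to_records(parsed, schema_fields):
    """Apply a flat list of schema field names to already-parsed YAML data."""
    # A top-level array yields one record per element; an object yields one.
    items = parsed if isinstance(parsed, list) else [parsed]
    records = []
    for item in items:
        # Fields missing from the data become None (null); fields that are
        # not in the schema are never copied, so they are skipped.
        records.append({name: item.get(name) for name in schema_fields})
    return records


schema = ["id", "name"]
from_array = to_records([{"id": 1, "extra": "x"}, {"id": 2, "name": "b"}], schema)
from_object = to_records({"id": 3}, schema)
```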
## State management

This component does not store state.

## Restricted

This component is not restricted.

## System Resource Considerations

This component does not specify system resource considerations.

## Snowflake Cortex (AI & ML)

LLM functions, vector search, document AI, Cortex Analyst, and AI-powered features.

--- title: Agent skills source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-skills.md section: Snowflake Cortex (AI & ML) ---

# Agent skills

This feature is not available in the People's Republic of China.

- [](/user-guide/snowflake-cortex/cortex-agents)
- [](/user-guide/snowflake-cortex/cortex-agents-rest-api)
- [](/sql-reference/sql/create-stage)

A skill is a modular, portable package of instructions, scripts, and context that gives agents the capability to perform specific, repeatable tasks. You can reference skills stored in a named stage or a Git repository, and Cortex Agents discover them automatically for use in orchestration.

## How skills work

When an agent receives a user query, it evaluates the name and description of each configured skill. If the agent identifies a skill as relevant, it retrieves the full instructions and any supporting scripts from the `SKILL.md` file and executes the skill.

Skills follow a discovery-and-execution model. The agent doesn't persist a copy of the skill files; it only references the skill files in their original location and reads them on demand during orchestration.

### SKILL.md file structure

Each skill is defined by a `SKILL.md` file that contains the following:

- A skill name
- A description of the skill
- Instructions for the agent
- Optional script references

Each skill folder must contain a `SKILL.md` file at its root. The file defines the skill's identity, instructions, and any associated scripts.
The following example shows the structure of the skill folder:

```
skills/
  forecaster/
    SKILL.md
    forecaster.py
  planner/
    SKILL.md
    planner.py
```

The `SKILL.md` file includes the following fields:

| Field        | Required | Description                                                                 |
| ------------ | -------- | --------------------------------------------------------------------------- |
| name         | Yes      | Unique identifier for the skill                                             |
| description  | Yes      | Brief summary used by the agent during orchestration to determine relevance |
| instructions | Yes      | Detailed instructions the agent follows when executing the skill            |

### Skill discovery

Cortex Agents reference the `SKILL.md` files at the root of each skill folder. The agent scans the stage contents for `SKILL.md` files and returns the skill name, description, and file location.

### Skill orchestration

During agent invocation, the agent orchestrator uses the name and description of every skill referenced in the agent to decide which skills are relevant to the user's query. If a skill is selected, the agent retrieves the full `SKILL.md` content, including detailed instructions and script paths, from the source location.

## Skill sources

You can store skills in one of the following two types of locations:

- Named stages
- Git repositories

### Named stages

The following example shows how to store skill folders in a Snowflake named stage.

**%sf-web-interface% UI:**

1. Sign in to %sf-web-interface-link%.
2. Navigate to the database and schema where you want to create the stage.
3. Create a stage named `skill_stage`.
4. Upload the skill files to the stage, placing them in the `skills/forecaster/` path.

**SQL:**

1. Create a stage for skills.

```sql
CREATE STAGE IF NOT EXISTS db1.schema1.skill_stage;
```

2. Upload skill files to the stage.
```sql
PUT file:///path/to/forecaster/SKILL.md @db1.schema1.skill_stage/skills/forecaster/;
PUT file:///path/to/forecaster/forecaster.py @db1.schema1.skill_stage/skills/forecaster/;
```

### Git repositories

The following example shows how to reference skills located in a Snowflake Git repository. You can point to a specific commit hash for stability or a tag for automatic updates:

**%sf-web-interface% UI:**

1. Sign in to %sf-web-interface-link%.
2. Navigate to the Git repository integration where your skills are stored.
3. Reference the skill at a specific commit hash for stability, or use a tag for automatic updates after a FETCH.

**SQL:**

```sql
-- Reference a skill at a specific commit
@my_db.my_schema.skills_repo/commits/abc123def/skills/forecaster

-- Reference a skill at a tag (updates automatically on fetch)
@my_db.my_schema.skills_repo/tags/latest/skills/forecaster
```

When you reference a Git tag, the skill updates automatically after the account admin runs a FETCH on the repository. Commit hash references are immutable.

## Manage skills

### List available skills

List all skills available in a named stage or Git repository:

```sql
LS @db1.schema1.skill_stage/ PATTERN='.*SKILL\.md';
```

The output lists each matching `SKILL.md` file with its name, size, checksum, and last-modified time:

| Name                            | Size | Checksum      | Last Modified               |
| ------------------------------- | ---- | ------------- | --------------------------- |
| skill_stage/forecaster/SKILL.md | 1008 | 1232131231231 | Tue March 10 2026 02:45 GMT |
| skill_stage/planner/SKILL.md    | 2001 | 1231231231231 | Tue March 10 2026 02:45 GMT |

### List skills on an agent

View all skills configured on a specific agent:

```sql
DESCRIBE AGENT db1.schema1.my_agent;
```

The output returns a JSON structure with each skill's name and source URL.

### Add a skill to an agent

Add a skill to a Cortex Agent by updating the agent specification. You can use the %sf-web-interface% UI, SQL, or the REST API. The description field is optional.
If omitted, Snowflake reads the description from the skill's `SKILL.md` file.

**%sf-web-interface% UI:**

1. Sign in to %sf-web-interface-link%.
2. Navigate to the **Skills** tab.
3. Select **Add Skill** and choose **Stage** or **Git** as the source.
4. For a stage source, provide the name of the stage and the skill folder path.
5. For a Git source, provide the repository path to the skill.
6. Select **Add Skill**.

**SQL:**

To add a skill from a Snowflake named stage:

```sql
ALTER AGENT db1.schema1.my_agent MODIFY LIVE VERSION SET SPECIFICATION =
$$
{
  // Include all existing, unchanged fields as well
  "skills": [
    {
      "name": "forecaster",
      "source": {
        "type": "STAGE",
        "path": "@db1.schema1.stage1/skills/forecaster"
      }
    }
  ]
}
$$;
```

To add a skill from a Git repository:

```sql
ALTER AGENT db1.schema1.my_agent MODIFY LIVE VERSION SET SPECIFICATION =
$$
{
  // Include all existing, unchanged fields as well
  "skills": [
    {
      "name": "forecaster",
      "source": {
        "type": "GIT",
        "path": "@my_db.my_schema.skills_repo/tags/latest/skills/forecaster"
      }
    }
  ]
}
$$;
```

**API:**

To add a skill from a Snowflake named stage:

```
PUT /api/v2/databases/{database}/schemas/{schema}/agents/{name}

{
  "name": "my_agent",
  "comment": "Agent with skill capabilities",
  "spec": {
    "models": { "orchestration": "claude-4-sonnet" },
    "instructions": { "response": "Provide concise forecasts and analysis." },
    "skills": [
      {
        "name": "forecaster",
        "source": {
          "type": "STAGE",
          "path": "@db1.schema1.stage1/skills/forecaster"
        }
      }
    ]
  }
}
```

To add a skill from a Git repository:

```
PUT /api/v2/databases/{database}/schemas/{schema}/agents/{name}

{
  "name": "my_agent",
  "comment": "Agent with skill capabilities",
  "spec": {
    "models": { "orchestration": "claude-4-sonnet" },
    "instructions": { "response": "Provide concise forecasts and analysis." },
    "skills": [
      {
        "name": "forecaster",
        "source": {
          "type": "GIT_INTEGRATION",
          "path": "@my_db.my_schema.skills_repo/tags/latest/skills/forecaster"
        }
      }
    ]
  }
}
```

### Update a skill

To update a skill's content, modify the `SKILL.md` file and any associated scripts at the source location. All agents that reference the skill automatically use the updated version on their next invocation.

To update a skill's metadata in the agent specification (for example, the description), use the same PUT endpoint with the updated values.

### Remove a skill from an agent

Remove a skill from an agent using the %sf-web-interface% UI, SQL, or the REST API. The remaining skills continue to function.

**%sf-web-interface% UI:**

1. Sign in to %sf-web-interface-link%.
2. Navigate to the **Skills** tab.
3. Select the skill you want to remove and delete it.
4. Select **Save**.

**SQL:**

Update the agent specification and omit the skill from the `skills` array:

```sql
ALTER AGENT db1.schema1.my_agent MODIFY LIVE VERSION SET SPECIFICATION =
$$
{
  // Include all existing, unchanged fields as well
  "skills": []
}
$$;
```

**API:**

Update the agent specification and omit the skill from the `skills` array:

```
PUT /api/v2/databases/{database}/schemas/{schema}/agents/{name}

{
  "name": "my_agent",
  "comment": "Agent with skill capabilities",
  "spec": {
    "models": { "orchestration": "claude-4-sonnet" },
    "instructions": { "response": "Provide concise forecasts and analysis." },
    "skills": []
  }
}
```

### Add an existing skill to another agent

You can add the same skill to multiple agents by referencing the same source path in each agent's specification. Because skills are referenced and not copied, updates to the skill files apply to all agents that use the skill.

### Skills with code

If your skills need to execute code, you must enable the code execution tool on the agent.
All scripts referenced by a skill must be located in the same folder as the `SKILL.md` file.

### Use skills in %sf-intelligence%

The skills configured on an agent are automatically available in %sf-intelligence%. You can also explicitly select a skill for use by selecting the **+** button and then choosing the skill from the list.

## Access control

The following table describes the privileges required for skill operations:

| Privilege | Object          | Required for                                     |
| --------- | --------------- | ------------------------------------------------ |
| USAGE     | Stage           | Reading skill files from a named stage           |
| USAGE     | Git Integration | Reading skill files from a Git repository        |
| MODIFY    | Agent           | Adding, updating, or removing skills in an agent |
| OWNERSHIP | Agent           | Full control over the agent configuration        |
| USAGE     | Agent           | Invoking the agent and its skills                |

## Monitoring

Skill invocations are surfaced in the thinking steps during %sf-intelligence% interactions. The monitoring dashboard displays skill invocation details alongside other orchestration information, including which skill was selected, the input provided, and the result returned.

## Limitations

The following limitations apply to Cortex Agent skills:

- **SKILL.md location:** The `SKILL.md` file must be at the root of the skill folder. Snowflake doesn't search subdirectories.
- **Supporting files:** All scripts and supporting files must reside in the same folder as the `SKILL.md` file.
- **Git fetch requirement:** Skills referenced by a Git tag pick up changes only after the account admin runs a FETCH on the repository. Until then, the agent continues to use the previously fetched version.
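
The folder rules described on this page (a `SKILL.md` at the root of each skill folder, no subdirectory search, supporting files in the same folder) can be checked locally before uploading. The following Python sketch is a hypothetical pre-upload check, not part of any Snowflake API: the function name `discover_skills`, the `skills_root` argument, and the input format (a list of relative file paths) are assumptions made for this example.

```python
from pathlib import PurePosixPath


def discover_skills(stage_paths, skills_root="skills"):
    """Map each valid skill folder name to the files found directly in it."""
    root_parts = PurePosixPath(skills_root).parts
    folders = {}
    for raw_path in stage_paths:
        parts = PurePosixPath(raw_path).parts
        # Keep only files located directly in <skills_root>/<skill>/;
        # anything deeper is ignored, mirroring "no subdirectory search".
        is_direct_child = (
            len(parts) == len(root_parts) + 2
            and parts[: len(root_parts)] == root_parts
        )
        if is_direct_child:
            folders.setdefault(parts[-2], []).append(parts[-1])
    # A folder only counts as a skill if SKILL.md sits at its root.
    return {
        name: sorted(files)
        for name, files in folders.items()
        if "SKILL.md" in files
    }


paths = [
    "skills/forecaster/SKILL.md",
    "skills/forecaster/forecaster.py",
    "skills/planner/SKILL.md",
    "skills/planner/planner.py",
    "skills/broken/docs/SKILL.md",  # nested too deep, so not discovered
    "skills/noskill/tool.py",       # folder without SKILL.md
]
skills = discover_skills(paths)
```

Here only `forecaster` and `planner` are reported, because the other two folders break one of the layout rules listed under Limitations.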
--- title: AI cost management and governance source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/governance-and-availability/ai-cost-management-and-governance.md section: Snowflake Cortex (AI & ML) --- # AI cost management and governance Snowflake gives you a consistent way to understand, monitor, and manage AI usage alongside the rest of your platform activity. Across AI features, pricing is primarily based on consumption, including token-based usage where applicable, so teams can align spend to actual usage instead of fixed capacity. To support cost transparency, Cortex AI provides usage views that help you analyze activity over time, break down consumption, and connect usage to billing workflows already used across your organization. These views can be used for reporting, governance, showback, and internal monitoring. For detailed pricing by feature, model, and unit of consumption, refer to the [consumption table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf), which provides the current pricing structure across Snowflake AI capabilities. ## Usage views Snowflake provides usage views to help you track AI consumption using the same core approach used across the platform. These views support analysis of usage over time and can help teams understand how AI activity maps to overall spend, whether they are monitoring adoption, reviewing trends, or supporting internal reporting. This allows finance, platform, and engineering teams to work from a common system of record when evaluating usage. Pricing details remain available in the [consumption table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf), which outlines how individual AI features are billed. Together, usage views and pricing documentation provide a foundation for understanding and managing AI costs across your Snowflake environment. ### Usage views for total cost These views should be used when calculating AI usage and AI-related spend. 
Together, they provide the standard foundation for cost reporting across AI features.
*Coming Soon: Cortex AI Guardrails Account Usage View*

[1] "UTC converted to local" means that if your account is set to a local time zone, timestamps are displayed in that time zone. The underlying data is still stored in UTC.

[2] CORTEX_REST_API_USAGE_HISTORY is billed in dollars and is not currently shown in the account-level DAILY_METERING_HISTORY.

[3] CORTEX_SEARCH_DAILY_USAGE_HISTORY includes embeddings, which must be excluded from combined calculations because they are also shown in CORTEX_AI_FUNCTIONS_USAGE_HISTORY.

### Usage views for additional analysis Use these views when you need more granular or feature-specific insight. They complement the primary views, but are not intended to serve as the standard source for AI cost totals.
### Total Cost of Operations

Tokens, messages, and similar units are not the only basis on which you are billed for Cortex AI; you are also billed for the query, warehouse time, and any other associated Snowflake charges. Using query_id, warehouse_id, and user_id, you should be able to calculate your total cost of operation. For more details, see the associated usage view or contact support.

## Budget features

Snowflake budgets help organizations monitor credit usage and respond when spending approaches or exceeds configured thresholds. These features can support internal planning, alerting, and broader governance processes for AI usage as part of an overall cost management strategy.

A budget defines a monthly spending limit for an account or for a custom group of Snowflake objects. Budgets can send notifications when spend is projected to exceed the configured limit, and Snowflake also supports custom actions for budgets based on either projected or actual consumption. This allows teams to pair spend monitoring with operational responses, using the same core budgeting model across Snowflake cost management workflows.

### Resource budgets for AI features

[Resource budgets](/user-guide/snowflake-cortex/cortex-agents-resource-budgets) let administrators define a monthly credit limit for a tagged Cortex Agent object and evaluate spend against that budget on a periodic basis. Because they use Snowflake's tag-based cost attribution model, they fit into broader governance and budget management patterns already used across the platform. Snowflake also announced [resource budgets for %sf-intelligence%](/user-guide/snowflake-cortex/snowflake-cowork/cowork-resource-budgets) on the same date, extending this model across additional AI experiences.
### Shared resource budgets for AI Features A [shared resource budget](https://docs.snowflake.com/en/user-guide/budgets/budget-shared-resources) lets you track and control credit consumption for AI features – such as AI Functions, Cortex Agents, Cortex Code, and %sf-intelligence% – broken down by the team or cost center consuming them. Instead of budgeting a resource that belongs to a single owner or with a single budget, this budget tracks AI features that are used by specific users. Those users are identified with tags, so you can group them into logical units like a cost center or team. For example, if both an engineering team and a finance team call the same AI function, you can set up separate budgets that each track only the credits consumed by their respective tagged users, even though both teams are using the same underlying AI feature. ### Budget capability by feature
*Not supported nor planned: Cortex Analyst, Cortex Fine-tuning.* ### Budget timing, enforcement, and automated actions For Resource Budgets and Shared Resource Budgets you can attach stored procedures that are executed when spending reaches specific thresholds, which are expressed as a percentage of the spending limit and apply to the monthly budget period. Budget evaluation and enforcement are calculated periodically rather than instantaneously. After a budget threshold is exceeded, actions can take up to eight hours to take effect under normal operation, or up to two hours when using the latency-optimized option. Budgets are useful for ongoing spend management and policy enforcement, while still being part of a broader cost governance strategy that may also include usage monitoring and internal operational review. --- title: AI Observability in Snowflake Cortex source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-observability.md section: Snowflake Cortex (AI & ML) --- # AI Observability in Snowflake Cortex This feature is not available in the People's Republic of China. Use AI Observability in Snowflake Cortex to evaluate and trace your generative AI applications. With AI Observability, you can make your applications more trustworthy and transparent. Use it to measure the performance of your AI applications by running systematic evaluations. You can use the information from the evaluations to iterate on your application configurations and optimize performance. You can also use it to log application traces for debugging purposes. Use AI Observability to benchmark performance, thus making your applications trustworthy and providing greater confidence for production deployments. AI Observability has the following features: - **Evaluations:** Use AI Observability to systematically evaluate the performance of your generative AI applications and agents using the LLM-as-a-judge technique. 
You can use metrics, such as accuracy, latency, usage, and cost, to quickly iterate on your application configurations and optimize performance. - **Comparison:** Compare multiple evaluations side by side and assess the quality and accuracy of responses. You can analyze the responses across different LLMs, prompts, and inference configurations to identify the best configuration for production deployments. - **Tracing:** Trace every step of application executions across input prompts, retrieved context, tool use, and LLM inference. Use it to debug individual records and refine the app for accuracy, latency, and cost. AI Observability can be used to evaluate a variety of task types, such as retrieval-augmented generation (RAG) and summarization. For example, the context relevance score can help you detect the quality of the search results retrieval corresponding to a user query. You can use the answer relevance and groundedness scores to detect the truthfulness and relevance of the final response based on the retrieved context. For summarization, you can measure the factual correctness and comprehensiveness of the LLM-generated summaries based on original input and avoid prompts and LLMs that have a higher frequency of hallucinations in your generative AI applications. To get started, learn about the [](#label-ai-observability-key-concepts), and then take a quick walkthrough with the [](/user-guide/snowflake-cortex/ai-observability/tutorial). You can then use the information in [](/user-guide/snowflake-cortex/ai-observability/evaluate-ai-applications) for an in-depth walkthrough. To review a specific concept, see the [](/user-guide/snowflake-cortex/ai-observability/reference). 
For querying `AI_OBSERVABILITY_EVENTS` with SQL for a Cortex Agent (pass `CORTEX AGENT` as `agent_type`) or an External Agent application (pass `EXTERNAL AGENT` as `agent_type`), see [](/user-guide/snowflake-cortex/cortex-agents-monitor), [](/sql-reference/functions/get_ai_observability_events-snowflake-local), and [](/sql-reference/commands-external-agent). Visibility of **unredacted** raw fields in monitoring and in observability user-defined table function results is covered by the **READ UNREDACTED AI OBSERVABILITY EVENTS TABLE** account privilege; it does not apply to **Cortex Agent evaluation** runs or the **External Agent Evaluations** experience. For more details, see [](/release-notes/bcr-bundles/un-bundled/bcr-read-unredacted-ai-observability-events) and [](/user-guide/snowflake-cortex/cortex-agents-monitor).

## Access control and prerequisites

Before you start using AI Observability:

1. To create and execute runs, your role must have the following roles or privileges granted. For more information, see [](#label-ai-observability-required-privileges):

   - CORTEX_USER database role
   - CREATE EXTERNAL AGENT privilege on the schema
   - CREATE TASK privilege on the schema
   - EXECUTE TASK global privilege

2. Install the following TruLens Python packages in your Python project:

   - `trulens-core`
   - `trulens-connectors-snowflake`
   - `trulens-providers-cortex`

   The version of the package that you're using in your Python project should be version 2.1.2 or later. TruLens is the platform that Snowflake uses to track your applications. For more information, see the [TruLens documentation](https://trulens.org/getting_started).

## Key concepts

### Applications

An application is an end-to-end generative AI application that is designed using multiple components such as LLMs, tools (such as search retrievers or APIs), and additional custom logic. For example, an application can contain a RAG pipeline with retrievers, re-rankers, and LLMs chained together.
You can enable AI observability for applications that can run in any environment (such as Snowflake, cloud, or on-premises). ### External Agent Applications are represented in Snowflake as External Agent objects. An External Agent object is used to store application and evaluation metadata (such as the application name, version name, or run name). It does not store the application code, application definition, execution traces, or evaluation results. While the application can be hosted in any environment (such as Snowflake, cloud, or on-premises), the execution traces and evaluation results are stored in an event table in your Snowflake account. For more information, see [](#label-ai-observability-data). In addition to storing application and evaluation metadata, the External Agent object is also used to govern access to the traces and evaluation results for the application. For more information, see [](#label-ai-observability-required-privileges). The TruLens SDK automatically creates External Agent objects when you register an application (for example, using `TruApp()`, `TruChain`, `TruGraph`, or `TruLlama`). Running an evaluation can also create an External Agent if one does not already exist for the specified application name. You can also manage external agents using SQL commands. For more information, see [](/sql-reference/commands-external-agent). External Agent objects share a namespace with [model](/sql-reference/sql/create-model) objects. You cannot create an external agent with the same name as an existing model in the same schema, and vice versa. If a name collision occurs (for example, when an evaluation and a model share the same name), you must rename or drop the conflicting object before proceeding. ### Versions Applications can have multiple versions. Each version represents a different implementation. For example, these versions can represent different retrievers, prompts, LLMs or inference configurations. 
### Dataset A dataset represents a set of inputs. You can configure it to also represent a set of expected outputs (the ground truth) to test the application. Using the dataset, you can invoke the application to do the following tasks: - Generate the output. - Capture the traces. - Compute evaluation metrics. You can use a dataset containing both the inputs and the generated outputs to compute the evaluation metrics without invoking the application. For a list of fields supported in the dataset, see [](#label-ai-observability-dataset-and-attributes). ### Runs A run is an evaluation job. It uses the dataset and the application version that you've specified to compute evaluation metrics. A run has an invocation stage and a computation stage. The invocation stage triggers the application to generate the output and corresponding traces. The computation stage computes the evaluation metrics specified for the run. Multiple computations can be performed to add new metrics to an existing run. For the list of statuses associated with the execution of a run, see [](#label-ai-observability-runs). ### Metrics Evaluation metrics are scores that you use to assess generative AI application performance based on your own criteria. These metrics use LLMs to grade outputs and provide detailed scoring information. For a comprehensive list of metrics and their definitions, see [](#label-ai-observability-evaluation-metrics). ### Traces Traces are comprehensive records that capture the inputs, outputs, and intermediate steps of the interactions with an LLM application. Traces provide a detailed view of the application's execution. Use traces to analyze and understand the model's behavior at each stage. You can compare the traces of different application versions to identify improvements, debug issues, and verify intended performance. For information about accessing traces associated with each record, see [](/user-guide/snowflake-cortex/ai-observability/evaluate-ai-applications). 
## Pricing AI Observability uses LLM judges to compute the evaluation metrics. For server-side evaluations, LLMs on Cortex AI are used as LLM judges. The LLM judges are invoked via the [](/sql-reference/functions/ai_complete) function to perform evaluations. You incur charges for the Cortex Complete function calls. The LLM used to perform the evaluations determines how much you're charged. Additionally, you're charged the following: - Warehouse charges for tasks used to manage evaluation runs - Warehouse charges for queries used to compute evaluation metrics - Storage charges for the evaluation results - Warehouse charges to retrieve the evaluation results to be viewed in Snowsight --- title: AI Observability Tutorial source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-observability/tutorial.md section: Snowflake Cortex (AI & ML) --- # AI Observability Tutorial Learn how to implement AI observability in a retrieval-augmented generation (RAG) application using [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) and [](/sql-reference/functions/complete-snowflake-cortex) functions. In the [Getting started with AI observability](https://quickstarts.snowflake.com/guide/getting_started_with_ai_observability) tutorial, you'll learn how to do the following tasks: - Build a RAG application using Snowflake Cortex Search and Snowflake Cortex LLM functions. - Create a run. - Compute evaluation metrics. --- title: AI_COMPLETE structured outputs source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/complete-structured-outputs.md section: Snowflake Cortex (AI & ML) --- # AI_COMPLETE structured outputs This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/aisql) - [](/sql-reference/functions/ai_complete) AI_COMPLETE lets you supply a JSON schema or SQL type literal that completion responses must follow, producing structured output. 
Structured output reduces the need for post-processing in your AI data pipelines and enables seamless integration with systems that require deterministic responses. AI_COMPLETE verifies each generated token against your structured output definition to ensure that the response conforms to your type structure. Every model supported by AI_COMPLETE supports structured output, but the most powerful models typically generate higher quality responses. ## Using AI_COMPLETE with type literals Type literals allow you to define structured output for AI_COMPLETE using SQL types, taking advantage of Snowflake's built-in mappings between SQL and JSON types. Begin your type literal with the TYPE keyword and use a SQL OBJECT as the top-level type. The properties of your top-level object can be any SQL type with a supported mapping to JSON. Type literals are supported only for the single string text prompt version of AI_COMPLETE. For more information, see [](/sql-reference/functions/ai_complete-single-string). The following example uses a type literal to produce structured output for a prompt. The prompt contains both instructions to the model and the data to process. The `response_format` type literal requests the model's response as a JSON object with a top-level `note` containing `items_count`, a `price` array of price strings, `address`, and `member_date`. ```sql SELECT AI_COMPLETE( model => 'llama3.3-70b', prompt => 'Extract structured data from this customer interaction note: Customer Sarah Jones complained about the mobile app crashing during checkout. She tried to purchase 3 items: a red XL jacket ($89.99), blue running shoes ($129.50), and a fitness tracker ($199.00). The app crashed after she entered her shipping address at 123 Main St, Portland OR, 97201.
She has been a premium member since January 2024.', response_format => TYPE OBJECT(note OBJECT(items_count NUMBER, price ARRAY(STRING), address STRING, member_date STRING)), show_details => TRUE ); ``` The following is a full response to this query: ```text { "created": 1758755328, "model": "llama3.3-70b", "structured_output": [ { "raw_message": { "note": { "items_count": 3, "price": [ "$89.99", "$129.50", "$199.00" ] } }, "type": "json" } ], "usage": { "completion_tokens": 49, "prompt_tokens": 100, "total_tokens": 149 } } ``` ### Type literal notes and limitations Specifying a structured output schema as a type literal follows these rules: - STRING and VARCHAR types are mapped to JSON strings. - VARCHAR types aren't guaranteed to produce output of a specific length. - FIXED types without a scale are mapped to JSON integers. All other numeric types are mapped to JSON numbers. Type literals have restrictions around supported types: - The empty object OBJECT() isn't allowed as a type literal. - Not all SQL types have a mapping for structured output. These include, but aren't limited to: - VARIANT - MAP - [](/sql-reference/data-types-datetime) The use of an unsupported data type returns an error. ## Using AI_COMPLETE with JSON schemas For more control over structured output, use a [JSON schema](https://json-schema.org/) as the value for `response_format`. The supplied JSON schema defines the structure, data types, and constraints that the generated text must conform to, including required fields. For simple tasks, you don't need to specify any details of the output format, or even instruct the model to "respond in JSON." For more complex tasks, prompting the model to respond in JSON can improve accuracy; see [](#label-complete-structured-output-optimizing-adherence). The following illustrates the syntax of an AI_COMPLETE function call that uses a JSON schema to specify the structured output format.
The schema defines a top-level object, `properties`, with a `property_name` property of type string; this field is required in the response. ```sql AI_COMPLETE( ... response_format => { 'type': 'json', 'schema': { 'type': 'object', 'properties': { 'property_name': { 'type': 'string' }, ... }, 'required': ['property_name', ...] } } ) ``` For OpenAI (GPT) models, the following requirements apply: - [additionalProperties](https://json-schema.org/understanding-json-schema/reference/object#additionalproperties) field must be set to `false` in every node of the schema. - The [required](https://json-schema.org/understanding-json-schema/reference/object#required) field must be included and contain the names of every property in the schema. Other models do not require these fields, but you might include them anyway so you don't need a different schema for OpenAI models. ## SQL examples The following example is a more complete demonstration of using AI_COMPLETE with a single string input. ```sql SELECT AI_COMPLETE( model => 'mistral-large2', prompt => 'Return the customer sentiment for the following review: New kid on the block, this pizza joint! The pie arrived neither in a flash nor a snail\'s pace, but the taste? Divine! Like a symphony of Italian flavors, it was a party in my mouth. But alas, the party was a tad pricey for my humble abode\'s standards. 
A mixed bag, I\'d say!', response_format => { 'type':'json', 'schema':{'type' : 'object','properties' : {'sentiment_categories':{'type': 'object','properties': {'food_quality' : {'type' : 'string'},'food_taste': {'type':'string'}, 'wait_time': {'type':'string'}, 'food_cost': {'type':'string'}},'required':['food_quality','food_taste' ,'wait_time','food_cost']}}} } ); ``` Response: ```text { "sentiment_categories": { "food_cost": "negative", "food_quality": "positive", "food_taste": "positive", "wait_time": "neutral" } } ``` The following example demonstrates how to use the `response_format` argument to specify a JSON schema for the response and using the `show_details` argument to return inference metadata. ```sql SELECT AI_COMPLETE( model => 'mistral-large2', prompt => 'Return the customer sentiment for the following review: New kid on the block, this pizza joint! The pie arrived neither in a flash nor a snail\'s pace, but the taste? Divine! Like a symphony of Italian flavors, it was a party in my mouth. But alas, the party was a tad pricey for my humble abode\'s standards. A mixed bag, I\'d say!', response_format => { 'type':'json', 'schema':{'type' : 'object','properties' : {'sentiment_categories':{'type': 'object','properties': {'food_quality' : {'type' : 'string'},'food_taste': {'type':'string'}, 'wait_time': {'type':'string'}, 'food_cost': {'type':'string'}},'required':['food_quality','food_taste' ,'wait_time','food_cost']}}} }, show_details => TRUE ); ``` Response: ```text { "created": 1738683744, "model": "mistral-large2", "structured_output": [ { "raw_message": { "sentiment_categories": { "food_cost": "negative", "food_quality": "positive", "food_taste": "positive", "wait_time": "neutral" } }, "type": "json" } ], "usage": { "completion_tokens": 60, "prompt_tokens": 94, "total_tokens": 154 } } ``` ## Python example Structured output is supported in `snowflake-ml-python` version 1.8.0 and later. 
The following example demonstrates how to use the `response_format` argument to specify a JSON schema for the response.

```python
from snowflake.cortex import complete, CompleteOptions

# JSON schema describing the expected response: an object with a "people" array.
response_format = {
    "type": "json",
    "schema": {
        "type": "object",
        "properties": {
            "people": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "number"},
                    },
                    "required": ["name", "age"],
                },
            }
        },
        "required": ["people"],
    },
}

prompt = [{
    "role": "user",
    "content": "Please prepare me a data set of 5 ppl and their age",
}]

options = CompleteOptions(
    max_tokens=4096,
    temperature=0.7,
    top_p=1,
    guardrails=False,
    response_format=response_format
)

result = complete(
    model="claude-sonnet-4-6",
    prompt=prompt,
    session=session_object,  # placeholder: a session you created via the connector
    stream=True,
    options=options,
)

# With stream=True, the result is an iterable of text chunks.
output = "".join(result)
print(output)
```

Response:

```text
{"people": [{"name":"John Smith","age":32},{"name":"Sarah Johnson","age":28},
{"name":"Michael Chen","age":45},{"name":"Emily Davis","age":19},{"name":"Robert Wilson","age":56}]}
```

### Pydantic example

Pydantic is a data validation and settings management library for Python. This example uses Pydantic to define a schema for the response format. The code performs these steps:

1. Uses Pydantic to define a schema
2. Converts the Pydantic model to a JSON schema using the `model_json_schema` method
3. Passes the JSON schema to the `complete` function as the `response_format` argument

This example is meant to be run in a Snowsight Python worksheet, which already has a connection to Snowflake. To run it in a different environment, you might need to [establish a connection to Snowflake](/developer-guide/python-connector/python-connector-connect) using the Snowflake Connector for Python.
```python from pydantic import BaseModel, Field from snowflake.cortex import complete, CompleteOptions from snowflake.snowpark.context import get_active_session class Person(BaseModel): age: int = Field(description="Person age") name: str = Field(description="Person name") class People(BaseModel): people: list[Person] = Field(description="People list") ppl = People.model_json_schema() ''' This is the ppl object, keep in mind there's a '$defs' key used {'$defs': {'Person': {'properties': {'age': {'description': 'Person age', 'title': 'Age', 'type': 'integer'}, 'name': {'description': 'Person name', 'title': 'Name', 'type': 'string'}}, 'required': ['age', 'name'], 'title': 'Person', 'type': 'object'}}, 'properties': {'people': {'description': 'People list', 'items': {'$ref': '#/$defs/Person'}, 'title': 'People', 'type': 'array'}}, 'required': ['people'], 'title': 'People', 'type': 'object'} ''' response_format_pydantic={ "type": "json", "schema": ppl, } prompt=[{"role": "user", "content": "Please prepare me a data set of 5 ppl and their age"}] options_pydantic = CompleteOptions( # random params max_tokens=4096, temperature=0.7, top_p=1, guardrails=False, response_format=response_format_pydantic ) model_name = "claude-sonnet-4-6" session = get_active_session() try: result_pydantic = complete( model=model_name, prompt=prompt, session=session, stream=True, options=options_pydantic, ) except Exception as err: result_pydantic = (chunk for chunk in err.response.text) # making sure it's generator, similar to the valid response output_pydantic = "".join(result_pydantic) print(output_pydantic) ``` Response: ```text {"people": [{"name":"John Smith","age":32},{"name":"Sarah Johnson","age":45}, {"name":"Mike Chen","age":28},{"name":"Emma Wilson","age":19},{"name":"Robert Brown","age":56}]} ``` ## REST API example You can use the [Snowflake Cortex LLM REST API](/user-guide/snowflake-cortex/cortex-rest-api) to invoke COMPLETE with the LLM of your choice. 
Below is an example supplying a schema using the Cortex LLM REST API: ```shell-session curl --location --request POST 'https://.snowflakecomputing.com/api/v2/cortex/inference:complete' --header 'Authorization: Bearer ' \ --header 'Accept: application/json, text/event-stream' \ --header 'Content-Type: application/json' \ --data-raw '{ "model": "claude-sonnet-4-6", "messages": [{ "role": "user", "content": "Order a pizza for a hungry space traveler heading to the planet Zorgon. Make sure to include a special instruction to avoid any intergalactic allergens." }], "max_tokens": 1000, "response_format": { "type": "json", "schema": { "type": "object", "properties": { "crust": { "type": "string", "enum": [ "thin", "thick", "gluten-free", "Rigellian fungus-based" ] }, "toppings": { "type": "array", "items": { "type": "string", "enum": [ "Gnorchian sausage", "Andromedian mushrooms", "Quasar cheese" ] } }, "delivery_planet": { "type": "string" }, "special_instructions": { "type": "string" } }, "required": [ "crust", "toppings", "delivery_planet" ] } } }' ``` Response: ```text data: {"id":"4d62e41a-d2d7-4568-871a-48de1463ed2a","model":"claude-sonnet-4-6","choices":[{"delta":{"content":"{\"crust\":","content_list":[{"type":"text","text":"{\"crust\":"}]}}],"usage":{}} data: {"id":"4d62e41a-d2d7-4568-871a-48de1463ed2a","model":"claude-sonnet-4-6","choices":[{"delta":{"content":" \"thin\"","content_list":[{"type":"text","text":" \"thin\""}]}}],"usage":{}} data: {"id":"4d62e41a-d2d7-4568-871a-48de1463ed2a","model":"claude-sonnet-4-6","choices":[{"delta":{"content":", \"topping","content_list":[{"type":"text","text":", \"topping"}]}}],"usage":{}} data: {"id":"4d62e41a-d2d7-4568-871a-48de1463ed2a","model":"claude-sonnet-4-6","choices":[{"delta":{"content":"s\": [\"Quasar","content_list":[{"type":"text","text":"s\": [\"Quasar"}]}}],"usage":{}} ``` ## Create a JSON schema definition To get the best accuracy from COMPLETE Structured Outputs, follow these guidelines: - **Use the 
"required" field** in the schema to specify required fields. COMPLETE raises an error if a required field cannot be extracted. In the following example, the schema directs COMPLETE to find people mentioned in the document. The `people` field is marked as required to make sure people are identified. ```sql { 'type': 'object', 'properties': { 'dataset_name': { 'type': 'string' }, 'created_at': { 'type': 'string' }, 'people': { 'type': 'array', 'items': { 'type': 'object', 'properties': { 'name': { 'type': 'string' }, 'age': { 'type': 'number' }, 'isAdult': { 'type': 'boolean' } } } } }, 'required': [ 'dataset_name', 'created_at', 'people' ] } ``` Response: ```text { "dataset_name": "name", "created_at": "date", "people": [ { "name": "Andrew", "isAdult": true } ] } ``` - **Provide detailed descriptions** of the fields to be extracted so that the model can more accurately identify them. For example, the following schema includes a description of each of the fields of `people`: `name`, `age`, and `isAdult`. ```sql { 'type': 'object', 'properties': { 'dataset_name': { 'type': 'string' }, 'created_at': { 'type': 'string' }, 'people': { 'type': 'array', 'items': { 'type': 'object', 'properties': { 'name': { 'type': 'string', 'description': 'name should be between 9 to 10 characters' }, 'age': { 'type': 'number', 'description': 'Should be a value between 0 and 200' }, 'isAdult': { 'type': 'boolean', 'description': 'Persons is older than 18' } } } } } } ``` ### Using a JSON reference Schema references solve practical problems when using Cortex COMPLETE Structured Outputs. With references, represented by `$ref`, you can define common objects like addresses or prices once, then reuse them throughout the schema. This way, when you need to update validation logic or add a field, you can change it in one place instead of in multiple locations. Using references reduces coding effort, reduces bugs from inconsistent implementations, and makes code reviews simpler. 
Referenced components create cleaner hierarchies that better represent entity relationships in your data model. As projects grow more complex, this modular approach helps you manage technical debt while maintaining schema integrity. Third-party libraries such as Pydantic support the reference mechanism natively in Python, simplifying schema usage in your code. The following guidelines apply to the use of references in JSON schema: - **Scope limitation:** The `$ref` mechanism is limited to the user's schema only; external schema references (such as HTTP URLs) are not supported. - **Definition placement:** Object definitions should be placed at the top level of the schema, specifically under the definitions or `$defs` key. - **Enforcement:** While the JSON Schema specification recommends using the `$defs` key for definitions, Snowflake's validation mechanism strictly enforces this structure. This is an example of a valid `$defs` object: ```javascript { '$defs': { 'person':{'type':'object','properties':{'name' : {'type' : 'string'},'age': {'type':'number'}}, 'required':['name','age']}}, 'type': 'object', 'properties': {'title':{'type':'string'},'people':{'type':'array','items':{'$ref':'#/$defs/person'}}} } ``` #### Example using JSON reference This SQL example demonstrates the use of references in a JSON schema. ```sql select ai_complete( model => 'claude-sonnet-4-6', prompt => 'Extract structured data from this customer interaction note: Customer Sarah Jones complained about the mobile app crashing during checkout. She tried to purchase 3 items: a red XL jacket ($89.99), blue running shoes ($129.50), and a fitness tracker ($199.00). The app crashed after she entered her shipping address at 123 Main St, Portland OR, 97201. 
She has been a premium member since January 2024.', 'response_format' => { 'type': 'json', 'schema': { 'type': 'object', '$defs': { 'price': { 'type': 'object', 'properties': { 'amount': {'type': 'number'}, 'currency': {'type': 'string'} }, 'required': ['amount'] }, 'address': { 'type': 'object', 'properties': { 'street': {'type': 'string'}, 'city': {'type': 'string'}, 'state': {'type': 'string'}, 'zip': {'type': 'string'}, 'country': {'type': 'string'} }, 'required': ['street', 'city', 'state'] }, 'product': { 'type': 'object', 'properties': { 'name': {'type': 'string'}, 'category': {'type': 'string'}, 'color': {'type': 'string'}, 'size': {'type': 'string'}, 'price': {'$ref': '#/$defs/price'} }, 'required': ['name', 'price'] } }, 'properties': { 'customer': { 'type': 'object', 'properties': { 'name': {'type': 'string'}, 'membership': { 'type': 'object', 'properties': { 'type': {'type': 'string'}, 'since': {'type': 'string'} } }, 'shipping_address': {'$ref': '#/$defs/address'} }, 'required': ['name'] }, 'issue': { 'type': 'object', 'properties': { 'type': {'type': 'string'}, 'platform': {'type': 'string'}, 'stage': {'type': 'string'}, 'severity': {'type': 'string', 'enum': ['low', 'medium', 'high', 'critical']} }, 'required': ['type', 'platform'] }, 'cart': { 'type': 'object', 'properties': { 'items': { 'type': 'array', 'items': {'$ref': '#/$defs/product'} }, 'total': {'$ref': '#/$defs/price'}, 'item_count': {'type': 'integer'} } }, 'recommended_actions': { 'type': 'array', 'items': { 'type': 'object', 'properties': { 'department': {'type': 'string'}, 'action': {'type': 'string'}, 'priority': {'type': 'string', 'enum': ['low', 'medium', 'high', 'urgent']} } } } }, 'required': ['customer', 'issue','cart'] } } } ); ``` Response: ```text { "created": 1747313083, "model": "claude-sonnet-4-6", "structured_output": [ { "raw_message": { "cart": { "item_count": 3, "items": [ { "color": "red", "name": "jacket", "price": { "amount": 89.99, "currency": "USD" }, "size": "XL" 
}, { "color": "blue", "name": "running shoes", "price": { "amount": 129.5, "currency": "USD" } }, { "name": "fitness tracker", "price": { "amount": 199, "currency": "USD" } } ], "total": { "amount": 418.49, "currency": "USD" } }, "customer": { "membership": { "since": "2024-01", "type": "premium" }, "name": "Sarah Jones", "shipping_address": { "city": "Portland", "state": "OR", "street": "123 Main St", "zip": "97201" } }, "issue": { "platform": "mobile", "severity": "high", "stage": "checkout", "type": "app_crash" } }, "type": "json" } ], "usage": { "completion_tokens": 57, "prompt_tokens": 945, "total_tokens": 1002 } } ``` ## Optimizing JSON adherence accuracy COMPLETE Structured Outputs does not usually require a prompt; it already understands that its response should conform to the schema you specify. However, task complexity can significantly influence the ability of LLMs to follow a JSON response format. The more complex the task, the more you can improve the accuracy of results by specifying a prompt. - **Simple tasks** such as text classification, entity extraction, paraphrasing, and summarization tasks that don't require complex reasoning generally do not require additional prompting. For smaller models of lower intelligence, just using Structured Outputs significantly improves JSON adherence accuracy, as it ignores any text the model provides unrelated to the supplied schema. - **Medium-complexity tasks** include any simple task in which the model is asked for additional reasoning, such as providing its rationale for a classification decision. For these use cases, we recommend adding “Respond in JSON” in the prompt to optimize performance. - **Complex reasoning tasks** prompt models to perform more open-ended ambiguous tasks, such as assessing and scoring the quality of a call based on the relevance, professionalism, and faithfulness of answers. 
For these use cases, we recommend using the most powerful models, such as Anthropic's `claude-sonnet-4-6` or Mistral AI's `mistral-large2`, and adding “Respond in JSON” and details about the schema you want to generate to the prompt.

For the most consistent results, set the `temperature` option to 0 when you call COMPLETE, regardless of the task or model. To handle possible errors raised by a model, use [TRY_COMPLETE](/sql-reference/functions/try_complete-snowflake-cortex) rather than COMPLETE.

## Cost considerations

Cortex COMPLETE Structured Outputs incurs compute cost based on the number of tokens processed, but does not incur additional compute cost for the overhead of verifying each token against the supplied JSON schema. However, the number of tokens processed (and billed) increases with schema complexity. In general, the larger and more complex the supplied schema is, the more input and output tokens are consumed. Highly structured responses with deep nesting (e.g., hierarchical data) consume a larger number of tokens than simpler schemas.

## Limitations

- You cannot use spaces in the keys of the schema.
- The characters allowed for property names are letters, digits, hyphens, and underscores. Names may be a maximum of 64 characters long.
- You cannot address external schemas using `$ref` or `$dynamicRef`.

The following constraint keywords are not supported. The use of an unsupported constraint keyword results in an error.

| Type | Keywords |
| ------- | ------------------------------------------------------------------------------- |
| integer | `multipleOf` |
| number | `multipleOf`, `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum` |
| string | `minLength`, `maxLength`, `format` |
| array | `uniqueItems`, `contains`, `minContains`, `maxContains`, `minItems`, `maxItems` |
| object | `patternProperties`, `minProperties`, `maxProperties`, `propertyNames` |

These limitations might be addressed in future releases.
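Because an unsupported keyword results in an error at call time, it can help to check a schema on the client before sending it. The following is a minimal sketch of such a check, not part of Snowflake's tooling: `lint_schema` is a hypothetical helper, the keyword sets are copied from the limitations above, and treating "letters" as ASCII letters is an assumption.

```python
import re

# Constraint keywords listed above as unsupported, grouped by JSON type.
UNSUPPORTED = {
    "integer": {"multipleOf"},
    "number": {"multipleOf", "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum"},
    "string": {"minLength", "maxLength", "format"},
    "array": {"uniqueItems", "contains", "minContains", "maxContains", "minItems", "maxItems"},
    "object": {"patternProperties", "minProperties", "maxProperties", "propertyNames"},
}

# Letters, digits, hyphen, underscore; at most 64 characters.
# Restricting "letters" to ASCII is an assumption of this sketch.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def lint_schema(node, path="#"):
    """Return a list of problems found in a schema dict (hypothetical helper)."""
    problems = []
    if not isinstance(node, dict):
        return problems
    node_type = node.get("type")
    if isinstance(node_type, str):
        found = UNSUPPORTED.get(node_type, set()).intersection(node)
        for keyword in sorted(found):
            problems.append(f"{path}: unsupported keyword '{keyword}' for type '{node_type}'")
    # References must stay inside the user's own schema.
    for ref_key in ("$ref", "$dynamicRef"):
        ref = node.get(ref_key)
        if isinstance(ref, str) and not ref.startswith("#"):
            problems.append(f"{path}: external reference '{ref}'")
    # Recurse into property definitions and reusable definitions.
    for section in ("properties", "$defs"):
        children = node.get(section)
        if not isinstance(children, dict):
            continue
        for name, child in children.items():
            if section == "properties" and not NAME_RE.fullmatch(name):
                problems.append(f"{path}: invalid property name '{name}'")
            problems.extend(lint_schema(child, f"{path}/{section}/{name}"))
    if isinstance(node.get("items"), dict):
        problems.extend(lint_schema(node["items"], f"{path}/items"))
    return problems


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {
            "full name": {"type": "string", "maxLength": 10},
            "home": {"$ref": "https://example.com/address.json"},
        },
    }
    for problem in lint_schema(schema):
        print(problem)
```

The sketch only covers the rules stated in this section; the service may reject schemas for other reasons.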
## Error conditions

---
title: AI_COMPLETE with documents
source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-complete-document-intelligence.md
section: Snowflake Cortex (AI & ML)
---

# AI_COMPLETE with documents

This feature is not available in the People's Republic of China.

- [](/user-guide/snowflake-cortex/aisql)
- [](/user-guide/snowflake-cortex/parse-document)
- [](/user-guide/snowflake-cortex/document-extraction)
- [](/sql-reference/functions/ai_complete-prompt-object)

The Cortex AI_COMPLETE function is a general purpose AI Function that can understand data stored in PDF, Microsoft Word, and other document file formats. You can use AI_COMPLETE to perform a variety of document data extraction tasks, such as:

- Answering questions using data in graphs and charts.
- Finding relations between charts and document text.
- Summarizing document content in the context of a specific question.
- Extracting entities from documents.

An advantage of AI_COMPLETE over other [document processing AI Functions](/user-guide/snowflake-cortex/ai-documents) is the ability to choose a model, so you can use the best model for your specific document processing task.

## Processing documents with AI_COMPLETE

The AI_COMPLETE function processes document files stored in an internal Snowflake stage or an external stage. The completion prompt can reference a single document or multiple documents. For example, you can compare the correctness of a translation of marketing materials by providing the original and translated documents as input to the function, along with a prompt asking the model to evaluate the translation quality.

When calling the function, you must specify the model to use and a prompt. The prompt should include instructions along with a FILE object reference for each document you want to process. See [Examples](#examples) for sample prompts and completions, and [](/sql-reference/functions/ai_complete-prompt-object) for function call syntax.
### Input requirements

AI_COMPLETE is optimized for both digital-born and scanned documents. The following table lists the limitations and requirements of input documents:

| Supported file type | All models: .txt, .md, .pdf; Claude models: .txt, .md, .pdf, .doc, .docx, .xls, .xlsx, .csv, .xhtml |
| ------------------- | -------------------------------------------------------------------------------------------------- |
| Stage encryption | Server-side encryption |
| Data type | [FILE](/sql-reference/functions/to_file) object |

Processing files from stages with AI_COMPLETE is currently incompatible with custom network policies.

### Examples

The following examples illustrate how to use AI_COMPLETE to process documents for three common use cases: chart Q&A, contextualized document summarization, and technical report exploration.

## Chart Q&A example

![Title page of "Breaking the Gridlock" EU report](/static/images/cortex-functions/breaking-the-gridlock.png)

The following example uses Anthropic's Claude Opus 4 model to analyze data represented in a chart within the context of the document `hdr2023-24snapshoten.pdf` stored in the `@docs` stage.

```sql
SELECT AI_COMPLETE(
    MODEL => 'claude-4-opus',
    PROMPT => PROMPT('Compare the distributions of HDI in each group: low HDI group, medium HDI group, high HDI group and very high HDI group visualized in {0}',
        TO_FILE('@docs', 'hdr2023-24snapshoten.pdf'))
);
```

Response:

```text
Looking at the document, I can see Figure S.2 on page 6 which shows the recovery of HDI values since the 2020-2021 decline across different HDI groups.
The visualization shows:

**Low HDI group**:
- 49% recovered
- 51% did not recover

**Medium HDI group**:
- The document doesn't provide specific recovery percentages for this group in the figure

**High HDI group**:
- The document doesn't provide specific recovery percentages for this group in the figure

**Very high HDI group**:
- 100% recovered (all OECD countries)

The document also provides additional insights about HDI distributions:

1. **Inequality trends** (Figure S.3, page 7): The inequality between very high HDI and low HDI countries has been increasing since 2020, reversing a long-term declining trend. The difference in HDI values between these groups increased from 0.38 in 2017 to 0.39 in 2020 and is projected to continue rising.

2. **Overall recovery patterns**: The text notes that "every Organisation for Economic Co-operation and Development country is projected to have recovered, but only about half of the Least Developed Countries are projected to have done so."

3. **Global HDI trajectory**: While the global HDI value is projected to reach a record high in 2023, it remains below its pre-2019 trend, suggesting a permanent shift in the development trajectory.

The document emphasizes the highly unequal nature of recovery across HDI groups, with countries in higher HDI categories showing much stronger recovery rates than those in lower HDI categories.
```

## Contextualized document summarization example

![First page of EU Resolution 1008/2008](/static/images/cortex-functions/eu-resolution-1008-2008.png)

The following example uses Anthropic's Claude Sonnet 4 model to extract the summary of a legal text with a complex layout. The document `CELEX_32008R1008_EN_TXT.pdf` is stored in the `@docs` stage; the prompt narrows the summarization context.
```sql SELECT AI_COMPLETE( MODEL => 'claude-4-sonnet', PROMPT => PROMPT('summarize the changes from the perspective of private aviation {0}', TO_FILE('@docs', 'CELEX_32008R1008_EN_TXT.pdf')) ); ``` Response: ```text From a private aviation perspective, this EU Regulation 1008/2008 introduces several significant changes that affect private operators and business aviation: ## Key Changes Affecting Private Aviation: ### **Operating License Requirements** - **Stricter financial oversight**: Private operators must demonstrate they can meet financial obligations for 24 months from start of operations, with enhanced monitoring during the first two years - **Simplified requirements for smaller operators**: Aircraft under 10 tonnes MTOM and/or less than 20 seats need only demonstrate €100,000 net capital (unless operating scheduled services or exceeding €3 million turnover) ### **Aircraft Registration and Leasing** - **Flexible registration**: Private operators can register aircraft either in their home Member State or anywhere within the Community - **Wet lease restrictions**: Stricter limitations on wet leasing aircraft from third countries, requiring prior approval and justification based on: - Exceptional needs (up to 7 months, renewable once) - Seasonal capacity requirements - Operational difficulties when EU-registered aircraft unavailable - **Safety equivalence**: All third-country leased aircraft must meet equivalent safety standards to EU requirements. 
### **Enhanced Supervision** - **Unified oversight**: The same Member State authority now oversees both the Air Operator Certificate (AOC) and operating license, improving efficiency for operators with bases in multiple countries - **Regular assessments**: Mandatory financial reviews, particularly after two years of operation and when potential problems are suspected ### **Insurance Requirements** - **Extended coverage**: Insurance requirements now explicitly include mail liability coverage in addition to passengers, cargo, and third parties ### **Operational Flexibility** - **Code-sharing freedom**: Private operators can more freely enter into code-share arrangements on intra-Community routes and routes to third countries - **Pricing freedom**: Complete freedom to set fares and rates for intra-Community services ### **Administrative Streamlining** - **Consolidated regulation**: The three separate regulations are now combined into one comprehensive framework, simplifying compliance - **Reduced bureaucracy**: Member States cannot require documents already provided to licensing authorities These changes generally **liberalize** private aviation operations within the EU while **strengthening** financial and safety oversight, creating a more integrated and competitive market for private operators. ``` ## Technical report exploration ![First page of 75mm gun specification booklet](/static/images/cortex-functions/gun-75mm-specification.png) The following example uses the Gemini 3.1 Pro model to analyze casualty data represented in the diagrams of a technical report. The document `75mm-M3-spec-booklet-MK-VI.pdf` is stored in the `@docs` stage. ```sql SELECT AI_COMPLETE( MODEL => 'gemini-3.1-pro', PROMPT => PROMPT('explain findings from figures 69-73 of {0}', TO_FILE('@docs', '75mm-M3-spec-booklet-MK-VI.pdf')) ); ``` Response: ```text Based on the provided document, specifically **page 4**, here is an explanation of the findings from Figures 69 through 73. 
These figures illustrate the fragmentation patterns and effectiveness of the **75-mm Shell, H.E., M48** when fired from an M3 Gun. They visualize how dangerous the shell is to personnel (casualties) and equipment (perforation of mild steel) at different burst heights and orientations. ``` ### Supported models and limitations All models available to Snowflake Cortex have limitations on the total number of input and output tokens, known as the model's *context window.* The context window size is measured in tokens. Inputs exceeding the context window limit result in an error. For text models, tokens generally represent approximately four characters of text; the word count corresponding to a limit is somewhat less than the context window given in tokens. For image models, the token count per document depends on the vision model's architecture. Tokens within a prompt (e.g., “summarize this document:”) also contribute to the model's context window.

| Model | Context window (tokens) | File types | File size | Max pages | Documents per prompt |
| ------------------- | ----------------------- | ------------------------------------------------------- | --------- | --------- | -------------------- |
| `gemini-3.1-pro` | 1,000,000 | .pdf, .txt, .md | 37.5MB | 3,000 | 20 |
| `gemini-3.5-flash` | 1,000,000 | .pdf, .txt, .md | 37.5MB | 1,000 | 20 |
| `claude-4-sonnet` | 200,000 | .txt, .md, .pdf, .doc, .docx, .xls, .xlsx, .csv, .xhtml | 22MB | 100 | 5 |
| `claude-4-opus` | 200,000 | .txt, .md, .pdf, .doc, .docx, .xls, .xlsx, .csv, .xhtml | 22MB | 100 | 5 |
| `claude-haiku-4-5` | 200,000 | .txt, .md, .pdf, .doc, .docx, .xls, .xlsx, .csv, .xhtml | 22MB | 100 | 5 |
| `claude-sonnet-4-5` | 200,000 | .txt, .md, .pdf, .doc, .docx, .xls, .xlsx, .csv, .xhtml | 22MB | 100 | 5 |
| `claude-opus-4-5` | 200,000 | .txt, .md, .pdf, .doc, .docx, .xls, .xlsx, .csv, .xhtml | 22MB | 100 | 5 |
| `claude-sonnet-4-6` | 1,000,000 | .txt, .md, .pdf, .doc, .docx, .xls, .xlsx, .csv, .xhtml | 22MB | 100 | 5 |

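
Because inputs that exceed a model's limits result in an error, a client-side check before calling AI_COMPLETE may save a round trip. The sketch below encodes the per-model limits from the table above. `check_documents` and `estimate_text_tokens` are hypothetical helpers, sizes are assumed to be supplied in megabytes by the caller, and the four-characters-per-token figure is only the rough rule of thumb mentioned above.

```python
import math
import os
from dataclasses import dataclass

GEMINI_TYPES = frozenset({".pdf", ".txt", ".md"})
CLAUDE_TYPES = GEMINI_TYPES | {".doc", ".docx", ".xls", ".xlsx", ".csv", ".xhtml"}


@dataclass(frozen=True)
class DocumentLimits:
    file_types: frozenset
    max_file_mb: float
    max_pages: int
    max_documents: int


# Values copied from the table above.
GEMINI_PRO = DocumentLimits(GEMINI_TYPES, 37.5, 3000, 20)
GEMINI_FLASH = DocumentLimits(GEMINI_TYPES, 37.5, 1000, 20)
CLAUDE = DocumentLimits(CLAUDE_TYPES, 22.0, 100, 5)

LIMITS = {
    "gemini-3.1-pro": GEMINI_PRO,
    "gemini-3.5-flash": GEMINI_FLASH,
    "claude-4-sonnet": CLAUDE,
    "claude-4-opus": CLAUDE,
    "claude-haiku-4-5": CLAUDE,
    "claude-sonnet-4-5": CLAUDE,
    "claude-opus-4-5": CLAUDE,
    "claude-sonnet-4-6": CLAUDE,
}


def estimate_text_tokens(char_count):
    """Rough token estimate for plain text: about four characters per token."""
    return math.ceil(char_count / 4)


def check_documents(model, documents):
    """Return problems for a list of (file_name, size_mb, pages) tuples."""
    limits = LIMITS[model]
    problems = []
    if len(documents) > limits.max_documents:
        problems.append(
            f"{len(documents)} documents exceed the limit of {limits.max_documents} per prompt"
        )
    for file_name, size_mb, pages in documents:
        extension = os.path.splitext(file_name)[1].lower()
        if extension not in limits.file_types:
            problems.append(f"{file_name}: file type '{extension}' is not supported by {model}")
        if size_mb > limits.max_file_mb:
            problems.append(f"{file_name}: {size_mb} MB exceeds {limits.max_file_mb} MB")
        if pages > limits.max_pages:
            problems.append(f"{file_name}: {pages} pages exceed {limits.max_pages}")
    return problems


if __name__ == "__main__":
    print(check_documents("gemini-3.1-pro", [("sheet.xlsx", 1.0, 1)]))
```

The check does not cover the context window itself, because the token count of a document depends on the model's architecture.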
### Access control requirements To use the AI_COMPLETE function, a user with the ACCOUNTADMIN role must grant the SNOWFLAKE.CORTEX_USER database role to the user who will call the function. See [](#label-cortex-llm--privileges) topic for details. Users must also have READ access to the stage and file being processed. ### Cost considerations Cost is determined by the total number of [tokens processed](#label-cortex-llm-cost-considerations), not by file size. When documents are uploaded, textual content is extracted and converted into tokens; visual page segments (images) are also transformed into tokens. Billing is based on the sum of input tokens (text plus images that the model reads) and output tokens (text the model generates). Actual token counts vary based on the underlying architecture of a model, as well as the document composition and structure. Content such as dense tables, spreadsheets, structured data, code, repeated headers and footers, or OCR-derived text may increase token volume. Conversely, image-heavy or slide-based documents with minimal extractable text may result in lower token counts. The AI_COUNT_TOKENS function does not currently support document inputs in multimodal models. ### Choosing a model The [MMLongBench-Doc](https://proceedings.neurips.cc/paper_files/paper/2024/hash/ae0e43289bffea0c1fa34633fc608e92-Abstract-Datasets_and_Benchmarks_Track.html) benchmark is used for evaluating model capabilities in multimodal and long context comprehension, including cross page information retrieval.

| Model | MMLongBench-Doc score |
| ----------------- | --------------------- |
| claude-3-7-sonnet | 52.8% |
| claude-4-sonnet | 50.2% |
| claude-4-opus | 53.0% |
| claude-haiku-4-5 | 48.9% |
| claude-sonnet-4-5 | 61.4% |
| claude-opus-4-5 | 63.8% |
| claude-sonnet-4-6 | 62.3% |
| gemini-3.1-pro | 60.5% |

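
One way to combine the benchmark scores with the context window limits is to pick the highest-scoring model whose window is large enough for the input. The sketch below does that with a hypothetical `pick_model` helper. It includes only models that appear in both tables in this topic, and it treats the benchmark score as the sole selection criterion, which is a simplification; cost and regional availability also matter in practice.

```python
# model -> (context window in tokens, MMLongBench-Doc score in percent),
# copied from the tables in this topic.
MODELS = {
    "gemini-3.1-pro": (1_000_000, 60.5),
    "claude-4-sonnet": (200_000, 50.2),
    "claude-4-opus": (200_000, 53.0),
    "claude-haiku-4-5": (200_000, 48.9),
    "claude-sonnet-4-5": (200_000, 61.4),
    "claude-opus-4-5": (200_000, 63.8),
    "claude-sonnet-4-6": (1_000_000, 62.3),
}


def pick_model(required_tokens):
    """Return the best-scoring model whose context window fits, or None."""
    candidates = {
        name: score
        for name, (window, score) in MODELS.items()
        if window >= required_tokens
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    print(pick_model(150_000))
    print(pick_model(500_000))
```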
### Regional availability See [](#label-cortex-llm-availability). ### Error conditions Snowflake Cortex AI_COMPLETE can produce the following error messages:
### Legal notices The data classification of inputs and outputs are as set forth in the following table.
For additional information, refer to [Snowflake AI and ML](/guides-overview-ai-features). --- title: Artifacts in Snowflake CoWork source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/snowflake-cowork/artifacts.md section: Snowflake Cortex (AI & ML) --- # Artifacts in %sf-intelligence% This feature is not available in the People's Republic of China. Available to all accounts. %sf-intelligence% delivers rich, interactive charts and tables as part of its responses. When you find an insight worth keeping, you can save or share it as an artifact. An artifact is a persistent representation of that insight that you can revisit, refresh, and collaborate on without regenerating it. After being created, an artifact preserves the query, visualization, and context so you can return later to see fresh data, or share it with a teammate who sees the same artifact filtered through their own data permissions. ## Interactive tables and charts When you ask %sf-intelligence% a question, the agent generates a response that may include a chart, a table, or both. Charts and tables are interactive, so you can sort, filter, search, and resize directly without asking a new question. In explorer mode, charts and tables are synced so that interactions in one update the other. Queries default to rolling time windows (for example, "last 30 days" always means the most recent 30 days). If you need a fixed time period, ask with explicit dates such as "show me data from November 15 through November 22." ## Save artifacts When a chart or table contains an insight you want to keep, select **Save** to create an artifact. The artifact preserves the underlying query, visualization settings, and a data snapshot so it loads instantly when viewed later. ## Manage artifacts The artifacts hub is the central place to manage your artifacts. It contains the following tabs: - **Saved**: All artifacts you've saved. - **Shared with me**: Artifacts shared with you through a link. 
The hub displays cached snapshots as tile previews for fast loading. You can select a tile to expand the artifact, see additional context, and start a follow-up conversation. You can also search for saved artifacts by name within the artifacts hub. Artifacts auto-refresh when you view them more than 12 hours after your last view. You can also refresh manually at any time. The refresh re-runs the original SQL query with your current credentials and updates both the data and the snapshot. You can ask follow-up questions on any saved artifact. Each follow-up starts a new conversation thread that includes the artifact's visualization spec, data snapshot, and a summary of the original conversation context. The original conversation stays private and unchanged. ## Share artifacts You can share an artifact by copying a link and sending it through any communication channel. When you share a link, you create a pointer to a single artifact object, not a copy. Any account user with the link can open the shared artifact, as long as they have access to the underlying data. When a recipient opens the link: - The artifact runs the SQL query using the recipient's credentials, respecting their role-based access controls (RBAC), row-level security, and column masking. - The artifact appears in the recipient's **Shared with me** tab in the artifacts hub. - The recipient can explore the artifact, ask follow-up questions, and return to it later. Recipients can re-share artifacts they have access to. Recipients can also save a shared artifact to their own **Saved** tab. ### Follow-up conversations about shared artifacts Recipients can ask follow-up questions using the same agent that created the artifact, if they have access to that agent. If they don't have agent access, %sf-intelligence% displays a warning that follow-up questions may not be available or may produce degraded results with a different agent. Follow-up conversations are private to the person asking. 
No information flows back to the original sharer. ### Revoking access You can unshare an artifact at any time. Unsharing invalidates the link immediately and no one can open it afterward. Admins can't disable artifact sharing using the UI or SQL. To disable sharing for your account, contact your Sales Engineer, Account Executive, or [Snowflake Support](https://community.snowflake.com/s/article/How-To-Submit-a-Support-Case-in-Snowflake-Lodge). ## Security and access control Artifacts follow a caller's-rights model. Every data interaction validates the current user's permissions at runtime. The following security behaviors apply: - **Saved artifacts are user-scoped:** Saved artifacts are private to each user. Other users can only see artifacts that are explicitly shared. - **RBAC is enforced:** Every refresh and share runs the query under the viewer's current role and credentials. Two users with different roles may see different results from the same artifact. - **Ownership is persistent:** Artifacts are tied to the user, not to a specific role or agent. If you lose access to the originating agent, you keep the artifact and can still refresh it as long as you have access to the underlying data. ## Artifact lifecycle Saved artifacts persist until you explicitly delete them. %sf-intelligence% never automatically deletes a saved artifact. The following table describes what happens when access conditions change:
When an agent is no longer available, %sf-intelligence% displays a warning. ## Known limitations - **Single artifacts only:** Currently, you can save and share an individual tile per artifact. Collections of multiple tiles aren't supported. - **No user-level sharing permissions:** Currently, sharing is link-based and public within the account. You can't restrict a shared link to specific users. - **No folders or labels:** Currently, artifacts can't be organized into groups, folders, or labeled for categorization. - **Chart editor only available for some chart types:** The manual chart editor UI is only available for bar, line, pie, and scatter charts. For artifacts containing other chart types, ask %sf-intelligence% to make changes to the chart. --- title: Batch Cortex Search source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/batch-cortex-search.md section: Snowflake Cortex (AI & ML) --- # Batch Cortex Search This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) - [](/user-guide/snowflake-cortex/cortex-search/query-cortex-search-service) - [](/user-guide/snowflake-cortex/cortex-search/cortex-search-customize-scoring) - [](/sql-reference/sql/create-cortex-search) The Batch Cortex Search function is a table function that lets you submit a batch of queries to a Cortex Search Service. It is intended for offline use cases with high throughput requirements, such as entity resolution, deduplication, or clustering tasks. Jobs submitted to a Cortex Search Service with the `CORTEX_SEARCH_BATCH` function leverage additional compute resources to provide significantly higher throughput (queries per second) than the interactive (Python, REST, or SEARCH_PREVIEW) API search query surfaces. 
Common use cases for batch search include: - **Entity resolution**: Match messy or inconsistent records (such as product names, company names, or addresses) against a canonical reference dataset. - **Audience matching**: Link customer identities across datasets by fuzzy-matching names, emails, or other attributes. - **Deduplication**: Identify and merge near-duplicate records within a single dataset. - **Clustering and categorization**: Assign unstructured items to the closest categories or groups in a taxonomy. - **Bulk enrichment**: Augment large tables with related content retrieved from a knowledge base. ## Syntax Use the following syntax to query a Cortex Search Service in batch mode using the `CORTEX_SEARCH_BATCH` table function: ```sql SELECT q.query, r.* FROM query_table AS q, LATERAL CORTEX_SEARCH_BATCH( service_name => '..', query => q.query, -- optional STRING multi_index_query => q.miq, -- optional VARIANT filter => q.filter, -- optional VARIANT limit => 10, -- optional INT options => q.options -- optional VARIANT ) AS r; ``` ## Parameters The `CORTEX_SEARCH_BATCH` function supports the following parameters:
`service_name` (string, required)
Fully-qualified name of the Cortex Search Service to query.
`query` (string, optional)
Column containing query string for searching the service.
`multi_index_query` (variant, optional)
An object that specifies one or more vector or keyword query inputs to search against the service index. See [multi_index_query](#label-cortex-multi-index-search) for details on how to construct this parameter. For performance reasons, `multi_index_query` currently supports at most one vector index entry in the query array.
`filter` (variant, optional)
Column containing filter objects to apply to the search results.
`limit` (integer, optional)
Maximum number of results to return per query. Default: 10.
`options` (variant, optional)
Column containing a VARIANT object with optional per-query settings. Supported top-level keys include: - `scoring_config` (object, optional): Same structure as the `scoring_config` parameter for interactive Cortex Search queries (Python, REST, or `SEARCH_PREVIEW`). Use it to customize ranking for that row's batch query. See [](#label-cortex-search-customize-scoring). - `replicas` (integer, optional): How many copies of the search index serve that row's batch query. Default: 2. Higher values can improve throughput by completing the job faster.
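
When the rows of the query table are prepared outside Snowflake, the `options` value and the per-row inputs can be assembled and checked on the client first. The following is a minimal sketch under stated assumptions: `build_batch_options` and `validate_batch_row` are hypothetical helpers, the requirement that `replicas` be a positive integer is an assumption (the description above only says integer), and the resulting JSON string would still need to be converted to a VARIANT in SQL, for example with `PARSE_JSON`.

```python
import json


def build_batch_options(scoring_config=None, replicas=None):
    """Return a JSON string holding the supported top-level `options` keys."""
    options = {}
    if scoring_config is not None:
        options["scoring_config"] = scoring_config
    if replicas is not None:
        # Assumption of this sketch: replicas must be a positive integer.
        if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 1:
            raise ValueError("replicas must be a positive integer")
        options["replicas"] = replicas
    return json.dumps(options, sort_keys=True)


def validate_batch_row(query=None, multi_index_query=None, filter=None):
    """Enforce that at least one of the three search inputs is present."""
    if query is None and multi_index_query is None and filter is None:
        raise ValueError("specify at least one of query, multi_index_query, or filter")


if __name__ == "__main__":
    validate_batch_row(query="usb c cable")
    print(build_batch_options(
        scoring_config={"disable_vector_embedding_query_prefix": True},
        replicas=4,
    ))
```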
At least one of `query`, `multi_index_query`, or `filter` must be specified. ## Usage notes - The throughput of the batch search function might vary depending on the amount of data indexed in the queried Cortex Search Service and the complexity of the search queries. Run the function on a small number of queries to measure the throughput for your specific workload. In general, queries to larger services with more filter conditions see lower throughput. - The throughput of the batch search function, the number of search queries processed per second, is not influenced by the size of the warehouse used to query it. - Because batch search spins up dedicated resources to serve each job, it incurs additional startup latency. If you need to run fewer than 2,000 queries, you'll typically get faster results using the interactive Cortex Search API (Python or REST API) rather than batch search. - Unlike the interactive Cortex Search API, the batch search function can query services that are currently suspended in serving. - A single Cortex Search Service can be queried in interactive and batch mode concurrently without any degradation to interactive query performance or throughput. Separate compute resources are used to serve interactive and batch queries. - Reranking is not supported for batch search queries. If your `scoring_config` includes reranker settings, they are ignored. ## Cost considerations Batch search has three cost components:
**Serving cost**
A charge based on the size of the search index data and the duration of the batch search job, excluding the startup time. Higher `replicas` values complete jobs faster but don't increase total serving cost because the same machine hours are consumed over a shorter duration.
**Query embedding cost**
A charge for the number of tokens embedded as a result of the input queries. Unlike interactive Cortex Search, query embedding is not free for batch search.
**Virtual warehouse cost**
A charge for the virtual warehouse compute used to run the batch job.
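
The statement that higher `replicas` values don't increase total serving cost can be illustrated with a small calculation. The sketch below assumes an idealized job whose duration shrinks exactly in proportion to the number of replicas; real jobs may scale less cleanly, and startup time is excluded as described above. The function name and the six-hour figure are hypothetical.

```python
def serving_usage(single_replica_hours, replicas):
    """Return (job duration, replica-hours) under idealized linear scaling."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Idealized assumption: doubling replicas halves the job duration.
    duration_hours = single_replica_hours / replicas
    # Serving usage is the number of replicas times how long they run.
    replica_hours = replicas * duration_hours
    return duration_hours, replica_hours


if __name__ == "__main__":
    for replicas in (1, 2, 4):
        duration, usage = serving_usage(6.0, replicas)
        print(f"replicas={replicas}: duration={duration} h, replica-hours={usage}")
```

Under this assumption, the replica-hours stay constant while the wall-clock duration drops, which matches the behavior described for the serving cost component.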
For usage tracking, see the [CORTEX_SEARCH_BATCH_QUERY_USAGE_HISTORY](/sql-reference/account-usage/cortex_search_batch_query_usage_history) Account Usage view. For more information on Cortex Search costs, see [](#label-cortex-search-cost-considerations). ## Regional availability Batch search is available in all regions where Cortex Search is available. See [](#label-cortex-search-overview-regional-availability) for a full list of supported regions. ## Example Usage In this example, match products in a user-submitted order form to a "golden" product catalog. The `CORTEX_SEARCH_BATCH` call uses `options` so embeddings are computed without the default search query prefix; see [](#label-cortex-search-disable-query-prefix). Use that setting only when you have evaluated the impact on result quality. ```sql -- Create the golden product catalog with canonical product names CREATE OR REPLACE TABLE golden_catalog (product_name TEXT); INSERT INTO golden_catalog VALUES ('Wireless Bluetooth Headphones'), ('Wireless Noise-Canceling Earbuds'), ('USB-C Charging Cable 6ft'), ('Portable Power Bank 10000mAh'); -- Create Cortex Search Service on the golden catalog CREATE CORTEX SEARCH SERVICE golden_product_service ON product_name WAREHOUSE = TARGET_LAG = '1 day' AS SELECT product_name FROM golden_catalog; -- Create a table of user-submitted products (may contain variations or typos) CREATE OR REPLACE TABLE submitted_products (product TEXT); INSERT INTO submitted_products VALUES ('bluetooth headphones wireless'), ('usb c cable'); -- For each user-submitted product, query the service for the two closest golden results SELECT q.product, s.* FROM submitted_products AS q, LATERAL CORTEX_SEARCH_BATCH( service_name => 'golden_product_service', query => q.product, limit => 2, options => OBJECT_CONSTRUCT( 'scoring_config', OBJECT_CONSTRUCT( 'disable_vector_embedding_query_prefix', true ) ) ) AS s; ``` The following example uses `multi_index_query` to submit precomputed embeddings as the query 
input instead of raw text. Here, the source table `my_db.my_schema.product_embeddings` contains a column `embedding` with precomputed vectors, and the Cortex Search Service `my_db.my_schema.golden_product_service` was created with a bring-your-own-vector (BYOV) configuration. For details on constructing `multi_index_query`, see [multi_index_query](#label-cortex-multi-index-search). ```sql SELECT q.product_name, s.* FROM ( SELECT product_name, embedding::ARRAY AS emb_arr FROM my_db.my_schema.product_embeddings LIMIT 100000 ) q, LATERAL CORTEX_SEARCH_BATCH( service_name => 'my_db.my_schema.golden_product_service', multi_index_query => OBJECT_CONSTRUCT( 'EMBEDDING', ARRAY_CONSTRUCT( OBJECT_CONSTRUCT('vector', q.emb_arr) ) ), limit => 5 ) s; ``` --- title: Build agents source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/snowflake-cowork/build-agents.md section: Snowflake Cortex (AI & ML) --- # Build agents This feature is not available in the People's Republic of China. You can build agents for %sf-intelligence% using the following methods: - In %sf-web-interface% - Using the [Agent object REST API](/user-guide/snowflake-cortex/cortex-agents-rest-api) - With the [Cortex Agents SQL](/sql-reference/commands-cortex-agent) commands The following sections provide information about how to build agents for %sf-intelligence% using SQL commands. Each section provides information about a different part of the agent configuration. The final section shows an example of an agent configuration that includes all of the components described in this topic. For more information about the other methods to create an agent and the options available, see [](/user-guide/snowflake-cortex/cortex-agents-manage). 
## Agent structure An agent consists of the following parts: - The base model that provides the foundation for the agent's behavior - A model (the orchestrator) that interprets intent, selects the right tools, and plans the sequence of actions - Instructions for the agent's behavior - Tools for the agent to use - Resources for the tools The following sections provide information about model selection and tool configuration. This example uses a semantic view, a Cortex Search service, and a custom tool to provide answers. Although you can create a basic agent that doesn't use any of these tools, that basic agent can only use the base model to provide answers. As a result, the agent lacks access to data within your Snowflake account and has limited context for answers. For information about the other components of the agent, see [](/user-guide/snowflake-cortex/cortex-agents). ## Prerequisites To create a Cortex Agent, you must use a role with the following privileges: | Privilege | Object | Notes | | ------------ | ---------------- | ------------------------------------------------------------------------- | | CREATE AGENT | Schema | Required to create the Cortex Agent. | | USAGE | Database, schema | Required to create the Cortex Agent in the specified database and schema. | The following code grants the necessary privileges to create a Cortex Agent:
```sql GRANT USAGE ON DATABASE to ROLE ; GRANT USAGE ON SCHEMA . to ROLE ; GRANT CREATE AGENT ON SCHEMA . to ROLE ; ```
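The three grants above follow a fixed pattern, so they can be rendered from a database, schema, and role name. The following Python sketch is illustrative only: the function name `render_agent_grants` and the sample names `my_db`, `my_schema`, and `agent_builder` are hypothetical, and the identifier check assumes simple unquoted identifiers.

```python
import re
from typing import List

# Assumption: only simple unquoted identifiers are accepted
# (letters, digits, underscores, dollar signs; no leading digit).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def render_agent_grants(database: str, schema: str, role: str) -> List[str]:
    """Return the three GRANT statements shown above for the given names."""
    for value in (database, schema, role):
        if not _IDENTIFIER.match(value):
            raise ValueError(f"Not a simple unquoted identifier: {value!r}")
    qualified_schema = f"{database}.{schema}"
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO ROLE {role};",
        f"GRANT CREATE AGENT ON SCHEMA {qualified_schema} TO ROLE {role};",
    ]


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    for statement in render_agent_grants("my_db", "my_schema", "agent_builder"):
        print(statement)
```

Rejecting anything that is not a simple identifier keeps the generated SQL predictable; names that need quoting would have to be handled separately.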
In addition to the privileges required to create a Cortex Agent, the following prerequisites are necessary to connect the agent to specific tools:
- A semantic view to connect to the agent. For information about creating a semantic view, see [](/user-guide/views-semantic/overview).
- A Cortex Analyst tool to connect to the agent. For information about creating a Cortex Analyst tool, see [](/user-guide/snowflake-cortex/cortex-analyst).
- Unstructured data in a database to connect to the agent.
- A Cortex Search tool to connect to the agent. For information about creating a Cortex Search tool, see [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview).
- A custom tool to connect to the agent. For information about creating user-defined functions (UDFs) and stored procedures to use as custom tools, see [](/developer-guide/extensibility).
To attach tools to an agent, the role that is used to create the agent must have the following privileges:

| Privilege | Object | Notes |
| --------- | --------------------- | ----- |
| USAGE | Cortex Search service | Required to add the Cortex Search service to the Cortex Agent. |
| SELECT | Table/View | Required to access the objects referenced in the agent's semantic view/model. |
| USAGE | Tools | Required to access all of the custom tools to attach to the agent. For example, if the custom tool is a stored procedure, then you must have USAGE on the procedure. |
| USAGE | Semantic view/model | Required to access the semantic view/model to attach to the agent. |

## Agent configuration basics

When you create an agent, you must specify information about the agent, such as the name, description, and model. You can also specify the tools that the agent can use and the resources that the agent can access. These resources are passed as a YAML specification in the `FROM SPECIFICATION` clause of the `CREATE AGENT` command. The following recommendations provide best practices for this configuration:

**Scope agents narrowly:** Before adding tools or writing instructions, define why the agent exists, who it serves, and what specific questions it should answer. This step shapes everything that follows, from tool selection to performance and trust. Snowflake recommends that you narrow the agent's scope to a specific, high-value use case. After an agent proves reliable in one area, you can replicate the pattern for others. For example, you could have one agent to analyze your store's recent sales and marketing data, and another that recommends the best SKUs to pitch to the retailer.

**Select the number of tools carefully:** Every agent should have access to only the tools it needs.
To determine that, consider the documents or data that the agent needs to fulfill its purpose. If the agent needs to access unstructured data, use [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview). If the agent needs to access structured data, use [](/user-guide/snowflake-cortex/cortex-analyst). If the agent needs other tools, you can use custom tools. **Write a useful tool description:** These descriptions are used to help the agent understand what the tool does and how to use it. Unclear tool descriptions can create cascading failures and lead to "hallucinations."
To create a useful tool description, follow these guidelines:
- Add a clear and specific tool name that clarifies the tool's domain ("Customer", "Sales") and function ("Analytics", "Search"). - Write a purpose-driven tool description that tells the agent: - What the tool does - Which data it accesses - When to use it - When NOT to use it - Be explicit about the tool's expected inputs. Ambiguous inputs to your tools lead to incorrect tool calls and errors. - Be specific. - Specify the data format. - Provide clear data instructions. - Provide default guidance. - Use consistent terminology.
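The guidelines above can be turned into a small helper that refuses to build a tool entry unless each part of the description is present. This is a sketch only: the function `build_tool_spec`, its parameter names, and the way the sections are joined into one string are assumptions made for illustration, not a format that the agent requires.

```python
from typing import Dict


def build_tool_spec(tool_type: str, name: str, what: str, data: str,
                    use_when: str, avoid_when: str) -> Dict[str, str]:
    """Build a tool_spec-style dict whose description covers all four points."""
    sections = {
        "what the tool does": what,
        "which data it accesses": data,
        "when to use it": use_when,
        "when not to use it": avoid_when,
    }
    # Reject descriptions that skip any of the recommended parts.
    missing = [label for label, text in sections.items() if not text.strip()]
    if missing:
        raise ValueError("Tool description is missing: " + ", ".join(missing))
    description = (
        f"{what.strip()} Data: {data.strip()} "
        f"Use when: {use_when.strip()} Do not use when: {avoid_when.strip()}"
    )
    return {"type": tool_type, "name": name, "description": description}


if __name__ == "__main__":
    # Hypothetical tool name and wording, for illustration only.
    spec = build_tool_spec(
        "cortex_search",
        "PolicySearch",
        "Searches company policy documents.",
        "Policy PDFs.",
        "Questions about policy.",
        "Revenue questions.",
    )
    print(spec["description"])
```

Keeping the four parts as separate arguments makes an incomplete description fail early, before the text reaches the agent specification.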
For more agent configuration recommendations, see [Best Practices for Building Cortex Agents](https://www.snowflake.com/en/developers/guides/best-practices-to-building-cortex-agents/). For guidance on evaluating reliability and behavior, see [Best Practices for Evaluating Cortex Agents](https://www.snowflake.com/en/developers/guides/best-practices-for-evaluating-cortex-agents/). ## Model selection When you create an agent, we recommend that you select **auto** for the model. With this option, Cortex automatically selects the highest quality model for your account, and the quality automatically improves as new models become available. For more information about the available models, see [](#label-snowflake-intelligence-models). The following example shows how to specify the model for the agent: ```yaml models: orchestration: auto ``` ### Cross-region inference Cross-region inference is disabled by default. We recommend using cross-region inference to access the full set of LLMs and avoid limitations within a single region. When using a model that is not available in the local region, you must use Cortex cross-region inference. This setting enables inference requests to be processed in a different region from the default region. The parameter for cross-region inference can only be set at the account level by the ACCOUNTADMIN role, not at the user or session levels. To set the parameter, use the following command: ```sql ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION'; ``` For more information about configuring Cortex cross-region inference, see [](/user-guide/snowflake-cortex/cross-region-inference). ## Connect semantic views using Cortex Analyst (structured data) %sf-intelligence% supports semantic views, which are a type of structured data with instructions that tell the agent how to query or interpret the data. Cortex Agents use Cortex Analyst to retrieve structured data from semantic views by converting natural language requests into SQL queries. 
Agents can route across multiple semantic views to provide the response. Each semantic view should cover a similar set of tables. You can set data-specific defaults, such as always adding a date filter for the past three months if not specified or always excluding internal accounts. You can connect a semantic view to an agent by specifying the semantic view as part of the tool resources. The following example shows how to connect a semantic view to an agent and how to specify the Cortex Analyst tool to retrieve structured data from the semantic view: ```yaml tools: - tool_spec: type: "cortex_analyst_text_to_sql" name: "" description: "" tool_resources: : semantic_view: ".." ``` ### Best practices for semantic views Semantic views power how %sf-intelligence% understands and queries your data. A well-designed semantic view improves accuracy, reduces latency, and builds user trust. The following best practices are designed to help you create a semantic view that is accurate and efficient: **Start small and focused:** Begin with 5-10 tables in a single business domain. Organize by use case (Sales Performance, Customer Support Metrics) rather than by data structure. Scale after you validate accuracy. **Write clear descriptions:** Descriptions are the most important element. Every table and column should have a business-friendly description that explains what the data represents, not just its name. Include context like calculation logic, business definitions, and any legacy terminology. **Add verified queries:** These are examples of questions paired with validated SQL. They improve accuracy on similar questions, reduce latency, and help the system learn your business patterns. Start with 10 to 20 queries that cover your most common questions, and add more based on actual usage. **Define metrics and filters:** Pre-define reusable calculations (like total revenue or average order value) and common conditions (like active customers or current fiscal year). 
These can significantly improve consistency. **Use custom instructions for business logic:** Add SQL generation instructions for data quirks, fiscal year definitions, default filters, or domain-specific rules. Be specific: "If no date filter is provided, default to last 12 months" is better than "filter by date." **Enable Cortex Search for text matching:** For high-cardinality text columns like product names, customer names, or company names, Cortex Search enables fuzzy matching when user input does not exactly match your data. **Test and iterate:** Create an evaluation set of representative questions, measure accuracy, and refine based on real usage patterns. Review suggestions regularly to add verified queries and improve descriptions over time. For more information about best practices for creating semantic views, see [Best Practices for Semantic Views in Cortex Analyst](https://www.snowflake.com/en/developers/guides/best-practices-semantic-views-cortex-analyst/). ## Connect Cortex Search (unstructured data) To process unstructured data, you can connect a Cortex Search tool to an agent by specifying the Cortex Search tool in the YAML specification as part of the tool resources. Cortex Search services retrieve documents and records from unstructured data sources using semantic search. The two primary use cases for Cortex Search are retrieval augmented generation (RAG) and enterprise search. For information about creating a Cortex Search service, see [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview). You can also use a Cortex Knowledge Extension (CKE) that is shared with you. 
When you connect a Cortex Search tool to an agent, it is especially important to include the following information about the parameters and their expected values: - Type and format (include examples) - Whether required or optional (with default values) - Valid values or constraints (enums, ranges, formats) - Relationship to other parameters (dependencies, conflicts) - How to obtain the value (especially for IDs) The following example shows how to connect a Cortex Search tool to an agent and how to specify the Cortex Search tool in the YAML specification: ```yaml tools: - tool_spec: type: "cortex_search" name: "" description: "" tool_resources: : name: ".." max_results: "5" filter: "@eq": region: "North America" title_column: "" id_column: "" ``` ## Add custom tools %sf-intelligence% supports custom tools, which are user-defined functions or stored procedures that can be used to implement custom business logic. You can connect a custom tool to an agent by specifying the custom tool in the YAML specification as part of the tool resources. The following example shows how to connect a custom tool to an agent and how to specify the custom tool in the YAML specification: ```yaml tools: - tool_spec: type: "generic" name: "" description: "" tool_resources: : user-defined-function-argument: "argument1" ``` ## Create an agent - Combine all of the tools and components to create an agent using SQL: ```sqlexample-yaml CREATE OR REPLACE AGENT COMMENT = 'agent level comment' PROFILE = '{"display_name": "My Business Assistant", "avatar": "business-icon.png", "color": "blue"}' FROM SPECIFICATION $$ models: orchestration: claude-4-sonnet orchestration: budget: seconds: 30 tokens: 16000 instructions: response: "You will respond in a friendly but concise manner" orchestration: "For any revenue question, use Analyst; for policy questions, use Search" sample_questions: - question: "What was our revenue last quarter?" 
tools: - tool_spec: type: "cortex_analyst_text_to_sql" name: "" description: "" - tool_spec: type: "cortex_search" name: "" description: "" - tool_spec: type: "data_to_chart" name: "data_to_chart" description: "Generates visualizations from data" tool_resources: : semantic_view: ".." : name: ".." max_results: "5" filter: "@eq": region: "North America" title_column: "" id_column: "" $$; ``` ## Modifying an existing agent For instructions on modifying the configuration for an existing agent, including adding tools and updating other details, see [](#label-snowflake-agents-modify-agents). --- title: CKE document access history source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-knowledge-extensions/cke-access-history.md section: Snowflake Cortex (AI & ML) --- # CKE document access history This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/cortex-knowledge-extensions/cke-overview) - [](/user-guide/snowflake-cortex/cortex-knowledge-extensions/overview-tutorials) - [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) To help providers know which documents are being accessed in their [Cortex Knowledge Extensions (CKE)](/user-guide/snowflake-cortex/cortex-knowledge-extensions/cke-overview), Snowflake provides the following features: - CKE access history data in the [](/sql-reference/data-sharing-usage/listing-access-history) in the [](#label-share-objects-accessed-array). - A [](/sql-reference/functions/system_encode_cke_primary_key) system function. - A [](/sql-reference/functions/system_cke_hash_function) system function. ## Prerequisites Because [primary keys](#label-cortex-search-primary-keys) define a unique identifier for each document, you must specify a primary key for the [Cortex Search Service](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) to get the access history. 
Modifying the primary key columns of an existing Cortex Search Service invalidates the previous CKE access history. To interpret the previous CKE access history, save a mapping from the old primary key columns to the new primary key columns. ## Understand document IDs Document IDs are composed of Cortex Search Service [primary keys](#label-cortex-search-primary-keys). To protect customer data, Snowflake encodes and hashes the primary key columns when tracking the access history. You can map the primary keys to the provided hashed document ID using the following functions: - [](/sql-reference/functions/system_encode_cke_primary_key) function: Transform and anonymize the primary key from the set of selected columns. - [](/sql-reference/functions/system_cke_hash_function) function: Hash the primary key. ## Example CKE access history in the LISTING_ACCESS_HISTORY view This example performs the following actions: - Retrieves only CKE access information from the [](/sql-reference/data-sharing-usage/listing-access-history) view and excludes all other events - Uses the [](/sql-reference/functions/system_encode_cke_primary_key) function to build an encoded representation of the CKE document's primary key columns - Retrieves the hash version and uses the [](/sql-reference/functions/system_cke_hash_function) to compute a hashed document ID for every primary key - Joins the computed hashed IDs and versions to the view to recover the original primary key columns Step 1. Create a daily access summary table that retrieves only CKE access information. 
```sql CREATE TABLE IF NOT EXISTS cke_document_daily_access AS SELECT query_date, consumer_account_name, consumer_name, hashed_doc_id, hash_version, total_access_count FROM ( SELECT query_date, consumer_account_name, consumer_name, flattened.value::string AS hashed_doc_id, lah.share_objects_accessed[0]:"hashVersion"::string AS hash_version, COUNT(*) AS total_access_count FROM snowflake.data_sharing_usage.listing_access_history AS lah, LATERAL FLATTEN( input => lah.share_objects_accessed[0]:"hashedDocumentIds" ) AS flattened WHERE lah.share_objects_accessed[0]:"objectDomain" = 'Cortex Search Service' AND lah.share_objects_accessed[0]:"hashVersion" IS NOT NULL GROUP BY query_date, consumer_account_name, consumer_name, hashed_doc_id, hash_version ); ``` Step 2. Create a table to store the encoded primary keys. ```sql CREATE TABLE IF NOT EXISTS encoded_primary_keys AS ( SELECT pkCol1, pkCol2, SYSTEM$ENCODE_CKE_PRIMARY_KEY(pkCol1, pkCol2) AS encoded_primary_key FROM your_cortex_search_table ) ``` Step 3. From the table you created in the previous step, prepare hash versions and compute hashed IDs for your primary keys. Then join the `cke_document_daily_access` table with the hashed primary key view to recover the original primary key columns. 
```sql WITH hash_versions AS ( SELECT DISTINCT hash_version AS hash_version FROM cke_document_daily_access ), hashed_primary_key AS ( SELECT pkCol1, pkCol2, hash_version, SYSTEM$CKE_HASH_FUNCTION(hash_version, encoded_primary_key) AS hashed_doc_id FROM encoded_primary_keys CROSS JOIN hash_versions ) SELECT pk.pkCol1, pk.pkCol2, a.query_date, a.consumer_account_name, a.consumer_name, a.total_access_count FROM cke_document_daily_access AS a JOIN hashed_primary_key AS pk ON a.hashed_doc_id = pk.hashed_doc_id AND a.hash_version = pk.hash_version; ``` --- title: Configure and interact with Agents source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-manage.md section: Snowflake Cortex (AI & ML) --- # Configure and interact with Agents This feature is not available in the People's Republic of China. You can build an agent with the following methods: - In %sf-web-interface% - Using the [Agents REST API](/user-guide/snowflake-cortex/cortex-agents-rest-api) - With the [Cortex Agents SQL](/sql-reference/commands-cortex-agent) commands You can then integrate the agent into your application to perform tasks or respond to queries. You must first create an agent object that contains information such as the metadata, tools, and orchestration instructions that the agent can use to perform a task or answer questions. You can then reference the agent object in your application to integrate the agent's functionality. You can configure a thread to maintain the context in memory, so that the client does not have to send the context at every turn of the conversation. Snowflake REST APIs support authentication via programmatic access tokens (PATs), key pair authentication using JSON Web Tokens (JWTs), and OAuth. For details, see [](/developer-guide/snowflake-rest-api/authentication). ## Create an agent Create an agent object by specifying the database and schema where the agent should be located, along with a name and description for the agent. 
In addition, specify the display name, avatar, and color. These attributes are used by the client application to display the agent. The display name is also used as the handle to reference the agent in conversations. For best practices when creating an agent, see [Best Practices to Building Cortex Agents](https://www.snowflake.com/en/developers/guides/best-practices-to-building-cortex-agents/). The following examples show how to create an agent object from %sf-web-interface%, using the REST API, or using SQL:
**Method 1: Snowsight UI:** 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **AI & ML** %raa% **Agents**. 3. Select **Create agent**. 4. For **Agent object name**, specify a name for the agent that is displayed to users in the UI. 5. For **Display name**, specify a name for the agent that is displayed to admins in the agent list. 6. Select **Create agent**. 7. Prompt the agent with general knowledge requests. **Method 2: REST API:** 1. Create an agent object by specifying the database and schema where the agent will be created, as well as the parameters needed for the agent. You can also specify tool fields when creating the agent object. ```bash curl -X POST "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "name": "TransportationAgent", "comment": "This agent handles queries related to transportation methods and costs.", "models": { "orchestration": "claude-4-sonnet" } }' ``` **Method 3: SQL:** Create an agent object in the database and schema where the agent will be created. You can specify the agent properties and specification using the `FROM SPECIFICATION` clause in the CREATE AGENT command. For more information, see [](/sql-reference/sql/create-agent). ```sqlexample-yaml CREATE OR REPLACE AGENT myagent COMMENT = 'agent level comment' PROFILE = '{"display_name": "My Business Assistant", "avatar": "business-icon.png", "color": "blue"}' FROM SPECIFICATION $$ orchestration: budget: seconds: 30 tokens: 16000 instructions: response: "You will respond in a friendly but concise manner" orchestration: "For any revenue question use Analyst; for policy use Search" sample_questions: - question: "What was our revenue last quarter?" 
tools: - tool_spec: type: "cortex_analyst_text_to_sql" name: "Analyst1" description: "Converts natural language to SQL queries for financial analysis" - tool_spec: type: "cortex_search" name: "Search1" description: "Searches company policy and documentation" - tool_spec: type: "data_to_chart" name: "data_to_chart" description: "Generates visualizations from data" tool_resources: Analyst1: semantic_view: "db.schema.semantic_view" Search1: name: "db.schema.service_name" max_results: "5" filter: "@eq": region: "North America" title_column: "" id_column: "" columns_and_descriptions: TEXT: description: "The main text content of the document" type: "string" searchable: true filterable: false CATEGORY: description: "Document category. Values include: policy, guide, reference." type: "string" searchable: false filterable: true $$; ```
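In the specification above, each entry under `tool_resources` uses the same name as a tool in the `tools` list (`Analyst1`, `Search1`). The following Python sketch checks that correspondence on a specification that has already been loaded into a dictionary. It assumes that matching names are what link a resource to its tool; the function name `find_resource_mismatches` is hypothetical, and loading the YAML is out of scope here.

```python
from typing import Dict, List


def find_resource_mismatches(spec: dict) -> Dict[str, List[str]]:
    """Compare tool names with tool_resources keys in an agent specification."""
    tool_names = {entry["tool_spec"]["name"] for entry in spec.get("tools", [])}
    resource_names = set(spec.get("tool_resources", {}))
    return {
        # Resources that no tool refers to, for example because of a typo.
        "resources_without_tool": sorted(resource_names - tool_names),
        # Tools with no resources; this can be intentional (see data_to_chart).
        "tools_without_resources": sorted(tool_names - resource_names),
    }


if __name__ == "__main__":
    # Same tool and resource names as the SQL example above.
    spec = {
        "tools": [
            {"tool_spec": {"type": "cortex_analyst_text_to_sql", "name": "Analyst1"}},
            {"tool_spec": {"type": "cortex_search", "name": "Search1"}},
            {"tool_spec": {"type": "data_to_chart", "name": "data_to_chart"}},
        ],
        "tool_resources": {
            "Analyst1": {"semantic_view": "db.schema.semantic_view"},
            "Search1": {"name": "db.schema.service_name"},
        },
    }
    print(find_resource_mismatches(spec))
```

A tool that appears under `tools_without_resources` is not necessarily an error: the `data_to_chart` tool in the example has no resources. A name under `resources_without_tool` is more likely a spelling or capitalization mismatch.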
## Add tools After you've created the agent, you need to add tools and provide instructions on how to orchestrate across the tools. Agents support the following tool types:
- **Cortex Analyst:** You specify the semantic views that Cortex Analyst uses to retrieve structured data. Agents can route across multiple semantic views to provide the response. When Cortex Analyst is invoked by an agent, it does not have access to open-source LLMs. For a list of the models that Cortex Analyst can use when invoked by an agent, see the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).
- **Cortex Search:** You provide the Cortex Search indices as tools, along with column descriptions for filterable and searchable columns. The Cortex Agent uses the Cortex Search indices to retrieve unstructured data.
- **Data to Chart:** You can enable the agent to automatically generate visualizations from data. When included in the tools array, the agent can create charts using Vega-Lite specifications in response to queries that would benefit from visual representation.
- **Custom tools:** You can implement specific business logic as a stored procedure or user-defined function (UDF). Alternatively, you can use custom tools to retrieve data from your backend systems using APIs.
- **Web Search:** You can enable the agent to search the web and use those search results to generate responses and plan tasks.

You also specify the resources used by each tool. For example, for Cortex Analyst, you specify the warehouse along with the timeout for SQL query execution. Similarly, for Cortex Search, you specify the filters and column names used in the search query, along with the maximum number of results in the search response. For custom tools, you provide the warehouse details.
**Method 1: Snowsight UI:** To modify the configuration for an existing agent, follow these steps: 1. In the navigation menu, select **AI & ML** %raa% **Agents**. 2.
From the list of agents, select the agent that you want to modify.
The configuration details for the agent are displayed.
3. Select **Edit**. 4. For **Description**, describe the agent and how users can interact with it. 5. To add sample questions that users can ask the agent, enter a sample question and select **Add a question**. 6. Select **Tools**. Add one or more of the following tools.
- **To add a semantic view in Cortex Analyst to the agent**: This section assumes that you already have a semantic view created. For information about semantic views and how to create one, see [](/user-guide/views-semantic/overview). 1. Find **Cortex Analyst** and select the respective **+ Add** button. 2. For **Name**, enter a name for the semantic view. 3. Select **Semantic view**. 4. Select the semantic view that the agent uses. 5. For **Warehouse**, select the warehouse that the agent uses to run queries. 6. For **Query timeout (seconds)**, specify the maximum time in seconds that the agent waits for a query to complete before timing out. 7. For **Description**, describe the semantic view. 8. Select **Add**. - **To add a Cortex Search service to the agent**: This section assumes that you've already created a Cortex Search service. For information about creating a Cortex Search service, see [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview). You can also use a Cortex Knowledge Extension (CKE) that is shared with you. For a tutorial that uses a CKE, see [](/user-guide/snowflake-cortex/cortex-knowledge-extensions/tutorials/add-cke-to-snowflake-cowork-tutorial). 1. Find **Cortex Search Services** and select the respective **+ Add** button. 2. For **Name**, enter a name for the Cortex Search service. 3. For **Description**, describe the Cortex Search service. 4. For **Search service**, select the Cortex Search service that the agent uses. 5. Under **Tool details**, add **Columns Description** to help the agent effectively use the search service. Column descriptions are not required for all columns, but providing them for filterable and searchable columns is recommended to improve the quality of results. Provide a description that explains the column's content and sample values. 6. Select **Add**. - **To add a custom tool to the agent**: By adding custom tools, you can extend the functionality of your agents. 
With custom tools, the agent can call stored procedures and functions that you have defined to perform actions or do computations. This section assumes that you've already created a custom tool. For information about procedures and functions, see [](/developer-guide/extensibility).
1. Find **Custom tools** and select the respective **+ Add** button. 2. For **Name**, enter a name for the custom tool. 3. For **Resource type**, select whether the custom tool is a function or a procedure. For information about whether to use a function or procedure, see [](/developer-guide/stored-procedures-vs-udfs). 4. For **Custom tool identifier**, select the existing function or procedure that you want to add as a custom tool. 5. The related parameters for the function or procedure automatically appear. You can manually add parameters for the custom tool by specifying a name, type, and description, and by selecting whether the parameter is required. You can also modify parameters that automatically populate. Snowflake Cortex does not support stored procedures and custom tools with a parameter of type `object`. 6. For **Warehouse**, select the warehouse that the agent uses to run the custom tool. You must manually select a warehouse. 7. For **Description**, describe the custom tool and how to use it. 8. Select **Add**. 9. After creating the custom tool, make sure users are granted the USAGE privilege on the function or procedure that you added as a custom tool. When using stored procedures, agents preserve whether the procedure runs with owner's or caller's rights. For information about owner's and caller's rights, see [](/developer-guide/stored-procedure/stored-procedures-rights).
- **To add a web search tool to the agent**: This section assumes that you've already enabled web search at the account level. For information about enabling web search at the account level, see [](#label-cortex-agents-web-search). 1. Find **Web search** and select the respective toggle to enable the feature.
7. Select **Save**. **Method 2: REST API:** To add tools to an agent using the REST API, add the following payloads as part of a request to [](#label-snowflake-agents-rest-api-update). You can also specify these fields when creating the agent object.
- **Add Cortex Analyst tool and tool resources**: The following example shows how to add a Cortex Analyst tool and tool resources to an existing agent object. 1. Add a Cortex Analyst tool ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "tools": [ { "tool_spec": { "description": "Analyst to analyze price", "type": "cortex_analyst_text_to_sql", "name": "Analyst1" } } ] }' ``` 2. Add a Cortex Analyst tool resource ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "tool_resources": { "Analyst1": { "semantic_model_file": "stage1", "semantic_view": "The name of the Snowflake native semantic model object", "execution_environment": {"type":"warehouse", "warehouse":"my_wh"} } } }' ``` - **Add Cortex Search tool and tool resources**: The following example shows how to add a Cortex Search tool and tool resources to an existing agent object. 1. Add a Cortex Search tool ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "tools": [ { "tool_spec": { "type": "cortex_search", "name": "Search1" } } ] }' ``` 2. 
Add a Cortex Search tool resource: ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "tool_resources": { "Search1": { "search_service": "db.schema.service_name", "filter": {"@eq": {"region": "North America"} }, "max_results": 10, "title_column": "TITLE", "columns_and_descriptions": { "TEXT": { "description": "The main text content of the document", "type": "string", "searchable": true, "filterable": false }, "CATEGORY": { "description": "Document category. Values include: policy, guide, reference.", "type": "string", "searchable": false, "filterable": true }, "AUTHOR": { "description": "Author name in format: firstname.lastname", "type": "string", "searchable": false, "filterable": true } } } } }' ``` The `columns_and_descriptions` field is a map of column names to column properties. Descriptions are not required for all columns, but providing them for filterable and searchable columns improves the quality of results. Each column entry must include: - `description` (string): A description of the column content and sample values. Include guidance on when and how to filter on this column. - `type` (string): The column data type. Use `"string"` or `"datetime"`. - `searchable` (boolean): Set to `true` for text index columns that can be searched. Vector index columns are not supported. - `filterable` (boolean): Set to `true` for attribute columns that can be used in filter conditions. - **Add data_to_chart tool**: The following example shows how to add the data to chart tool to an existing agent object. 1. 
Add the data_to_chart tool ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "tools": [ { "tool_spec": { "type": "data_to_chart", "name": "data_to_chart", "description": "Generates visualizations from data" } } ] }' ``` - **Add custom tool and tool resources**: The following example shows how to add a custom tool and tool resources to an existing agent object. 1. Add a custom tool ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "tools": [ { "tool_spec": { "description": "Custom tool", "type": "generic", "name": "custom1" } } ] }' ``` 2. Add a custom tool resource ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "tool_resources": { "custom1": { "user-defined-function-argument": "argument1" } } }' ``` - **Add web_search tool**: The following example shows how to add the web_search tool to an existing agent object. This section assumes that you've already enabled web search at the account level. For information about enabling web search at the account level, see [](#label-cortex-agents-web-search). 1. Add the web_search tool ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "tools": [ { "tool_spec": { "type": "web_search", "name": "Web Search" } } ] }' ```
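Hand-written JSON in a `--data` argument is easy to break with a stray comma or a mismatched name. The following minimal Python sketch builds the two request bodies for the custom tool example with `json.dumps`. It assumes, as the examples above suggest, that each `tool_resources` entry is keyed by the exact `name` of its tool. The helper `build_tool_payloads` is hypothetical and is not part of any Snowflake library.

```python
import json
from typing import List, Optional


def build_tool_payloads(tool_spec: dict, resource: Optional[dict] = None) -> List[str]:
    """Return JSON request bodies: one that adds the tool, one that adds its resource.

    Hypothetical helper for illustration only.
    """
    bodies = [json.dumps({"tools": [{"tool_spec": tool_spec}]})]
    if resource is not None:
        # Key the resource by the tool's exact name so the two entries stay in sync
        # (assumption: resource keys are matched against tool names).
        bodies.append(json.dumps({"tool_resources": {tool_spec["name"]: resource}}))
    return bodies


bodies = build_tool_payloads(
    {"type": "generic", "name": "custom1", "description": "Custom tool"},
    {"user-defined-function-argument": "argument1"},
)
for body in bodies:
    print(body)
```

Each string in `bodies` can be passed as the `--data` value of the corresponding `curl` request.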
**Method 3: SQL:** You can update an agent object to add tools and tool resources using the ALTER AGENT command. For information about the ALTER AGENT command, see [](/sql-reference/sql/alter-agent). The new specification completely replaces the existing one. Fields that are not included in the new specification are removed. ```sqlexample-yaml ALTER AGENT MODIFY LIVE VERSION SET SPECIFICATION = $$ models: orchestration: claude-4-sonnet orchestration: budget: seconds: 30 tokens: 16000 instructions: response: "You will respond in a friendly but concise manner" orchestration: "For any revenue question use Analyst; for policy use Search" sample_questions: - question: "What was our revenue last quarter?" tools: - tool_spec: type: "cortex_analyst_text_to_sql" name: "Analyst1" description: "Converts natural language to SQL queries for financial analysis" - tool_spec: type: "cortex_search" name: "Search1" description: "Searches company policy and documentation" - tool_spec: type: "data_to_chart" name: "data_to_chart" description: "Generates visualizations from data" tool_resources: Analyst1: semantic_view: "db.schema.semantic_view" Search1: name: "db.schema.service_name" max_results: "5" filter: "@eq": region: "North America" title_column: "" id_column: "" $$; ```
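Because the new specification replaces the existing one, a safe pattern is to start from the current specification and add to it, rather than writing a specification that contains only the new tool. The following Python sketch illustrates the idea on plain dictionaries. The function `add_tool` and the sample `current_spec` are hypothetical; retrieving the current specification and serializing the result for ALTER AGENT are left out.

```python
import copy
from typing import Optional


def add_tool(spec: dict, tool_spec: dict, resource: Optional[dict] = None) -> dict:
    """Return a copy of spec with one more tool, leaving every other field in place."""
    merged = copy.deepcopy(spec)  # never mutate the caller's specification
    merged.setdefault("tools", []).append({"tool_spec": tool_spec})
    if resource is not None:
        merged.setdefault("tool_resources", {})[tool_spec["name"]] = resource
    return merged


# Hypothetical current specification, shaped like the example above.
current_spec = {
    "models": {"orchestration": "claude-4-sonnet"},
    "instructions": {"response": "You will respond in a friendly but concise manner"},
    "tools": [{"tool_spec": {"type": "cortex_search", "name": "Search1"}}],
    "tool_resources": {"Search1": {"name": "db.schema.service_name"}},
}

new_spec = add_tool(current_spec, {"type": "data_to_chart", "name": "data_to_chart"})
print(sorted(new_spec))
```

The merged specification still carries `models`, `instructions`, and the existing tool, so none of them is dropped when the specification is replaced.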
## Specify orchestration Cortex Agents orchestrate the task by breaking it into a sequence of sub-tasks and identifying the right tool for each sub-task. You specify the LLM that the Agent should use to conduct this orchestration. You can also influence the orchestration by providing instructions. For example, consider an agent built to respond to retail product questions. You can use the orchestration instruction `"Use the search tool for all requests related to refunds"` to ensure the Agent only provides refund policy details (using Cortex Search) and does not actually calculate the refund amounts (using Cortex Analyst). You can also specify instructions to align the response to a brand or a tone, such as `"Always provide a concise response; maintain a friendly tone"`. **Method 1: Snowsight UI:** 1. Select **Orchestration**. 2. For the **Orchestration model**, select the model that the agent uses to handle orchestration. 3. For **Planning instructions**, provide instructions that influence tool selection by the agent based on user-provided input. These can include specific instructions about when to use each tool, or even to always use a tool at the beginning or end of a response. 4. For **Response instruction**, provide instructions that the model uses for response generation. For example, specify if you want the agent to prioritize chart creation, or to keep a certain tone with users. 5. For **Budget configuration**, you can specify a time limit and a token limit for the agent. The budget is the maximum amount of time or tokens that the agent can use to generate a response. After either limit is reached, the agent stops generating a response. Token limits apply only to orchestration and don't include tokens used by Cortex Analyst, Cortex Search, and other invoked tools. 6. Select **Save**. **Method 2: REST API:**
To update an agent using the REST API, add the following payloads as part of a request to [](#label-snowflake-agents-rest-api-update). You can also specify these fields when creating the agent object. The following procedure shows how to update the agent with planning and response instructions, and specify the LLM model used for orchestration.
1. Update the LLM model ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "models": { "orchestration": "llama3.3-70B" } }' ``` 2. Specify the planning and response instructions ```bash curl -X PUT "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "instructions": { "response": "Always provide a concise response and maintain a friendly tone.", "orchestration": "" } }' ``` **Method 3: SQL:** You can update an agent object to add orchestration information using the ALTER AGENT command. For information about the ALTER AGENT command, see [](/sql-reference/sql/alter-agent). ```sqlexample-yaml ALTER AGENT MODIFY LIVE VERSION SET SPECIFICATION = $$ models: orchestration: claude-4-sonnet orchestration: budget: seconds: 30 tokens: 16000 instructions: response: "You will respond in a friendly but concise manner" orchestration: "For any revenue question use Analyst; for policy use Search" sample_questions: - question: "What was our revenue last quarter?" tools: - tool_spec: type: "cortex_analyst_text_to_sql" name: "Analyst1" description: "Converts natural language to SQL queries for financial analysis" - tool_spec: type: "cortex_search" name: "Search1" description: "Searches company policy and documentation" - tool_spec: type: "data_to_chart" name: "data_to_chart" description: "Generates visualizations from data" tool_resources: Analyst1: semantic_view: "db.schema.semantic_view" Search1: name: "db.schema.service_name" max_results: "5" filter: "@eq": region: "North America" title_column: "" id_column: "" $$; ``` ## Set up access to the agent By default, Cortex Agents uses the user's default role and the default warehouse. 
If another user is using the agent, make sure that they've done the following: - Set a default role - Set a default warehouse - Granted USAGE on the agent to the default role For information about granting usage, see [](#label-cortex-agents-access-control). You must use the user's default role when calling or updating Cortex Agents. To allow another role to use the agent, grant USAGE on the agent to that role: ```sql GRANT USAGE ON AGENT .. TO ROLE ; ``` Set up access policies from the Snowsight UI or using SQL so that users can access the Agent. Specify the role to provide access to the Agent. **Method 1: Snowsight UI:** 1. Select **Access**. 2. To give a role access to the agent, select **Add role**, then select the role from the dropdown menu. 3. Select **Save**. **Method 2: SQL:** ```sql GRANT USAGE ON AGENT myagent TO ROLE test_rl; ``` ## Review the agent After you have built the Agent, you can review the Agent to verify all parameters. **Method 1: Snowsight UI:** When reviewing agents from Snowsight, you can only view agents in the Agent Admin UI. You cannot view agents in the database object explorer. 1. In the navigation menu, select **AI & ML** » **Agents**. 2. From the list of agents, select the agent that you want to view the details for. This opens a new page that gives an overview of the agent details. 3. To review all agent details, select **Next**. **Method 2: REST API:** You can list and describe agents using the REST APIs. 1. List all agents. ```bash curl -X GET "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases/{database}/schemas/{schema}/agents:" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" ``` 2. Describe the desired agent. 
```bash curl -X GET "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases/{database}/schemas/{schema}/agents/{name}:" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" ``` **Method 3: SQL:** You can list and describe agents using SQL. 1. List all agents. ```sql SHOW AGENTS IN ACCOUNT; ``` 2. Describe the desired agent. ```sql DESCRIBE AGENT myagent; ``` ## Test the agent After you've created the agent, you can test it to see how it responds to user queries. You can also test the agent using [](#label-snowflake-data-agents-run-rest-api). To test the agent, follow these steps: 1. Sign in to Snowsight. 2. In the navigation menu, select **AI & ML** » **Agents**. 3. Select the agent from the list of agents. 4. On the agent details page, enter a query in the agent playground. 5. Verify that the agent responds to the query as expected. If the agent does not respond as expected, modify the agent's configuration by following the steps in [](#label-snowflake-agents-modify-agents). ## Interact with the agent After creating the agent object, you can integrate the agent directly into your application using the REST API. To maintain context during the interaction, use a thread. The agent object and thread combined simplify the client application code. ### Create a thread Create a thread to maintain the context during a conversation. When the thread is created successfully, the system returns a `Thread ID`. ```bash curl -X POST "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/cortex/threads" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "origin_application": , }' ``` ### Send a request to the agent To interact with the Agent, you must pass the agent object, thread ID, and a unique `parent_message_id` as part of your REST API request. The initial `parent_message_id` should be `0`. 
```bash curl -X POST "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases/{database}/schemas/{schema}/agents/{name}:run" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "thread_id": , "parent_message_id": , "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What are the projected transportation costs for the next three quarters? " } ] } ], "tool_choice": { "type": "required", "name": [ "Analyst1", "Search1" ] } }' ``` ## Collect feedback about the agent You can collect feedback from users about the responses given by the agent. This feedback can help you refine the agent as you iterate on your use case. Users can provide an objective rating (positive/negative), as well as more subjective detail in a message. Users can also classify the feedback into one or more categories. ```bash curl -X POST "$SNOWFLAKE_ACCOUNT_BASE_URL/api/v2/databases//schemas//agents/:feedback:" \ --header 'Content-Type: application/json' \ --header 'Accept: application/json' \ --header "Authorization: Bearer $PAT" \ --data '{ "request_id": "", "positive": true, "feedback_message": "This answer was great", "categories":[ "category1", "category2", "category3" ], "thread_id": "" }' ``` ## Interact without an agent object In some cases, you may want to get started with Cortex Agents by using `agent:run` without an agent object. For example, this may be useful when you want to quickly try out a use case. For more information about the REST API, see [](#label-snowflake-lite-agents-run-rest-api). When interacting with an agent without creating an agent object, you must manually maintain the context for the agent with every request. 
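Maintaining context manually means that the client keeps the running message history and resends it with each request. The following Python sketch shows one way to do that. It reuses the message shape from the `agent:run` example above; the `Conversation` class, the `"assistant"` role name, and a request body that contains only `messages` are assumptions for illustration, so check the REST API reference for the exact fields.

```python
import json


def text_message(role: str, text: str) -> dict:
    """Build one message in the shape used by the agent:run example."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


class Conversation:
    """Hypothetical client-side history for requests made without a thread."""

    def __init__(self) -> None:
        self.messages = []

    def request_body(self, user_text: str) -> str:
        # Append the new user turn, then serialize the whole history so the
        # agent sees every earlier turn in this request.
        self.messages.append(text_message("user", user_text))
        return json.dumps({"messages": self.messages})

    def record_reply(self, reply_text: str) -> None:
        # Store the agent's reply (role name is an assumption) for later turns.
        self.messages.append(text_message("assistant", reply_text))


convo = Conversation()
first_body = convo.request_body("What was our revenue last quarter?")
convo.record_reply("(text of the agent reply)")
second_body = convo.request_body("How does that compare to the quarter before?")
print(len(json.loads(second_body)["messages"]))
```

The second request body carries all three turns, which is what a thread would otherwise track on the server side.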
--- title: Copy `arctic-extract` models between databases, schemas, and accounts source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/copy-arctic-extract-models.md section: Snowflake Cortex (AI & ML) --- # Copy `arctic-extract` models between databases, schemas, and accounts - [](/user-guide/account-replication-intro) - [](/sql-reference/sql/create-model) - [](/sql-reference/sql/alter-model-add-version) - [](/sql-reference/sql/create-replication-group) This topic explains how to copy fine-tuned `arctic-extract` models between databases or schemas in the same account or between different accounts in the same organization. For example, you might want to copy a model from a development account to a production account. ## Copy models between databases and/or schemas within an account 1. Create the model from the source model using the role that created the source model: To list the versions in a model, use [](/sql-reference/sql/show-versions-in-model). ```sql CREATE MODEL prod_db.prod_schema.invoices_model WITH VERSION V1 FROM MODEL dev_db.dev_schema.invoices_source_model VERSION V1; ``` 2. Optional: Add another version of the model: ```sql ALTER MODEL prod_db.prod_schema.invoices_model ADD VERSION V2 FROM MODEL dev_db.dev_schema.invoices_source_model VERSION V2; ``` 3. To enable the `prod_role` role to use the copied model, grant the OWNERSHIP privilege on the model to that role: ```sql GRANT OWNERSHIP ON MODEL prod_db.prod_schema.invoices_model TO ROLE prod_role; ``` ## Copy models between accounts You can replicate a model from a source account to one or more target accounts in the same organization. For more information about replication, see [](/user-guide/account-replication-intro). To replicate the model from a source account to a target account, you need to create a replication group in the source account to enable replication of the database in which the model was created to a target account, and set up the production user role. 
You must be a user with the ACCOUNTADMIN role to create a replication group and to set up the production user role. ### Replicate the database in which the model was created 1. Create a primary replication group in the source account: ```sql CREATE REPLICATION GROUP models_replication_group OBJECT_TYPES = DATABASES ALLOWED_DATABASES = dev_db ALLOWED_ACCOUNTS = org.production_account; ``` 2. Create a secondary replication group in a target account as a replica of the primary replication group in the source account: ```sql CREATE REPLICATION GROUP models_secondary_replication_group AS REPLICA OF org.dev_account.models_replication_group; ``` 3. Refresh the database in the target account from the source account: ```sql ALTER REPLICATION GROUP models_secondary_replication_group REFRESH; ``` 4. Optional: Specify the schedule for refreshing the secondary replication group so that the account is synchronized automatically every 10 minutes: ```sql ALTER REPLICATION GROUP models_secondary_replication_group SET REPLICATION_SCHEDULE = '10 MINUTE'; ``` ### Set up the production user role To ensure that the user working on the target production account (for example, a user with the `prod_role` role) can use the replicated model, follow these steps: 1. Grant the USAGE privilege on the source database and schema, and ownership on all models in that schema, to the `prod_role` role: ```sql GRANT USAGE ON DATABASE dev_db TO ROLE prod_role; GRANT USAGE ON SCHEMA dev_db.dev_schema TO ROLE prod_role; GRANT OWNERSHIP ON ALL MODELS IN SCHEMA dev_db.dev_schema TO ROLE prod_role; ``` 2. Optional: Grant ownership on all the future models that will be replicated: ```sql GRANT OWNERSHIP ON ALL FUTURE MODELS IN SCHEMA dev_db.dev_schema TO ROLE prod_role; ``` After you grant the required privileges, a user with the `prod_role` role must follow these steps: 1. 
Create the model from the source model: ```sql CREATE MODEL prod_db.prod_schema.invoices_model WITH VERSION V1 FROM MODEL dev_db.dev_schema.invoices_source_model VERSION V1; ``` 2. Optional: Add another version of the model: ```sql ALTER MODEL prod_db.prod_schema.invoices_model ADD VERSION V2 FROM MODEL dev_db.dev_schema.invoices_source_model VERSION V2; ``` The model in the target schema is a separate model object from the model in the replicated database. New versions are not copied automatically; you must add each version using [](/sql-reference/sql/alter-model-add-version). --- title: Cortex Agent code execution tool source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-code-execution-tool.md section: Snowflake Cortex (AI & ML) --- # Cortex Agent code execution tool - [](/user-guide/snowflake-cortex/cortex-agents) - [](/user-guide/snowflake-cortex/cortex-agents-manage) The Cortex Agent code execution is a built-in tool that enables an agent to execute code during a conversation. With access to a code execution tool enabled, your agents can execute scripts to process data, perform calculations, and produce visualizations. By default, the code execution tool runs in a sandboxed, isolated environment that can only access data in the current agent session. You enable the code execution tool by configuring it in an agent specification. The agent then decides during orchestration when to generate and run code based on the user's query. The code execution tool is also used when executing Python scripts as part of an agent skill. ## How the code execution works The agent uses the code execution alongside other configured tools and skills. During orchestration, the agent evaluates the user's query and determines whether code execution is the best approach. If so, the agent invokes the code execution tool. The agent then generates code and executes it in a secure sandbox. 
By default, the code execution tool environment is isolated and can only access data passed into the session. Additional read or write permissions can be granted in the agent specification. ### Default access scope The code execution tool's sandbox persists for a single session. Data provided in the conversation context is what's available to the code execution tool for operation. The sandbox persists imports, variables, and intermediate results across multiple executions within the session. ## Enabling the code execution tool To use the code execution tool with a Cortex Agent, the agent must have both the required access control permissions and agent specification section describing the code execution tool. ### Required Cortex Agent permissions The following permissions on a Cortex Agent affect both your ability to configure and query the agent:
| Privilege | Required for | | --------- | ------------------------------------------------------------------------------ | | USAGE | Allows invoking the agent, including code execution tool use | | MODIFY | Changing an agent specification to enable or configure the code execution tool | | OWNERSHIP | Full control over agent configuration and use |
### Agent specification You enable the code execution tool by adding the resources and configuration for it to an agent specification. The tool definition to add in the `tools` section of your agent specification is: ```yaml tools: - tool_spec: type: code_execution name: code_execution ``` Enable the tool by adding a `code_execution` section to `tool_resources` in your agent specification: ```yaml tool_resources: code_execution: ``` For full information on the agent specification format and instructions on how to modify an existing agent's specification, see [](/user-guide/snowflake-cortex/cortex-agents-manage). ## Default available libraries The default execution environment for the code execution tool uses Python 3.12, with the Python standard library available. The following additional libraries are also available by default:
| Library | Version | | -------- | -------- | | `numpy` | **TKTK** | | `pandas` | **TKTK** |
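To check which versions are present in a given Python environment, a script can query the installed package metadata. The following sketch uses only the standard library. The helper `installed_versions` is hypothetical, and whether the agent chooses to run such a script in its sandbox depends on the query and orchestration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["numpy", "pandas"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```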
## Adding libraries through Artifact Repository You can use the Snowflake default [Artifact Repository](/developer-guide/udf/python/udf-python-packages) to retrieve packages from PyPI in the code execution tool environment. Add the `artifact_repositories` key to the `code_execution` resources in your agent specification, as a list containing an entry for `SNOWFLAKE.SNOWPARK.PYPI_SHARED_REPOSITORY`: ```yaml tool_resources: code_execution: artifact_repositories: - SNOWFLAKE.SNOWPARK.PYPI_SHARED_REPOSITORY ``` To access the PyPI repository, you must also assign the role `SNOWFLAKE.PYPI_REPOSITORY_USER` to the owner of the Cortex Agent. This gives the code execution tool access to retrieve any package published on PyPI. Use caution when granting this level of access. ## Enabling external access You can enable the code execution to access external endpoints over the internet by creating a network rule and external access integration, and then providing information on which integrations the code execution tool has access to in the agent specification. The following example demonstrates the creation of a new network rule (`github_access_rule`) and external access integration (`github_integration`) allowing access to `github.com` and some subdomains over HTTP and HTTPS: ```sql CREATE OR REPLACE NETWORK RULE github_access_rule MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = ('github.com', 'api.github.com', 'raw.githubusercontent.com'); CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION github_integration ALLOWED_NETWORK_RULES = (github_access_rule) ENABLED = true; ``` To enable external access integrations, add the `external_access_integrations` key to the `code_execution` resources in your agent specification, containing a list of external access integrations the code execution tool can access. 
The following example agent specification snippet demonstrates giving the code execution tool access to the `github_integration` external access integration: ```yaml tool_resources: code_execution: external_access_integrations: - github_integration ``` For the full details of setting up network rules and external access integrations, see [](/developer-guide/external-network-access/creating-using-external-network-access). ## Known limitations Cortex Agent code execution tool is subject to the following known limitations: - **Single-session scope**: By default, the code execution tool can only access data in the current session. State isn't shared between sessions or across separate invocations. To persist information produced by the code execution tool, you'll need your own persistence store on Snowflake that the code execution tool has read and write access to. - **Access inheritance**: The code execution tool operates with the role privileges of the Cortex Agent owner. Make sure that the owner role of any agent with the code execution enabled is appropriately scoped. --- title: Cortex Agent evaluations source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-evaluations.md section: Snowflake Cortex (AI & ML) --- # Cortex Agent evaluations - [](/user-guide/snowflake-cortex/ai-observability) - [](/user-guide/snowflake-cortex/cortex-agents-manage) This feature is not available in the People's Republic of China. Cortex Agent evaluations allow you to monitor your agent's behavior and performance. Evaluate your agent against both ground truth and reference-free evaluation metrics. During evaluation, your agent's activity is traced and monitored so you can ensure that each step in the process advances towards your end goal. For evaluation methodology and dataset design guidance, see [Best Practices for Evaluating Cortex Agents](https://www.snowflake.com/en/developers/guides/best-practices-for-evaluating-cortex-agents/). 
Snowflake offers the following metrics to evaluate your agent against: - **Answer correctness** – How closely the actual response for a given input query to the agent matches the expected ground truth answer. - **Logical consistency** – Measures consistency across agent instructions, planning, and tool calls. This metric is *reference-free*, meaning you don't need to prepare any information in your dataset for evaluation. Snowflake also allows you to create custom evaluation metrics that use the LLM judging process to measure context critical to your Agent’s domain and use case. Custom metrics use an LLM prompt and scoring methodology, which are passed to the evaluation judging system to produce a score. For additional details about how agent evaluations are conducted on Snowflake, including the LLM judging system used for reference-free evaluations, see the Snowflake engineering blog [What’s Your Agent’s GPA? A Framework for Evaluating AI Agent Reliability](https://www.snowflake.com/en/engineering-blog/ai-agent-evaluation-gpa-framework/). For an example of running an Agent Evaluation programmatically, see the guide [Getting Started with Cortex Agent Evaluations](https://www.snowflake.com/en/developers/guides/getting-started-with-cortex-agent-evaluations/). ## Access control requirements The ability to run a Cortex Agent evaluation requires the role that runs the evaluation to have the following: - The DATABASE ROLE SNOWFLAKE.CORTEX_USER role - The EXECUTE TASK ON ACCOUNT permission - The USAGE permission on the database and schema containing your agent - The USAGE permission on the database and schema containing your evaluation data - If creating a dataset from an input table, CREATE DATASET ON SCHEMA - The following permissions on the current database and schema, which is where the evaluation will be run from: - USAGE - CREATE FILE FORMAT ON SCHEMA - CREATE TASK In Snowsight, agent evaluations are run on the database and schema of the agent. 
With SQL, agent evaluations are run on the session's database and schema. - The USAGE or OWNERSHIP privilege on your agent - The MONITOR or OWNERSHIP privilege on your agent - If using an agent evaluation configuration, READ privilege on the stage containing the configuration file. If the agent being evaluated uses tools, your role also needs access to all of them. Additionally, if working with evaluations in Snowsight, the role you use to run or inspect an evaluation needs the USAGE privilege on your default warehouse. ## Prepare an evaluation dataset Before starting a Cortex Agent evaluation, prepare a table containing your evaluation inputs. This table is used to create a dataset for your evaluation to run against. To learn more about datasets on Snowflake, see [](/developer-guide/snowflake-ml/dataset). ### Cortex Code [Cortex Code](/user-guide/cortex-code/cortex-code) can help you create or update an evaluation dataset. Use the `dataset-curation` sub-skill of the Cortex Code `cortex-agent` skill in the CLI (see [Cortex Code CLI - Skills](#label-extensibility-skills)), or select **Create with Cortex Code** or **Manage datasets** on an agent's **Evaluations** tab in Snowsight, to: - Generate synthetic queries based on your agent configuration. - Import queries from production monitoring data. - Edit an existing dataset to add, remove, or modify queries using either source. Cortex Code can also run the evaluation against the dataset, so you can go from dataset to results in a single flow. ### Dataset format The dataset table has two columns: - **Input query** (VARCHAR): the user query to evaluate. - **Ground truth** (VARIANT): a JSON object describing the expected agent behavior. This is the single value the LLM judges compare against. 
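As a minimal sketch of one row in this format, the following Python snippet pairs an input query with a ground truth JSON string. It assumes the JSON uses the `ground_truth_output` key that the answer correctness metric reads. The helper `dataset_row` and the sample rubric text are hypothetical.

```python
import json


def dataset_row(input_query: str, expected: str) -> tuple:
    """Return (input query, ground truth JSON text) for one evaluation row."""
    # json.dumps handles quoting and escaping, so the text is safe to parse
    # back into a VARIANT value when the row is loaded.
    ground_truth = json.dumps({"ground_truth_output": expected})
    return input_query, ground_truth


query, ground_truth = dataset_row(
    "How many orders did customers place today?",
    "The response should give a whole-number count scoped to today's date.",
)
print(query)
print(ground_truth)
```

The second element is plain JSON text, suitable as the argument to a JSON-parsing step when the row is inserted into the table.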
The **answer correctness** system metric reads the `ground_truth_output` key from that JSON and compares its value to the agent's **streamed reply** — everything the user sees, including LLM thinking, response generation, and chart generation. Because the value is fed into an LLM prompt, treat it as a plain-language rubric: - If the correct answer is known and stable, state it and include any rounding, tolerance, units, formatting, or scoping the response must observe (for example, "the value must be within ±2% of 123.45"). - If the answer changes over time or has a particular shape, describe what a correct response should and shouldn't contain — including a format example like `"Output is in the following JSON format: ..."` if structure matters — in enough detail that two readers would agree on whether a given reply meets the bar. Custom metrics read the **entire** VARIANT through the `{{ground_truth}}` placeholder, regardless of key. Use this to check process criteria the streamed reply doesn't expose: for example, add a custom metric that references `{{tool_info}}` to verify which tools or tables the agent used. Keep output criteria in `ground_truth_output` and process criteria in the custom metric's own keys. For reference-free metrics like **logical consistency**, you don't need a ground truth value at all. Any data not consumed by your selected metrics is ignored, so the column can be left empty for runs that use only reference-free metrics. #### Ground truth examples The following examples show `ground_truth_output` values you can adapt for your own dataset. Each pairs an input query with the JSON you'd put in the ground truth VARIANT column. ##### Static factual query Use when the correct answer is known and stable. State the expected value and decide whether the response must match it exactly or whether rounding or a tolerance is acceptable. 
Add any other facets a correct reply must cover, such as scoping to the right date or excluding specific categories of records. **Input query:** `How many active customers does my business have as of December 31, 2025?` ```json { "ground_truth_output": "There are 1,000 active customers as of December 31, 2025. The response should reference that exact date (not a different date or 'as of today') and present the count as a factual number. Rounding to the nearest hundred or a value within ±1% is acceptable; values outside that range aren't. The count shouldn't include test or churned accounts." } ``` ##### Dynamic or live data query Use when you can't fix a number in advance but can describe what a good response looks like. **Input query:** `How many orders did customers place today?` ```json { "ground_truth_output": "The response should give a specific whole-number count of orders placed today and scope the count explicitly to today's date. It shouldn't return results for a different time period or claim data is unavailable without attempting retrieval, and it shouldn't hedge with phrases like 'approximately' or 'I think'. The count should be presented as a fact derived from the data." } ``` ##### Boundary or out-of-scope query Use when the agent should refuse rather than hallucinate. **Input query:** `What's the weather like in New York today?` ```json { "ground_truth_output": "The response should state that weather information is outside the agent's capabilities and ideally point to the kinds of questions the agent can help with. It shouldn't fabricate a forecast or present any temperatures or conditions." } ``` ##### Complex investigation Use when the response must connect multiple facts. **Logical consistency** (reference-free) catches contradictions in planning and tool use; `ground_truth_output` catches a coherent but factually wrong explanation. 
**Input query:** `Why did our checkout conversion rate drop between March 1–7, 2025?` ```json { "ground_truth_output": "The response should acknowledge that checkout conversion dropped during March 1–7, 2025, link the drop to the payment gateway timeout issue that began March 3, 2025, and note that mobile users were disproportionately affected (more than 70% of failed checkouts were mobile). It should also quantify the drop (conversion fell from ~4.2% to ~2.8%). It shouldn't attribute the drop to causes the data doesn't support, such as a marketing campaign change or seasonal trends. The causal chain (gateway timeouts to failed checkouts to conversion drop, concentrated on mobile) matters more than the order the facts appear in." } ``` | Scenario | What to put in `ground_truth_output` | | ------------------------- | --------------------------------------------------------------------- | | Known answer, static data | The specific value, any tolerance, and what to exclude | | Live or changing data | A description of what a correct response should and shouldn't contain | | Off-topic or refusal | What the agent should say, and that it shouldn't fabricate an answer | | Multi-fact investigation | Each fact the response should cover, plus explanations to avoid | #### Insert into a Snowflake table Populate the VARIANT column with [PARSE_JSON](/sql-reference/functions/parse_json). The following example creates `agent_evaluation_data` and inserts one input query with its expected answer: ```sql CREATE OR REPLACE TABLE agent_evaluation_data ( input_query VARCHAR, ground_truth VARIANT ); INSERT INTO agent_evaluation_data SELECT 'What was the temperature in San Francisco on August 2nd 2019?', PARSE_JSON(' { "ground_truth_output": "The temperature was 14 degrees Celsius in San Francisco on August 2nd, 2019." } '); ``` [OBJECT_CONSTRUCT](/sql-reference/functions/object_construct) and [ARRAY_CONSTRUCT](/sql-reference/functions/array_construct) return OBJECT and ARRAY, not VARIANT. 
Use [PARSE_JSON](/sql-reference/functions/parse_json), or wrap a value in [TO_VARIANT](/sql-reference/functions/to_variant), to guarantee the column type. ### Create a dataset from a Snowflake table (SQL) To create an evaluation dataset with SQL, call [SYSTEM$CREATE_EVALUATION_DATASET](/sql-reference/functions/system_create_evaluation_dataset). The column-mapping keys differ depending on how you create the dataset: - When you call `SYSTEM$CREATE_EVALUATION_DATASET` (SQL), use `query_text` and `expected_tools` in the mapping object. - When you define a dataset in the Agent Evaluation YAML (`dataset.column_mapping`), use `query_text` and `ground_truth`. ## Start an agent evaluation ### Cortex Code You can also run an evaluation through [Cortex Code](/user-guide/cortex-code/cortex-code). Use the `evaluate-cortex-agent` sub-skill of the Cortex Code `cortex-agent` skill in the CLI (see [Cortex Code CLI - Skills](#label-extensibility-skills)), or continue the same Cortex Code flow from **Prepare an evaluation dataset** in %sf-web-interface% directly into running the evaluation against your dataset. ### %sf-web-interface% Agent evaluations run as your currently selected role in %sf-web-interface%, not your default role. Make sure a role with the correct permissions is active before starting an evaluation. Begin your evaluation of a Cortex Agent by doing the following: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **AI & ML** %raa% **Agents**. 3. Select the agent you want to conduct an evaluation of. 4. Select the **Evaluations** tab. 5. Select **New evaluation run**. The **New evaluation run** modal opens. 6. In the **Name** field, provide a name for your evaluation. This name should be unique for the agent being evaluated. 7. Optional: In the **Description** field, provide any comments for the evaluation. 8. Select **Next**. This advances to the **Select dataset** modal. 9. Select the dataset used to evaluate your agent. 
You can choose either **Existing dataset** or **Create new dataset**. To use an existing dataset: 1. From the **Database and schema** list, select the database and schema containing your dataset. 2. From the **Select dataset** list, select your dataset. To create a new dataset: 1. From the **Source table - Database and schema** list, select the database and schema containing the table you want to import to a dataset. 2. From the **Select source table** list, select your source table. 3. From the **New dataset location - Database and schema** list, select the database and schema to place your new dataset. 4. In the **Dataset name** field, enter your dataset name. This name needs to be unique among the schema-level objects in your selected schema. 10. Select **Next**. This advances to the **Select metrics** modal. 11. From the **Input query** list, select the column of your dataset which contains the input queries. 12. For each of the **System metrics**, change the toggle to active for any metric you want included in your evaluation. Select the column of your dataset containing the ground truth for your evaluation. 13. (Optional) To conduct a custom evaluation, toggle on **Custom metrics**. 14. Select the database and schema containing the stage where your custom evaluation configuration is stored. 15. Select the stage where your custom evaluation configuration is stored. 16. Select the YAML configuration file for your custom evaluation. In %sf-web-interface%, only the custom evaluation definitions are loaded from your YAML configuration. The rest of the YAML file must still be valid. For the evaluation YAML specification, see [](#label-cortex-agent-evaluation-yaml-spec). 17. For each custom metric, change the toggle to active if you want it included in your evaluation. Select the column of your dataset containing the ground truth for this evaluation. 18. Select **Create** to create the evaluation and begin the evaluation process. 
At any point, you can select **Cancel** to cancel creating the evaluation, or select **Prev** to return to the previous modal.

### SQL

To start, check the status of, or delete an evaluation run with SQL, use the [EXECUTE_AI_EVALUATION](/sql-reference/functions/execute_ai_evaluation) function. This function has the following required arguments:

- `evaluation_job`: A string value of 'START', 'STATUS', or 'DELETE'.
- `run_parameters`: A SQL [OBJECT](#label-data-type-object) containing the key `run_name`, with a value of the name of your run.
- `config_file_path`: A stage file path pointing to your run configuration YAML file. This path can't be a signed URL. For the evaluation YAML specification, see [Agent Evaluation YAML specification](#label-cortex-agent-evaluation-yaml-spec).

Use the `evaluation_job` value 'START' to start an evaluation. The following example starts a run called `run-1` using the agent evaluation configuration from `@eval_db.eval_schema.metrics/agent_evaluation_config.yaml`:

```sql
CALL EXECUTE_AI_EVALUATION(
  'START',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);
```

After a run starts, you can query its progress with the `evaluation_job` value 'STATUS'. This call returns a table in the format used for [AI Observability Runs](#label-ai-observability-runs). The following example queries the status of the agent evaluation started in the previous example:

```sql
CALL EXECUTE_AI_EVALUATION(
  'STATUS',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);
```

To delete an evaluation run, use the `evaluation_job` value 'DELETE'.
The following example deletes the `run-1` run for the agent defined by the same configuration file: ```sql CALL EXECUTE_AI_EVALUATION( 'DELETE', OBJECT_CONSTRUCT('run_name', 'run-1'), '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml' ); ``` You can call the EXECUTE_AI_EVALUATION function from a [Task](/user-guide/tasks-intro) to regularly run an evaluation or check the status of one. ## Inspect evaluation results Evaluation results include information about the requested metrics, details of the agent's threads of reasoning, and information about the LLM planning stage for each executed trace in the thread. ### Cortex Code In the [Cortex Code](/user-guide/cortex-code/cortex-code) CLI, the `cortex-agent` skill provides two sub-skills for working with completed evaluations: - `investigate-cortex-agent-evals`: Inspect evaluation runs and find any issues in your configuration or data. - `optimize-cortex-agent`: Use results from completed evaluations to suggest and test changes that improve your agent's performance. For more information about Cortex Code skills, see [Cortex Code CLI - Skills](#label-extensibility-skills). ### %sf-web-interface% The **Evaluations** tab for an agent in %sf-web-interface% gives you an overview of every evaluation run and its summary results. To view evaluation results in %sf-web-interface%: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **AI & ML** %raa% **Agents**. 3. Select the agent you want to conduct an evaluation of. 4. Select the **Evaluations** tab. #### Evaluation runs listing The summary of run information for each run includes: - `RUN NAME` – The name of the evaluation run. - `# OF RECORDS` – The number of queries performed and answered as part of the run. - `STATUS` – The status of the evaluation run, which is one of: - %sf-icon-circle-check% – All inputs were evaluated and results are available. - **A spinner is displayed** – The run is in progress, with no information available yet. 
- %sf-icon-warning% – The run experienced an error at some point. Some or all metrics may be unavailable for the run. - `DATASET` – The name of the dataset used for the evaluation. - `AVG DURATION` – The average duration of time taken to execute an input query for the run. - `LOGICAL CONSISTENCY` – Average over all inputs of the logical consistency evaluation for the run, if requested. - `DESCRIPTION` – The description of the evaluation run. - `CREATED` – The time at which the run was created and started. Each custom metric evaluated for this run also receives its own column, defined by the evaluation metric `name` value. For more information on custom metrics, see [](#label-agent-evaluation-custom-metric). #### Evaluation run overview When you select an individual run in %sf-web-interface%, you're presented with the run overview. This overview includes summary averages for each metric evaluated during the run, and a summary of each input execution. The overview for each input execution includes: - `STATUS` – The status of the evaluation run, which is one of: - %sf-icon-circle-check% – All inputs were evaluated and results are available. - **A spinner is displayed** – The run is in progress, with no information available yet. - %sf-icon-warning% – The run experienced an error at some point. Some or all metrics may be unavailable for the run. - `INPUT` – The input query used for the evaluation. - `OUTPUT` – The output produced by the agent. - `DURATION` – The length of time taken to process the input and produce output. - `LOGICAL CONSISTENCY` – The logical consistency evaluation for the input, if requested. - `EVALUATED` – The time at which the input was processed. Each custom metric evaluated for this run also receives its own column, defined by the evaluation metric `name` value. For more information about custom metrics, see [](#label-agent-evaluation-custom-metric). 
#### View details (errors and metric warnings) After you open a run from the evaluation runs listing, select **View details** on the right to open the detailed view for that run. Scroll down in this view to find **error logs** and other diagnostic information when something fails or returns partial results. In the per-input table for the run, if metric computation has a problem for a specific row, a **warning** indicator can appear on the left side of that row. Hover over the warning to see details about the metric issue. #### Record details When you select an individual input in %sf-web-interface%, you're presented with the **Record details** view. This view includes three panes: **Evaluation results**, **Thread details**, and **Trace details**. ##### Evaluation results Your evaluation results are presented here in detail. Each metric has its own presentation box of overall average across inputs, which can be selected to display a popover containing more information. This popover contains a breakdown of the number of runs which performed at high accuracy (80% or more accurate), medium accuracy (30% or more accurate, but not high accuracy), and which failed. ##### Thread details The information logged during the execution of each agent thread. This includes planning and response generation by default, as well as a thread trace for each tool that the agent invoked during that thread. ##### Trace details Each trace pane includes input, processing, and output information relevant to that stage of agent execution. This information is the same as that provided by [agent monitoring](#label-cortex-agent-log-info). 
### SQL **Observability redaction and evaluations** The **READ UNREDACTED AI OBSERVABILITY EVENTS TABLE** account privilege and default **redaction** of certain raw fields in `AI_OBSERVABILITY_EVENTS` apply to **Cortex Agent monitoring** in %sf-web-interface% and to **observability** user-defined table functions used on the **monitoring** data path, as described in [](/user-guide/snowflake-cortex/cortex-agents-monitor) and [](/release-notes/bcr-bundles/un-bundled/bcr-read-unredacted-ai-observability-events). This does **not** change **Cortex Agent evaluation** run execution, how metrics are computed, or how evaluation results and scores are shown in the **Evaluations** experience. To retrieve raw evaluation details, use the [](/sql-reference/functions/get_ai_evaluation_data-snowflake-local) function. This function has the following required arguments: - `database`: The database containing the agent. - `schema`: The schema containing the agent. - `agent_name`: The name of the agent. - `agent_type`: `CORTEX AGENT` or `EXTERNAL AGENT`. This value is case-insensitive. - `run_name`: The name of the evaluation run to retrieve. This function returns a table of event data described in [](#label-cortex-agent-evaluations-results-format). The following example displays the full evaluation details for a run called `run-1`, where the agent is named `evaluated_agent` stored on the schema `eval_db.eval_schema`: ```sql SELECT * FROM TABLE(SNOWFLAKE.LOCAL.GET_AI_EVALUATION_DATA( 'eval_db', 'eval_schema', 'evaluated_agent', 'CORTEX AGENT', 'run-1') ); ``` ### Query traces for a single record To access a single record from an evaluation trace, use the [](/sql-reference/functions/get_ai_record_trace-snowflake-local) function. This function has the following required arguments: - `database`: The database containing the agent. - `schema`: The schema containing the agent. - `agent_name`: The name of the agent. - `agent_type`: `CORTEX AGENT` or `EXTERNAL AGENT`. 
This value is case-insensitive. - `record_id`: The record ID to filter by. This function returns a table of event data described in [](#label-cortex-agent-evaluations-results-format). The following example displays the trace for the record `9346efc3-5dd6-4038-9b1a-72ca3d3b768c`, where the agent is named `evaluated_agent` stored on the schema `eval_db.eval_schema`: ```sql SELECT * FROM TABLE(SNOWFLAKE.LOCAL.GET_AI_RECORD_TRACE( 'eval_db', 'eval_schema', 'evaluated_agent', 'CORTEX AGENT', '9346efc3-5dd6-4038-9b1a-72ca3d3b768c' )); ``` ### Query evaluation errors and warnings for a run To access logs for warnings and errors that happened during an evaluation run, use the [](/sql-reference/functions/get_ai_observability_logs-snowflake-local) function. This function has the following required arguments: - `database`: The database containing the agent. - `schema`: The schema containing the agent. - `agent_name`: The name of the agent. - `agent_type`: `CORTEX AGENT` or `EXTERNAL AGENT`. This value is case-insensitive. This function returns a table of event data described in [](#label-cortex-agent-evaluations-results-format). The following example checks for errors and warnings for a run called `run-1`, where the agent is named `evaluated_agent` stored on the schema `eval_db.eval_schema`: ```sql SELECT * FROM TABLE(SNOWFLAKE.LOCAL.GET_AI_OBSERVABILITY_LOGS( 'eval_db', 'eval_schema', 'evaluated_agent', 'CORTEX AGENT') ) WHERE TRUE AND (record:"severity_text"='ERROR' or record:"severity_text"='WARN') AND record_attributes:"snow.ai.observability.run.name"='run-1'; ``` The fields of `record` and `record_attributes` are subject to change, but the fields `record:"severity_text"` and `record_attributes:"snow.ai.observability.run.name"` are guaranteed to be present in AI Observability logs. 
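If you fetch the log rows into a client program, the same filter can be applied there. The following Python sketch mirrors the `WHERE` clause above. It assumes each row is already a dictionary whose `record` and `record_attributes` values have been parsed into Python dictionaries; the helper name `filter_run_issues` and the sample rows are hypothetical and not part of any Snowflake API.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def filter_run_issues(rows: Iterable[Row], run_name: str) -> List[Row]:
    """Keep ERROR and WARN rows that belong to a single evaluation run."""
    issues: List[Row] = []
    for row in rows:
        # Use .get() throughout: other fields of these objects may change.
        record = row.get("record") or {}
        attributes = row.get("record_attributes") or {}
        is_issue = record.get("severity_text") in ("ERROR", "WARN")
        same_run = attributes.get("snow.ai.observability.run.name") == run_name
        if is_issue and same_run:
            issues.append(row)
    return issues


# Made-up rows, shaped like the two fields the query above relies on.
sample_rows = [
    {"record": {"severity_text": "ERROR"},
     "record_attributes": {"snow.ai.observability.run.name": "run-1"}},
    {"record": {"severity_text": "INFO"},
     "record_attributes": {"snow.ai.observability.run.name": "run-1"}},
    {"record": {"severity_text": "WARN"},
     "record_attributes": {"snow.ai.observability.run.name": "run-2"}},
]

run_1_issues = filter_run_issues(sample_rows, "run-1")
```

Only the first sample row survives the filter, because the second row isn't an error or warning and the third belongs to a different run.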
## Agent Evaluation YAML specification

The YAML file that configures an Agent Evaluation, including any custom metrics, has three top-level keys:

- (Optional) `dataset`: A definition of how to create a dataset for the evaluation. This value is optional when using a YAML specification to start an evaluation in %sf-web-interface%, or when using an existing dataset.
- `evaluation`: Settings for the agent to be evaluated.
- `metrics`: The metrics recorded during an evaluation run, including definitions for custom metrics.

### Dataset definition

The `dataset` value defines a new dataset from existing table data, mapping columns for the input query and ground truth. For the structure required for your `ground_truth` column, see [](#label-agent-evaluation-dataset-format). The keys for the `dataset` value are:

- `dataset_type`: The string constant "CORTEX AGENT". This value is case-insensitive.
- `table_name`: The fully qualified name of the table to use for the dataset's contents.
- `dataset_name`: The name of the created dataset.
- `column_mapping`: A mapping from the required evaluation input column `query_text` and ground truth column `ground_truth` to the columns of the source table.

The resulting dataset is stored in the same database and schema as the table it's constructed from.

When you call [EXECUTE_AI_EVALUATION](/sql-reference/functions/execute_ai_evaluation) with `START` and the YAML still contains `dataset:`, Snowflake attempts to **create** the dataset on every run. If a dataset with the same `dataset_name` already exists, the run can fail (for example, with an error that a dataset or internal dataset version already exists). That can happen even when you only change `run_name` between runs, or after a previous attempt failed after the dataset was created.

**Pattern for repeated runs on the same dataset:** Remove the entire `dataset:` top-level block from the YAML.
Keep `evaluation:` (with `source_metadata` referencing the existing `dataset_name`) and `metrics:`. This matches how you run another evaluation against an existing dataset without re-importing the table. **When you need a new dataset** from the same or updated source table (for example after you change rows), use a **new** `dataset_name` in `dataset:`, or create a dataset with [SYSTEM$CREATE_EVALUATION_DATASET](/sql-reference/functions/system_create_evaluation_dataset) and reference that name in `evaluation.source_metadata` without embedding `dataset:` in the YAML you use for the run. The following example dataset definition shows a dataset named `evaluation_input` created from the `evals_db.evals_schema.evaluation_data` table, using the `user_question` as input and `expected_outcome` to define ground truth: ```yaml dataset: dataset_type: "CORTEX AGENT" table_name: "evals_db.evals_schema.evaluation_data" dataset_name: "evaluation_input" column_mapping: query_text: "user_question" ground_truth: "expected_outcome" ``` ### Agent configuration The `evaluation` value sets the configuration for the agent to conduct an evaluation against. The keys for the `evaluation` value are: - `agent_params`: A dictionary describing the agent to conduct the evaluation for. This value uses the keys: - `agent_name`: The name of the agent to evaluate. - `agent_type`: The string constant "CORTEX AGENT". This value is case-insensitive. - (Optional) `run_params`: Metadata for identifying this evaluation run. This value uses the keys: - (Optional) `label`: The label for this evaluation. - (Optional) `description`: A detailed description of the evaluation. - `source_metadata`: A dictionary describing the dataset used for the evaluation. This value uses the keys: - `type`: The string constant `dataset`. This value is case-sensitive. - `dataset_name`: The name of the dataset to use. 
The following example agent configuration runs an agent named `evaluated_agent` with the label `Basic evaluation`, using the dataset `evaluation_input`: ```yaml evaluation: agent_params: agent_name: "evaluated_agent" agent_type: "CORTEX AGENT" run_params: label: "Basic evaluation" source_metadata: type: "dataset" dataset_name: "evaluation_input" ``` Note that the agent name is relative to the current database and schema. You can also provide the fully qualified name of the agent. ### Metrics selection The `metrics` value is a sequence of metrics to evaluate, including your own custom metric definitions. The accepted values for pre-defined metrics are: - `answer_correctness`: Measure how closely the expected ground truth answer for a given input query matches the actual response streamed from the agent. - `logical_consistency`: Measure consistency across agent instructions, planning, and tool calls. This metric is *reference-free* and doesn't use a dataset. #### Defining a custom metric You can define your own custom metric by providing an identifier, prompt, and score ranges. The prompt you provide is passed to an LLM judge along with run traces to conduct your custom evaluation. Custom metrics have the following required key-value pairs: - `name`: The name of the metric. - `score_ranges`: A mapping that defines low, medium, and high-quality score ranges. This mapping uses the keys: - `min_score`: The score range used to identify low-quality results, as a two-element sequence of the inclusive lower bound to exclusive upper bound. - `median_score`: The score range used to identify medium-quality results, as a two-element sequence of the inclusive lower bound to inclusive upper bound. - `max_score`: The score range used to identify high-quality results, as a two-element sequence of the exclusive lower bound to inclusive upper bound. - `prompt`: The prompt template to pass to the LLM judge along with the agent run trace data.
This template must include a scoring mechanism which produces a numeric value represented in the ranges provided for `score_ranges`.
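To make the three bounds concrete, the following Python sketch applies them literally as described above: the low range is inclusive-exclusive, the medium range is inclusive-inclusive, and the high range is exclusive-inclusive. This is an illustration only. The function name `classify_score` and the example ranges are hypothetical, and this doesn't describe how Snowflake applies the ranges internally.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]

# Hypothetical ranges, chosen so that the three bands tile a 0-10 scale
# when the bounds are read literally.
EXAMPLE_RANGES: Dict[str, Range] = {
    "min_score": (0, 4),     # low:    0 <= score < 4
    "median_score": (4, 7),  # medium: 4 <= score <= 7
    "max_score": (7, 10),    # high:   7 <  score <= 10
}


def classify_score(
    score: float, ranges: Dict[str, Range] = EXAMPLE_RANGES
) -> Optional[str]:
    """Return 'low', 'medium', or 'high', or None if no range contains the score."""
    low_lo, low_hi = ranges["min_score"]
    mid_lo, mid_hi = ranges["median_score"]
    high_lo, high_hi = ranges["max_score"]
    if low_lo <= score < low_hi:
        return "low"
    if mid_lo <= score <= mid_hi:
        return "medium"
    if high_lo < score <= high_hi:
        return "high"
    return None
```

With these example ranges, a score of 4 or 7 is medium quality, 7.5 is high quality, and a score outside 0-10 falls into no range. When you choose ranges for your own metric, check that every score your prompt can produce lands in exactly one of them.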
A custom metric's prompt is able to reference the trace data generated by the agent during an evaluation run. Snowflake passes the entire trace as input to the LLM judge, but you can emphasize certain information by using a replacement string that references data in a GET_AI_RECORD_TRACE column directly. The following replacement strings are available:

| Replacement string    | GET_AI_RECORD_TRACE column |
| --------------------- | -------------------------- |
| `{{input}}`           | INPUT                      |
| `{{output}}`          | OUTPUT                     |
| `{{ground_truth}}`    | GROUND_TRUTH               |
| `{{tool_info}}`       | TOOL                       |
| `{{start_timestamp}}` | START_TIMESTAMP            |
| `{{duration}}`        | DURATION_MS                |
| `{{span_id}}`         | SPAN_ID                    |
| `{{span_type}}`       | SPAN_TYPE                  |
| `{{span_name}}`       | SPAN_NAME                  |
| `{{llm_model}}`       | LLM_MODEL                  |
| `{{error}}`           | ERROR                      |
| `{{status}}`          | STATUS                     |

#### Metrics configuration example

The following example defines a metrics configuration that enables answer correctness and logical consistency checks, and also defines a custom `relevance` metric which returns a score between 1-10 based on how ground truth compares against agent output:

```yaml
metrics:
  # Built-in metrics
  - "answer_correctness"
  - "logical_consistency"
  # Custom metric with prompt
  - name: "relevance"
    score_ranges:
      min_score: [1, 3]
      median_score: [4, 6]
      max_score: [7, 10]
    prompt: |
      Evaluate the relevance of the agent's response to the user's query.
      Rate from 1-10 where:
      1 = Completely irrelevant
      4 = Somewhat irrelevant
      6 = Neutral
      8 = Mostly relevant
      10 = Highly relevant and on-topic

      You can compare the {{output}} with the {{ground_truth}} to help you understand if the contents are relevant or not

      Consider:
      - Does the response address the user's question?
      - Is the information provided appropriate to the context?
      - Are there any tangential or off-topic elements?
```

### Full example configuration

Combining all of the previous example sections gives a full Agent Evaluation configuration:

```yaml
# Optional: Create dataset before running evaluation
dataset:
  dataset_type: "CORTEX AGENT"
  table_name: "EVALS_DB.EVALS_SCHEMA.EVALUATION_DATA"
  dataset_name: "EVALUATION_INPUT"
  column_mapping:
    query_text: "user_question"
    ground_truth: "expected_outcome"

# Evaluation task configuration
evaluation:
  agent_params:
    agent_name: "evaluated_agent"
    agent_type: "CORTEX AGENT"
  run_params:
    label: "Basic evaluation"
  source_metadata:
    type: "dataset"
    dataset_name: "EVALUATION_INPUT"

metrics:
  # Built-in metrics (simple strings)
  - "answer_correctness"
  - "logical_consistency"
  # Custom metric definition
  - name: "relevance"
    score_ranges:
      min_score: [1, 3]
      median_score: [4, 6]
      max_score: [7, 10]
    prompt: |
      Evaluate the relevance of the agent's response to the user's query.
      Rate from 1-10 where:
      1 = Completely irrelevant
      4 = Somewhat irrelevant
      6 = Neutral
      8 = Mostly relevant
      10 = Highly relevant and on-topic

      You can compare the {{output}} with the {{ground_truth}} to help you understand if the contents are relevant or not

      Consider:
      - Does the response address the user's question?
      - Is the information provided appropriate to the context?
      - Are there any tangential or off-topic elements?
```

### Upload configuration to a stage

Agent Evaluation configurations are required to have a specific file format for Snowflake to parse them.
The following snippet creates the required `yaml_file_format` file format in the schema `evals_db.evals_schema`, then creates the stage `evaluation_config` to upload an agent configuration to:

```sql
CREATE OR REPLACE FILE FORMAT evals_db.evals_schema.yaml_file_format
  TYPE = 'CSV'
  FIELD_DELIMITER = NONE
  RECORD_DELIMITER = '\n'
  SKIP_HEADER = 0
  FIELD_OPTIONALLY_ENCLOSED_BY = NONE
  ESCAPE_UNENCLOSED_FIELD = NONE;

CREATE OR REPLACE STAGE evals_db.evals_schema.evaluation_config
  FILE_FORMAT = evals_db.evals_schema.yaml_file_format;
```

To upload your configuration to the stage through %sf-web-interface%, select **Ingestion** %raa% **Add Data** in the navigation menu, and then select **Load files into a Stage**.

You can also use the SQL [PUT](/sql-reference/sql/put) command to upload a local YAML file. The following example copies the local file `/Users/dev/evaluation_config.yaml` to the stage `evals_db.evals_schema.evaluation_config`:

```sql
PUT file:///Users/dev/evaluation_config.yaml @evals_db.evals_schema.evaluation_config
  AUTO_COMPRESS='false'
  OVERWRITE=TRUE;
```

If you create your YAML in a [Workspace](/user-guide/ui-snowsight/workspaces), you can copy it from your active workspace to a stage. The following example copies the file `evaluation_config.yaml` from your workspace to the stage `evals_db.evals_schema.evaluation_config`:

```sql
COPY FILES INTO @evals_db.evals_schema.evaluation_config
  FROM 'snow://workspace/USER$.PUBLIC.DEFAULT$/versions/live'
  FILES=('evaluation_config.yaml');
```

Snowflake recommends keeping your YAML file uncompressed.

## Evaluation results table format

Functions that return information about a Cortex Agent evaluation all produce a table with the following columns:
The `GROUND_TRUTH` column contains the full JSON from your dataset's ground truth VARIANT, serialized as a string. In **custom metric** prompts, the `{{ground_truth}}` replacement string is substituted with that same serialized content, so a custom LLM judge can use any JSON shape you stored (not only keys such as `ground_truth_output` or `ground_truth_invocations`). **System metrics** still require JSON that matches what each metric expects (for example, `ground_truth_output` for answer correctness). For dataset column requirements, see [](#label-agent-evaluation-dataset-format).

## Model availability

Agent Evaluations currently support only the following models, using cross-region inference. Snowflake automatically chooses from these models based on your account settings.

| Model             | Cross Cloud (Any Region) | AWS US | AWS US Commercial Gov | AWS EU | AWS APJ |
| ----------------- | ------------------------ | ------ | --------------------- | ------ | ------- |
| `claude-4-sonnet` | %cm%                     | %cm%   | %cm%                  | %cm%   | %cm%    |

## Known limitations

Cortex Agent evaluations are subject to the following limitations:

- **Agent response times and throughput**: The number of inputs that can be processed during an evaluation is constrained by agent response times and the amount of trace detail. If you experience timeouts or long delays in your evaluation, split your evaluation data. For example, if you have queries that are guaranteed to invoke many different tools, you can partition the data by common tool invocation. If you have a custom evaluation that results in timeouts, refine or shorten your prompt. You can also split custom evaluations so that each one focuses on a single element of your agent's output.
- **Ground truth staleness**: Depending on how you word your input queries, the correct answer can drift over time, which makes evaluation results less accurate. In particular, scope input queries to specific, absolute dates and times. For example, both of the input queries `What was our revenue?` and `What was our revenue for the first quarter?` will experience drift, while the query `What was our revenue between January and March of 2025?` is scoped to a specific window of time that can be consistently referenced in the evaluation data.

## Cost considerations

Agent Evaluations run a Cortex Agent to create output for evaluation, and LLM judges to compute the evaluation metrics. You're charged for each run of the agent against a ground truth query. The evaluation's LLM judges are run by the [AI_COMPLETE](/sql-reference/functions/ai_complete) function, and you incur charges based on the model Snowflake selects for judging.
Additionally, you're charged for the following:

- Warehouse charges for tasks used to manage evaluation runs
- Warehouse charges for queries used to compute evaluation metrics
- Storage charges for datasets and evaluation results
- Warehouse charges to retrieve evaluation results viewed in %sf-web-interface%

For more information on estimating costs, see [](/user-guide/cost-understanding-overall). Refer to the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) for full cost information.

---
title: Cortex Agent versioning
source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-versioning.md
section: Snowflake Cortex (AI & ML)
---

# Cortex Agent versioning

- [](/user-guide/snowflake-cortex/cortex-agents)
- [](/user-guide/snowflake-cortex/cortex-agents-manage)

Cortex Agent versioning enables a lifecycle management model that lets you develop, test, and deploy agents through distinct versions. Each agent has a **live version** (a mutable working copy you use for development) and can have any number of **named versions** (immutable snapshots you use for stable deployments). By committing the live version, you create a named version that captures the agent's configuration at a point in time. You can then route API requests to any version by name, alias, or shortcut.

This model separates the development workflow from the production workflow. You iterate on the live version, commit it when ready, assign an alias like `production` to the committed version, and route traffic to that alias. If you need to roll back, you point the alias to a previous named version.

## How agent versioning works

An agent's version lifecycle follows a commit-based model:

1. When you create an agent, Snowflake automatically creates a committed `VERSION$1` and a `LIVE` version with the same spec.
2. You create or modify the agent's live version during development.
3. When the agent is ready for deployment, you commit the live version. Snowflake creates a named version with a system-assigned identifier in the format `VERSION$N` (for example, `VERSION$2`, `VERSION$3`).
4. You assign an alias or set the named version as the default.
5. Interaction requests target the named version by name, alias, or shortcut, for stable, repeatable behavior.

After a commit, the live version isn't automatically recreated. You explicitly create a new live version when you're ready to resume development.

You can also create named versions directly from a stage or Git repository without going through the live version. This supports workflows where changes are merged offline and then imported as a new version.

## Live versions

The live version is the mutable, editable state of an agent. You use it during development to iterate on the agent's configuration, instructions, tools, and skills. Each agent can have at most one live version at a time.

You create a live version in one of the following ways:

- **From agent creation**: When you create a new agent, a live version is automatically created.
- **From the last committed version**: Restore the live version from the most recent named version to continue development from a known state.
- **From the %sf-web-interface% UI**: You can create a new live version from the UI.

```sql
-- Create a live version from the last committed version
ALTER AGENT my_agent ADD LIVE VERSION FROM LAST
  COMMENT = 'Resuming development from v3';
```

You can optionally assign an alias to the live version at creation time:

```sql
-- Create a live version with an alias
ALTER AGENT my_agent ADD LIVE VERSION dev FROM LAST;
```

The live version is designed for interactive development. Snowflake recommends committing the live version before using it in production to ensure you have an immutable reference point.
## Named versions A named version is an immutable snapshot of the agent's configuration at the time of the commit. Once created, a named version can't be modified — you can only update its metadata (comment or alias) or drop it entirely. This immutability makes named versions the foundation for stable, reproducible deployments. Snowflake assigns each named version a system identifier in the format `VERSION$N`, where `N` increments with each commit: ```sql -- Commit the live version to create a named version ALTER AGENT my_agent COMMIT COMMENT = 'Production release for Q1'; ``` This creates `VERSION$2` (or the next available number). You can also create named versions directly from a stage or Git repository: ```sql -- Create a named version from a stage ALTER AGENT my_agent ADD VERSION FROM @my_stage/agents/my_agent COMMENT = 'Imported from feature branch'; ``` To view all versions on an agent: ```sql SHOW VERSIONS IN AGENT my_agent; ``` To remove a named version you no longer need: ```sql ALTER AGENT my_agent DROP VERSION VERSION$1; ``` You can only drop named versions. You can't drop the live version. ## Aliases An alias is a human-readable label that you assign to a version. Aliases make it easier to reference versions in API calls and stage operations without knowing the system-assigned version number. Common alias patterns include `production`, `staging`, `canary`, and `rollback`. You assign an alias to a named version or the live version: ```sql -- Assign an alias to a named version ALTER AGENT my_agent MODIFY VERSION VERSION$3 SET ALIAS = production; -- Assign an alias to the live version ALTER AGENT my_agent MODIFY LIVE VERSION SET ALIAS = dev; ``` Once assigned, you can use the alias anywhere a version identifier is accepted — in API calls, stage URIs, and SQL commands. Alias behavior: - Each alias must be unique within an agent. - Aliases are case-sensitive if created with double-quoted identifiers; otherwise they are stored in uppercase. 
- You can reassign an alias from one version to another to redirect traffic without changing the calling code. For example, to promote a new version to production, reassign the `production` alias: ```sql -- Point the production alias to the latest version ALTER AGENT my_agent MODIFY VERSION VERSION$4 SET ALIAS = production; ``` All API calls that target the `production` alias now route to `VERSION$4` without any change to the calling application. ## Version shortcuts Snowflake provides built-in shortcuts for referencing versions without knowing the exact version name or alias. You can use these shortcuts in the API endpoint and in stage URIs: | Shortcut | Description | | --------- | -------------------------------------------- | | `LIVE` | The current live version | | `FIRST` | The first committed named version | | `LAST` | The most recently committed named version | | `DEFAULT` | The version set as the default for the agent | Shortcuts are useful for automation scripts and CI/CD pipelines where the exact version number isn't known ahead of time. For example, you can always run the latest committed version with the `LAST` shortcut, or target the version that the agent owner designated as the default. ## Default version Unless you set it explicitly, the `DEFAULT` version is the latest committed version. You can also designate one version as the default for the agent. The default version is the version that the agent uses when no version is specified in an API call: ```sql -- Set the default version ALTER AGENT my_agent SET DEFAULT_VERSION = 'VERSION$3'; ``` You can also set the default to a system shortcut such as `FIRST` or `LAST`: ```sql ALTER AGENT my_agent SET DEFAULT_VERSION = LAST; ``` Setting a default version allows you to control which version serves traffic without requiring callers to specify a version in every request. When you promote a new version, update the default to redirect all unversioned API calls. 
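To make the relationship between system names, aliases, shortcuts, and the default version concrete, the following Python sketch resolves an identifier to a concrete version. It is a toy model under stated assumptions, not Snowflake's implementation: the function name `resolve_version`, its parameters, and the lookup order (shortcuts first, then aliases, then system names) are all hypothetical, and alias case handling is deliberately left out.

```python
from typing import Dict, List, Optional


def resolve_version(
    identifier: str,
    committed: List[str],          # named versions, oldest first
    aliases: Dict[str, str],       # alias -> version it points to
    default: Optional[str] = None, # explicit default, if one was set
    has_live: bool = True,
) -> str:
    """Map a system name, alias, or shortcut to a version (toy model)."""
    if identifier == "LIVE":
        if not has_live:
            raise LookupError("the agent has no live version")
        return "LIVE"
    if identifier in ("FIRST", "LAST", "DEFAULT") and not committed:
        raise LookupError("the agent has no committed versions")
    if identifier == "FIRST":
        return committed[0]
    if identifier == "LAST":
        return committed[-1]
    if identifier == "DEFAULT":
        # Unless set explicitly, DEFAULT is the latest committed version.
        if default is None or default == "DEFAULT":
            return committed[-1]
        # The default may itself be a shortcut such as FIRST or LAST.
        return resolve_version(default, committed, aliases, None, has_live)
    if identifier in aliases:
        return aliases[identifier]
    if identifier in committed:
        return identifier
    raise LookupError(f"unknown version identifier: {identifier}")
```

With `aliases = {"production": "VERSION$3"}`, reassigning `aliases["production"] = "VERSION$4"` changes what `resolve_version("production", ...)` returns without any change on the caller's side, which is the promotion and rollback pattern described above.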
## Versioning and CI/CD Agent versioning integrates with CI/CD workflows by supporting version creation from external sources — stages and Git repositories — and providing aliases and shortcuts for environment routing. A typical CI/CD workflow for agents follows this pattern: 1. **Develop**: Edit the agent's live version in the %sf-web-interface% UI or through SQL. Test interactively. 2. **Commit**: When the agent is ready, commit the live version to create an immutable named version. 3. **Test**: Route test traffic to the new named version using its system ID or a `staging` alias. 4. **Promote**: Reassign the `production` alias to the new version once testing passes. 5. **Roll back**: If issues arise, reassign the `production` alias to the previous named version. For teams that manage agent configurations in Git, the workflow shifts to an import model: 1. **Develop**: Edit agent configuration files in a Git repository. 2. **Merge**: Review and merge changes through your standard pull request process. 3. **Import**: Create a named version from the Git-connected stage, bypassing the live version entirely. 4. **Deploy**: Assign the `production` alias to the imported version. ```sql -- Import a version from a Git-connected stage after a merge ALTER AGENT my_agent ADD VERSION FROM @my_repo/tags/v2.1/agents/my_agent COMMENT = 'Automated deploy from CI pipeline'; ``` You can also create a new agent directly from a stage as part of an infrastructure-as-code workflow: ```sql CREATE AGENT my_agent COMMENT = 'Deployed by CI pipeline' FROM @my_stage/agents/my_agent; ``` ## Stage operations Each agent version has an internal versioned stage path that you can access through the `snow://agent/` URI scheme. This lets you inspect the files that make up a version — including the agent specification, skill definitions, and supporting scripts. 
The URI format is: ``` snow://agent//versions//[] ``` The `` segment accepts a system version ID (`VERSION$N`), a user-defined alias, or the keyword `live`. ```sql -- List all files in the production version LIST snow://agent/my_agent/versions/production/; -- Download the agent spec from a specific version GET snow://agent/my_agent/versions/VERSION$2/agent.yaml file:///tmp/; ``` Stage operations are read-only and useful for auditing, debugging, and comparing versions. ## Run a specific version You can send a request to a specific version of an agent using the versioned API endpoint: ``` POST /api/v2/databases/{database}/schemas/{schema}/agents/{name}/versions/{version}:run ``` The `{version}` path parameter accepts any of the following: | Identifier type | Example | | ------------------- | ---------------------------------- | | System version name | `VERSION$2` | | User-defined alias | `production` | | Shortcut | `FIRST`, `LAST`, `DEFAULT`, `LIVE` | By default, the API streams responses as server-sent events (SSE). To receive a single JSON response, set `stream` to `false` in the request body. ## Limitations The following limitations apply to Cortex Agent versioning: - **One live version**: Each agent can have at most one live version at a time. - **Live version not auto-created**: After you commit the live version, a new live version isn't automatically created. You must create one explicitly. - **Named versions are immutable**: You can't modify the configuration of a named version. You can only update metadata (comment, alias) or drop it. - **Drop named only**: You can drop named versions but not the live version. - **Alias uniqueness**: Each alias must be unique within an agent. Assigning an alias that already exists on another version results in an error. - **Case sensitivity**: Aliases are case-sensitive when created with double-quoted identifiers; otherwise they are stored in uppercase. 
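As an illustration of the versioned endpoint described under "Run a specific version", the following Python sketch builds (but does not send) a request using only the standard library. Several details are assumptions rather than documented behavior: the helper name `build_run_request`, the account URL shape, the `Authorization: Bearer` header format, and leaving `$` unescaped in the version segment. Apart from `stream`, the request body is supplied by the caller because its schema is not covered in this topic.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_run_request(
    account_url: str,
    database: str,
    schema: str,
    name: str,
    version: str,
    token: str,
    body: Optional[dict] = None,
    stream: bool = True,
) -> urllib.request.Request:
    """Build a POST request for the versioned :run endpoint (hypothetical helper)."""
    # The version segment may be a system name (VERSION$2), an alias
    # (production), or a shortcut (FIRST, LAST, DEFAULT, LIVE).
    path = "/api/v2/databases/{}/schemas/{}/agents/{}/versions/{}:run".format(
        quote(database, safe=""),
        quote(schema, safe=""),
        quote(name, safe=""),
        quote(version, safe="$"),  # assumption: "$" is sent unescaped
    )
    payload = dict(body or {})
    payload["stream"] = stream  # False requests a single JSON response
    return urllib.request.Request(
        url=account_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed token header format
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Targeting an alias such as `production` instead of a system name keeps the calling code unchanged when the alias is reassigned.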
--- title: Cortex Agents source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents.md section: Snowflake Cortex (AI & ML) --- # Cortex Agents This feature is not available in the People's Republic of China. ## Overview Cortex Agents orchestrate across both structured and unstructured data sources to deliver insights. They plan tasks, use tools to execute these tasks, and generate responses. Agents use %cortex-analyst% (structured) and Cortex Search (unstructured) as tools, along with LLMs, to analyze data. Cortex Search extracts insights from unstructured sources, while %cortex-analyst% generates SQL to process structured data. In addition, you can use stored procedures and user defined functions (UDFs) to implement custom tools. A comprehensive support for tool identification and tool execution enables delivery of sophisticated applications grounded in enterprise data. The workflow involves four key components: 1. **Planning**: Applications often switch between processing data from structured and unstructured sources. For example, consider a conversational app designed to answer user queries. A business user may first ask for top distributors by revenue (structured) and then switch to inquiring about a contract (unstructured). Cortex Agents can parse a request to orchestrate a plan and arrive at the solution or response. 1. **Explore options**: When the user poses an ambiguous question (for example, "Tell me about Acme Supplies"), the agent considers different permutations - products, location, or sales personnel - to disambiguate and improve accuracy. 2. **Split into subtasks**: Cortex Agents can split a task or request (for example, “What are the differences between contract terms for Acme Supplies and Acme Stationery?”) into multiple parts for a more precise response. 3. **Route across tools**: The agent selects the right tool - Cortex Analyst or Cortex Search - to ensure governed access and compliance with enterprise policies. 2. 
**Tool use**: With a plan in place, the agent retrieves data efficiently. Cortex Search extracts insights from unstructured sources, while Cortex Analyst generates SQL to process structured data. Comprehensive support for tool identification and tool execution enables delivery of sophisticated applications grounded in enterprise data.
3. **Reflection**: After each tool use, the agent evaluates results to determine the next steps - asking for clarification, iterating, or generating a final response. This orchestration allows it to handle complex data queries while ensuring accuracy and compliance within Snowflake's secure perimeter.
4. **Monitor, evaluate, and iterate**: After deployment, you can track metrics, analyze performance, run evaluations, and refine the agent's behavior to continuously improve performance and response accuracy.

For tutorials to help you get started, see [](/user-guide/snowflake-cortex/cortex-agents-tutorials).

While Snowflake strives to provide high quality responses, the accuracy of LLM responses and of the citations provided is not guaranteed. You should review all answers from the Agents API before serving them to your users.

## Access control requirements

To make a request to a Cortex Agent through the agent:run API, use a role that has been granted the SNOWFLAKE.CORTEX_USER or SNOWFLAKE.CORTEX_AGENT_USER database role. CORTEX_USER provides access to all Covered AI features, including Cortex Agents, whereas CORTEX_AGENT_USER provides access only to the Agents feature.

You must use the user's default role when calling or updating Cortex Agents. To allow another role to edit the agent, grant USAGE on the database, schema, and agent to that role.

```sql
GRANT USAGE ON DATABASE to ROLE ;
GRANT USAGE ON SCHEMA . to ROLE ;
GRANT USAGE ON AGENT ..
to ROLE ;
```

To use Cortex Agents with a semantic model, you also need the following privileges:

| Privilege | Object | Notes |
| ------------ | ----------------------- | ----- |
| CREATE AGENT | Schema | Required to create the Cortex Agent. |
| USAGE | Cortex Search service | Required to run the Cortex Search services in the Cortex Agents request. |
| USAGE | Database, schema, table | Required to access the objects referenced in the Cortex Agents semantic model. |
| OWNERSHIP | Agent | OWNERSHIP is a special privilege on an object that is automatically granted to the role that created the object, but can also be transferred using the [](/sql-reference/sql/grant-ownership) command to a different role by the owning role (or any role with the MANAGE GRANTS privilege). In a managed access schema, only the schema owner (that is, the role with the OWNERSHIP privilege on the schema) or a role with the MANAGE GRANTS privilege can grant or revoke privileges on objects in the schema, including future grants. |
| MODIFY | Agent | Required to update the Cortex Agent. |
| MONITOR | Agent | Required to view threads, logs, and traces of the Cortex Agent. |
| USAGE | Agent | Required to query the Cortex Agent to generate responses. |

Requests to the Cortex Agents API must include an authorization token. For details on how to authenticate to the API, see [](/developer-guide/snowflake-rest-api/authentication).
Note that the example in this topic uses a session token to authenticate to a Snowflake account.

**Limiting access to specific roles**

By default, the CORTEX_USER role is granted to the PUBLIC role. The PUBLIC role is automatically granted to all users and roles. If you don't want all users to have this privilege, you can revoke access from the PUBLIC role and grant access to specific roles. For more information, see [](#label-cortex-llm--privileges).

To provide selective access to Cortex Agents so that only a subset of users have access to the feature, use the CORTEX_AGENT_USER role.

**Limiting access using the Cortex Agents user role**

To provide selective access to Cortex Agents for specific users, use the SNOWFLAKE.CORTEX_AGENT_USER database role. This role includes the privileges needed to call the Cortex Agent API.

If your user roles already have the CORTEX_USER database role, you must first revoke it from them, because CORTEX_USER also grants access to Cortex Agents. To revoke the CORTEX_USER database role from your user roles, run the following command using the ACCOUNTADMIN role:

```sql
REVOKE DATABASE ROLE SNOWFLAKE.CORTEX_USER FROM ROLE agent;
```

To provide access to Cortex Agents, use the ACCOUNTADMIN role to do the following:

1. Grant the SNOWFLAKE.CORTEX_AGENT_USER database role to a custom role.
2. Assign this custom role to users.

You can't grant database roles directly to users. For more information, see [](/sql-reference/sql/grant-database-role).

The following example:

1. Creates the custom role, `cortex_agent_user_role`.
2. Grants it the CORTEX_AGENT_USER database role.
3. Assigns this role to `example_user`.

```sql
USE ROLE ACCOUNTADMIN;
CREATE ROLE cortex_agent_user_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_AGENT_USER TO ROLE cortex_agent_user_role;
GRANT ROLE cortex_agent_user_role TO USER example_user;
```

You can also grant access to Cortex Agents through existing roles.
For example, if you have an `agent` role used by agents in your organization, you can grant access with a single GRANT statement: ```sql GRANT DATABASE ROLE SNOWFLAKE.CORTEX_AGENT_USER TO ROLE agent; ``` ## Authentication Snowflake REST APIs support authentication via programmatic access tokens (PATs), key pair authentication using JSON Web Tokens (JWTs), and OAuth. For details, see [](/developer-guide/snowflake-rest-api/authentication). Cortex Agents uses models that might not be available in all regions. To access these models, you will have to enable cross-region inference, if feasible. For more information, see [](#label-cortex-llm-availability). Cortex Agent APIs are not supported from within a Streamlit in Snowflake (SiS) application using a warehouse runtime. To call Cortex Agent APIs from a SiS app, use a container runtime instead. For more information, see [](/developer-guide/streamlit/app-development/runtime-environments). ## Cost considerations
Cortex Agents incur charges for the orchestration and use of tools.

- The orchestration usage is charged based on the tokens used.
- Cortex Analyst is charged per token.
- Cortex Search charges depend on the size of the index and the time it has persisted.
- Warehouse charges depend on the size of the warehouse and how long it runs.
For more information, see the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). Also, use of custom tools may incur [warehouse costs](/user-guide/cost-understanding-compute). ## Models You can use the following models with Cortex Agents. If the model is not available in the local region, you must use cross-region inference. When creating an agent, we recommend selecting **auto** for the model. With this option, Cortex automatically selects the highest quality model for your account, and the quality automatically improves as new models become available. - *auto* - *claude-haiku-4-5* - *claude-sonnet-4-5* - *claude-sonnet-4-6* - *claude-4-sonnet* - *openai-gpt-4.1* The following tables show the models that are available for each region: **Cross-region and Cross-cloud:**
***** Indicates a preview function or model. Preview features are not suitable for production workloads. ## Cortex Agent Concepts Cortex Agents use Cortex Analyst, Cortex Search and custom tools to plan tasks and generate responses. You can influence the orchestration with instructions. You can also specify attributes to dynamically select a tool based on business logic. During an interaction, Agents use a thread to maintain context. A thread provides an easy retrieval of the entire conversation context for use in application logic. You can collect feedback from end-users as you continuously iterate and refine the Agent. An explicit feedback mechanism (positive/negative rating) coupled with subjective feedback (text) allows you to capture user inputs throughout the lifecycle of the Agent. ### Agent object The agent configuration includes all metadata, orchestration settings, and tool details that are stored in the agent object. You can use the agent object to interact with the agent. ### Threads Threads persist the context of your interactions with the agent, so you don't have to maintain context on the client application. To use threads, you create a thread object and reference the thread ID in the agent interactions. ### Orchestration Cortex Agents use LLM-based orchestration to plan tasks and generate responses. You can control the orchestration with the following settings: #### Models For information about the models you can use with Cortex Agents for orchestration, see [](#label-cortex-agents-models). #### Instructions Response instructions allow you to configure the agent responses to a brand and tone of your preference. #### Sample questions You can use these questions to seed the conversation in your client application. These are common questions that can get users started with the interaction. ### Tools Cortex Agents can orchestrate across both structured and unstructured data. 
Also, custom tools allow agents to interact with other backend systems or implement custom logic. Tool names must be between 1 and 64 characters. If you're using a semantic view as a tool, the semantic view name is used as the tool name. #### Cortex Analyst semantic view You can use Cortex Analyst to create SQL queries from natural language. To use Cortex Analyst, you must create a Semantic Model. For more information, see [](#label-copilot-create-semantic-model). #### Cortex Search Service Use Cortex Search to search through your data. For more information, see [](/sql-reference/sql/create-cortex-search). Agents can dynamically adjust the following search parameters if the user's query requires it: filter conditions, metadata columns to retrieve, number of results, per-index queries for multi-index services, and time-decay settings. The DEFAULT_ROLE of the querying user must have USAGE privilege on the Cortex Search Service, as well as the database and schema in which it resides. #### Custom tools You can use stored procedures and user defined functions (UDF) to implement custom business logic as a tool. For more information, see [](/developer-guide/stored-procedure/stored-procedures-overview) and [](/developer-guide/udf/udf-overview). ### Thinking and reflection The Agent emits events throughout the interaction, providing insights into the reasoning process. These steps cover the initial splitting of tasks, sequencing into sub-tasks, and selection of tools for the sub-task. In addition, the agent also surfaces its reflections about tool results and how these influence further orchestration. ### Monitor, evaluate, and iterate You can collect feedback from the end user as a rating (positive/negative), along with any subjective inputs (as text). These can be used to refine and improve the agent over the lifecycle. 
For more information on how to perform monitoring and evaluation with native Snowflake features, see [](/user-guide/snowflake-cortex/cortex-agents-monitor) and [](/user-guide/snowflake-cortex/cortex-agents-evaluations). ## Web search Before providing web search access to your agents, an ACCOUNTADMIN role must first enable web search access at the account level. To properly enable web search: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **AI & ML** %raa% **Agents**. 3. Select **Settings**. 4. Select the Web search toggle to enable the feature, as shown below. ![Enable web search toggle](/static/images/cortex-code/enable-websearch.png) After enabling web search at the account level, you can use the web search tool in your agents. For more information, see [](#label-snowflake-agents-create). Cortex Agents use the Brave Web Search API to query the web and retrieve results for real-time information during an interaction. The agent creates a query based on the user’s input and any relevant context from the interaction. The API returns results from Brave Search's independent web index. The query and the results leave Snowflake and traverse the public internet. The agent then incorporates the relevant results into its response alongside any data from other configured tools. Snowflake has enabled zero data retention (ZDR) with Brave, which means no search queries are stored by Brave for any length of time. This applies to the search query text, the results returned, and any metadata associated with the request. ZDR simplifies compliance obligations and reduces risk — because the data is never stored. ## Interact with agents Cortex Agents support two distinct methods of interacting with agents through the REST API: - **Configure an agent object to interact with the agent**: With this method, you first configure an agent object that can be reused for the entire interaction. 
Configuring an agent object simplifies client code and enables CI/CD for enterprise-ready applications. - **Interact without an agent object**: With this method, you must pass the agent configuration as part of every interaction request. Interaction without an agent object allows you to quickly try out use cases and experiment with different scenarios. For information about these methods, see [](/user-guide/snowflake-cortex/cortex-agents-manage). ## Legal notices Where your configuration of Cortex Agents uses a model provided on the [Model and Service Flow-down Terms](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/ai-features/open-source-model-flow-down-terms/), your use of that model is further subject to the terms for that model on that page. The data classification of inputs and outputs are as set forth in the following table.
For additional information, refer to [](/guides-overview-ai-features). --- title: Cortex Agents for Microsoft Teams and Microsoft 365 Copilot source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-teams-integration.md section: Snowflake Cortex (AI & ML) --- # Cortex Agents for Microsoft Teams and Microsoft 365 Copilot This feature is not available in the People's Republic of China. ## Introduction For most teams, accessing timely data insights means context-switching between dedicated analytics platforms and communication tools, leading to delays and reduced productivity. Integrating an agentic AI system into Microsoft Teams can bring the answers directly to where conversations and decisions happen, accelerating the flow of information across your business. But building a secure, in-chat analytics solution that is both powerful and intuitive is a significant undertaking. Fortunately, Snowflake has built one for you. The Snowflake Cortex Agents integration for Microsoft Teams and Microsoft 365 Copilot embeds Snowflake's conversational AI agents into your business communication platform. Business teams and non-technical users can interact with their Snowflake structured and unstructured data using simple, natural language to receive direct answers and visualizations without leaving their Teams chats or the broader Microsoft 365 ecosystem. The integration is available via [Microsoft AppSource](https://appsource.microsoft.com/en-us/product/Office365/WA200008996) for seamless deployment. Use the following sections to set up the integration and start using it to get value from your data. For a Quickstart guide, see [Getting Started with Cortex Agents for Microsoft Teams and Microsoft 365 Copilot](https://quickstarts.snowflake.com/guide/getting_started_with_the_microsoft_teams_and_365_copilot_cortex_app). 
When you use this integration, you are directing Snowflake to send or receive data between the Snowflake Service and Microsoft services (including Microsoft Teams and Microsoft 365 Copilot). Snowflake is not responsible for the privacy, security, or integrity of data once it leaves the Snowflake Service boundary. Your use of Microsoft Teams or Microsoft 365 Copilot, and any data you process with it, is governed solely by the terms between you and Microsoft. ### Key features - **Seamless analytics via natural language.** Delight your business decision-makers by empowering them to get insights themselves within the Microsoft Teams and Microsoft 365 Copilot interfaces. You can discover trends and analyze data without technical expertise or waiting for a custom dashboard to be built. Users can ask questions conversationally and receive accurate, LLM-powered answers in text, tabular, or chart form on the fly, dramatically accelerating data-driven decision-making. - **Dual interfaces for comprehensive workflows.** Cortex Agents for Microsoft Teams offer two distinct interfaces to support different business needs. Use the standard Teams Application for dedicated, in-depth analysis within a Teams Bot application chat, or leverage the Microsoft 365 Copilot Agent to bring targeted Snowflake insights into your wider conversational workflow within the Microsoft 365 Copilot ecosystem. - **Powered by Snowflake Cortex Agents.** This integration is powered by the Snowflake Cortex Agents API, which handles the complexities of generating accurate, reliable insights from your data. The agentic system intelligently interprets user requests and generates responses, saving your teams from having to build complex conversational AI patterns or manage underlying models. You can reuse the same agents you use with [%sf-intelligence%](/user-guide/snowflake-cortex/snowflake-cowork), avoiding duplicate configuration and governance effort. 
- **Enterprise-grade security and governance.** Built on Snowflake's privacy-first foundation, the integration ensures you can confidently explore AI-driven use cases. This means:
  - **Your data stays within Snowflake's governance boundary.** User prompts are sent to the Cortex Agents API, but the underlying data queried to generate an answer never leaves Snowflake's secure environment. The resulting SQL query is executed within your Snowflake virtual warehouse.
  - **Seamless integration with Snowflake's privacy and governance features.** The integration fully respects Snowflake's role-based access control (RBAC). All queries executed on behalf of a user adhere to their established permissions, guaranteeing that users can only see data they are authorized to access.

## Regional availability and limitations

The Cortex Agents integration for Microsoft Teams and Microsoft 365 Copilot is available across all Snowflake public cloud deployments. However, there are some regional considerations and current limitations you should be aware of:

### Consent for accounts outside Azure US East 2

When connecting a Snowflake account that is based in a region other than Azure US East 2, administrators are prompted to accept a consent notification during the account setup process. This consent acknowledges that the bot backend infrastructure processes user prompts and bot responses through a service hosted in the Azure US East 2 region. To withdraw consent, an administrator must remove the account through the Teams application interface.

The following is the exact consent you will be asked to accept when connecting your Snowflake account to the Teams bot:

```
Data Processing. Use of this integration requires an intermediate processing (but not storage) step in Snowflake's Azure East US 2 region, regardless of the region where your Snowflake account is located. By proceeding, you are authorizing Snowflake to process your data within Snowflake's Azure East US 2 region.
For more information on this behavior, please refer to . ``` ### Private Link Private Link configurations are not supported. You must disable Private Link to use this integration. ### Sovereign cloud regions The integration is not available for Snowflake accounts in sovereign cloud regions. ## Set up integration Cortex Agent's Microsoft Teams integration allows organization administrators to connect multiple Snowflake accounts to the Teams and Copilot workspaces in their organizations. Setting up the integration involves a few simple steps, summarized below: 1. **Tenant-wide setup by Azure administrator.** The integration requires a one-time setup by a Microsoft Azure administrator to grant consent for the Snowflake application within the Microsoft Entra ID (formerly Azure Active Directory) tenant. This step enables secure OAuth 2.0 authentication for the integration. 2. **Snowflake security integration.** After the Azure administrator has completed the tenant-wide setup, a Snowflake administrator must configure a security integration for each individual Snowflake account that they wish to connect to the Microsoft Teams or M365 Copilot application. This step ensures that the integration can securely access the necessary data within each Snowflake account. 3. **Linking accounts to the bot.** Once the security integration is configured, the Snowflake administrator can link the Snowflake account to the Microsoft Teams or M365 Copilot bot. This step allows the bot to access the data and functionality of the Snowflake account, enabling users to interact with their data directly within Teams or Copilot. ### Prerequisites Before you begin the integration process, make sure you have established the following: - **Administrator access.** Setup requires administrative access on both Snowflake and your Microsoft tenant. - **Snowflake administrative privileges:** Your Snowflake user must have access to the ACCOUNTADMIN or SECURITYADMIN role. 
These permissions are required to create the necessary security integration object in your Snowflake account.
- **Microsoft administrative privileges:** Your Azure user must have Global Administrator privileges (or an equivalent role) for your Microsoft Entra ID tenant. These privileges are required to grant the necessary tenant-wide admin consent for the application.
- **Microsoft tenant ID:** You need your organization's Microsoft tenant ID to configure the Snowflake security integration. For more information on finding your organization's Tenant ID, see [Get subscription and tenant IDs in the Azure portal](https://learn.microsoft.com/en-us/azure/azure-portal/get-subscription-tenant-id).
- **Individual user accounts:** Every end user must have their own Microsoft and Snowflake user accounts.
- **End-user licensing:** Users must have the appropriate Microsoft licenses to access Microsoft Teams. A Copilot license is also required if you plan to use the integration with Microsoft 365 Copilot.

### Step 1: Tenant-wide Entra ID configuration

To enable secure authentication for Cortex Agents, a Microsoft Azure administrator must grant consent for two applications hosted in Snowflake's tenant, creating a *service principal* for each application within your Entra ID tenant. The two applications are:

- **Cortex Agents Bot OAuth Resource:** Represents the protected Snowflake API and defines the access permissions (scopes) for client applications.
- **Cortex Agents Bot Snowflake OAuth Client:** Represents the client application, in this case the Teams application back-end service, that calls the Snowflake API after requesting an access token.

Instructions for granting consent for these applications are provided below. The process is very similar for both applications, but the specific permissions and scopes differ slightly.

#### Granting consent for OAuth Resource principal

To grant consent for the Cortex Agents Bot OAuth Resource application service principal:

1.
In your browser, navigate to `https://login.microsoftonline.com/<tenant-id>/adminconsent?client_id=5a840489-78db-4a42-8772-47be9d833efe`, where tenant-id is your organization's Microsoft tenant ID. If you are not already signed in, you are prompted to do so. A **Permission requested** dialog appears, showing the permission that the application requires.
2. Select **Accept** to grant the requested permission.

#### Granting consent for OAuth Client principal

This process displays two dialogs. Each is similar to the one for the OAuth Resource principal, but the permissions requested are different.

To grant consent for the Cortex Agents Bot Snowflake OAuth Client application service principal:

1. In your browser, navigate to `https://login.microsoftonline.com/<tenant-id>/adminconsent?client_id=bfdfa2a2-bce5-4aee-ad3d-41ef70eb5086`, where tenant-id is your organization's Microsoft tenant ID. A **Permissions requested (1 of 2)** dialog appears, showing one set of permissions that the application requires.
2. Select **Accept** to grant the requested permissions. The second permission dialog appears (**Permissions requested (2 of 2)**).
3. Select **Accept** to grant the requested permissions. You may see an error message stating that a required query string parameter was missing, like the following.

```text
{
  "error": {
    "code": "ServiceError",
    "message": "Missing required query string parameter: code. Url = https://unitedstates.token.botframework.com/.auth/web/redirect?admin_consent=True&tenant="
  }
}
```

You can safely ignore this error. Consent was still granted successfully. To be sure, confirm the permissions were granted successfully by following the instructions in the next section.

#### Confirming permission grants

After granting consent for both applications, you can confirm that the permissions were granted successfully by checking the **Enterprise applications** section of the Microsoft Entra ID portal.

1.
Log in to the [Microsoft Entra admin center](https://entra.microsoft.com/) if necessary. 2. Navigate to Enterprise Applications by typing “enterprise applications” in the search box, then selecting **Enterprise applications** in the results. 3. In the **All applications** list, find the two applications for which you just granted consent: Snowflake Cortex Agents Bot OAuth Resource and Snowflake Cortex Agents Bot OAuth Client. An easy way to do this is to search for "Snowflake Cortex Agent." If both applications appear in the list, permissions have been correctly granted. If one or both applications are missing, try granting consent again. ### Step 2: Snowflake security integration Integrating Snowflake with Microsoft Teams requires a [security integration](/sql-reference/sql/create-security-integration) that establishes cryptographic trust between your Snowflake account and your Entra ID tenant. This process requires: - Enabling Entra ID as an external OAuth provider in Snowflake. - Choosing or creating at least one Cortex Agent object for the integration. - Granting required roles and privileges so intended users can invoke the agent. #### Enabling Entra ID as an external OAuth provider A Snowflake security integration object represents an integration with an external OAuth provider, in this case Microsoft Entra ID. This integration allows Snowflake to authenticate users who are logged into Microsoft Teams or Copilot. The following SQL statement is an annotated template for creating the integration. This command must be executed by a role with ACCOUNTADMIN privileges. Replace the tenant-id placeholders with your Microsoft Tenant ID. 
```sql
CREATE OR REPLACE SECURITY INTEGRATION entra_id_cortex_agents_integration
  TYPE = EXTERNAL_OAUTH
  ENABLED = TRUE
  EXTERNAL_OAUTH_TYPE = AZURE
  EXTERNAL_OAUTH_ISSUER = 'https://login.microsoftonline.com/<tenant-id>/v2.0'
  EXTERNAL_OAUTH_JWS_KEYS_URL = 'https://login.microsoftonline.com/<tenant-id>/discovery/v2.0/keys'
  EXTERNAL_OAUTH_AUDIENCE_LIST = ('5a840489-78db-4a42-8772-47be9d833efe')
  EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM = ('email', 'upn')
  EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE = 'email_address'
  EXTERNAL_OAUTH_ANY_ROLE_MODE = 'ENABLE';
```

See [CREATE SECURITY INTEGRATION (External OAuth)](/sql-reference/sql/create-security-integration-oauth-external) for a complete reference of the parameters available for this command.

Together, the EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM and EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE parameters link an Entra ID identity to a Snowflake identity. For authentication to succeed, the value of the specified claim in the JWT must exactly match the value of the specified attribute on a user object in Snowflake. The two main configurations Snowflake recommends are:

- Mapping by User Principal Name (UPN): Set the EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM parameter to 'upn' and the EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE parameter to 'LOGIN_NAME'.
- Mapping by email address: Set the EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM parameter to 'email' and the EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE parameter to 'EMAIL_ADDRESS'.

The example statement above uses the email address mapping configuration, but also specifies UPN in the EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM parameter, allowing you to change the mapping method by changing only the EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE. The example statement also enables EXTERNAL_OAUTH_ANY_ROLE_MODE, so that the user's default role is used. For more information on OAuth scopes, see [](#label-ext-oauth-scopes).
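The two tenant-specific URLs in the statement above follow a fixed pattern, so they can be derived from the tenant ID instead of being typed by hand. The following Python sketch shows one way to do that, assuming the tenant ID is a GUID. The function name `entra_oauth_urls` and the sample tenant ID are hypothetical; the URL formats are the ones this topic gives for EXTERNAL_OAUTH_ISSUER and EXTERNAL_OAUTH_JWS_KEYS_URL.

```python
import uuid

LOGIN_HOST = "https://login.microsoftonline.com"


def entra_oauth_urls(tenant_id: str) -> dict:
    """Return the issuer and JWS keys URLs for a Microsoft tenant ID (hypothetical helper)."""
    # uuid.UUID raises ValueError when the input is not a well-formed GUID,
    # which catches most copy-and-paste mistakes. str() normalizes the
    # value to the canonical lowercase, hyphenated form.
    tenant = str(uuid.UUID(tenant_id.strip()))
    return {
        "EXTERNAL_OAUTH_ISSUER": f"{LOGIN_HOST}/{tenant}/v2.0",
        "EXTERNAL_OAUTH_JWS_KEYS_URL": f"{LOGIN_HOST}/{tenant}/discovery/v2.0/keys",
    }


# Hypothetical tenant ID, used only for illustration.
urls = entra_oauth_urls("11111111-2222-3333-4444-555555555555")
for name, value in urls.items():
    print(f"{name} = '{value}'")
```

Validating the tenant ID up front is a design choice: a malformed tenant ID in either URL is one of the causes of the invalid OAuth access token error described in the troubleshooting section.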
#### User provisioning requirements To ensure successful authentication using the mapping configuration described previously, make sure that a strict one-to-one mapping exists between Entra ID users and Snowflake users. Designate or create a Snowflake user for every Entra ID user who will use the integration. Each Entra ID user must map to exactly one Snowflake user. For email mapping, the Entra ID primary email must exactly match the Snowflake user's EMAIL_ADDRESS. For UPN mapping, the Entra ID UPN must exactly match the Snowflake user's LOGIN_NAME. To reduce manual administration effort, you can optionally configure automatic user provisioning and deprovisioning from Entra ID to Snowflake. See [Configure automatic provisioning](https://learn.microsoft.com/en-us/entra/identity/saas-apps/snowflake-provisioning-tutorial). #### Create and configure the Cortex Agents After you create the security integration, ensure that at least one [Cortex Agent Object](/user-guide/snowflake-cortex/cortex-agents-rest-api) exists in your Snowflake account for the Teams or Microsoft 365 Copilot integration to use. If you already have a working agent that you want to use, no further action is required for this step. To create a new agent, follow the [instructions](#label-snowflake-agents-create). If you already use %sf-intelligence% and have created agents for that experience, you can reuse those agents with the Microsoft Teams and Microsoft 365 Copilot integration. You don't need to recreate or reconfigure them; any changes you make to an agent (such as instructions, tools, underlying objects, or privileges) are immediately reflected across all three interfaces. ##### Grant required privileges to users Make sure the role under which the integration will run (each user's default role or permitted secondary roles) has the grants described in the [access control requirements section](#label-cortex-agents-access-control). 
### Step 3: Setting up the Teams app and connecting your Snowflake account The final step in the integration process is to set up the Microsoft Teams application and connect it to the Snowflake users who will use it. This requires you to complete the following tasks: - Install the Cortex Agents app from the Teams store - Connect your Snowflake account to the Teams application #### Install the app from the Teams store All users must install the Cortex Agents app from the Microsoft Teams store. To install the app, search for "Snowflake Cortex Agents" in the Teams app store, then click **Add** to install the app. Depending on your organization's Microsoft Teams policies, a Teams Administrator may need to approve the app before it is available to users. See [Overview of app management and governance in Teams admin center](https://learn.microsoft.com/en-us/microsoftteams/manage-apps) for instructions. #### Connect your Snowflake account to the Teams app The first user to interact with the Cortex Agents app in Teams is prompted to connect their Snowflake account to the app. This user must have the ACCOUNTADMIN or SECURITYADMIN role in Snowflake for this step to succeed. To recap, every user's default role in Snowflake must have the required privileges to access the agent's objects, as described in the [access control requirements section](#label-cortex-agents-access-control) of the Cortex Agents topic. Security integrations block the main Snowflake administrative roles by default. Therefore, you cannot use administrative roles such as ACCOUNTADMIN as the default role for the user that will set up the Teams bot. For information on this restriction, see [BLOCKED_ROLES_LIST](#label-oauth-blocked-roles-list) in the CREATE SECURITY INTEGRATION topic. Snowflake recommends you create a dedicated, non-administrative role with the required permissions and set it as the default for the setup user. 
Alternatively, use the [SECONDARY ROLES](/sql-reference/sql/use-secondary-roles) mechanism to grant the additional permissions without altering the user's primary default role, as follows:

```sql
GRANT ROLE TO USER ;
ALTER USER SET DEFAULT_SECONDARY_ROLES = ('ALL');
```

To set up the Teams bot, follow these steps:

1. Click **I'm the Snowflake administrator**, below the notice stating that an administrator needs to configure Snowflake for the Teams integration, to begin the process.
2. Provide your Snowflake account URL where indicated, and select **Connect Snowflake account**. To find your account URL, log in to Snowsight and click the account selector in the bottom left corner of the page. The hostname portion of the URL is displayed at the top of the menu and is in the format your-organization-your-account. The full URL is your-organization-your-account.snowflakecomputing.com.

The configuration wizard verifies that the URL leads to a valid Snowflake instance and confirms that your user has access to it and has the required administrative privileges. If your account is in a region other than Azure US East 2, you are prompted to accept a consent notification during this process.

After the setup passes final validation, the Teams app is connected to your Snowflake account and the agents are ready to use.

After you have connected your Snowflake account to the Cortex Teams app, you can connect additional Snowflake accounts to the same app by logging into the Teams app with a user that has the necessary privileges and issuing the "add new account" command in the chat.

## Using the Cortex Agents

After the integration is set up, the bot appears in the Microsoft Teams interface, allowing your users to interact with it in a private chat. Users can ask questions in natural language, and the bot responds with answers based on Snowflake data.
In Microsoft 365 Copilot, your users can interact with the agents in the context of their broader workflows, asking questions and receiving answers about their Snowflake data within the Copilot interface. ### Available commands In addition to asking natural language questions, Cortex Agent bots accept predefined commands from Microsoft Teams chat. These commands help manage accounts and agents within the Teams interface. The following commands are available:
| Command | Description |
| --- | --- |
| `Help` | Display a list of available commands and usage instructions. |
| `Choose agent` | Switch between available Cortex Agents within the current account. Displays a list of agents you have access to. |
| `Logout` | Log out from the current account. |
| `Show configured accounts` | Display a list of all configured Snowflake accounts. |
| `Clear context` | Clear agent's internal chat history. |
| `Starter prompts` | Explore example questions you can ask the chosen agent. |
| `Admin Panel` | Display a list of available admin commands for your Snowflake account. |
| `Add account` | Connect an additional Snowflake account to the Teams app. Requires administrative privileges on the Snowflake account. |
| `Describe account` | Display information about the current Snowflake account. Displays a list of accounts with admin privileges to describe. |
| `Remove account` | Disconnect a Snowflake account from the Teams app. Requires administrative privileges. |
Commands are case-insensitive and can be entered conversationally in the Teams chat. For example, you can send `Help` or `help` in the chat to access the help command. ### Feedback on answers (Teams only) Users can provide qualitative feedback on the agent's responses directly in the Microsoft Teams interface (for example, marking an answer as helpful or not helpful and optionally adding a comment). Users can also review the feedback they have previously submitted. For instructions, see [](#label-cortex-agents-view-feedback). The feedback capability is available only in Microsoft Teams and is not supported in the Microsoft 365 Copilot experience. ### Switching between accounts and agents You can connect multiple Snowflake accounts to the integration. Each connected account can expose one or more Cortex Agents. Once the accounts are connected, users can switch among accounts and agents in the Teams UI with a single click; no need to re-authenticate or re-enter connection details. Switching between accounts and agents makes it easier to compare insights across business domains (for example, sales vs. marketing) while preserving each user's security context. You can also switch among agents in an account conversationally (for example, by entering "Choose agent") if you prefer a command interaction instead of the UI. ## Security considerations The Cortex Agents integration for Microsoft Teams is designed with security in mind, leveraging Snowflake's existing security features and Microsoft Entra ID's authentication capabilities. The integration ensures that user data remains secure and that access is controlled through Snowflake's role-based access control (RBAC) system. ### End-to-end authentication flow To understand the security implications of using the Cortex Agents integration for Microsoft Teams, it is important to understand the end-to-end authentication flow. 
This process involves the following steps: - **User interaction:** A user sends a message to the Snowflake Cortex Agents bot in Microsoft Teams. - **Authentication trigger:** The bot's back end service (the "Client" app) initiates an OAuth 2.0 flow, redirecting the user to the Microsoft Entra ID. - **User authentication:** The user signs in to their Microsoft account with their corporate credentials, satisfying any MFA or Conditional Access policies enforced by their tenant. - **Token issuance:** Entra ID provides a short-lived authorization code. The bot's backend securely exchanges this code for a JWT access token. - **API call to Snowflake:** The bot back end calls the Snowflake Cortex Agents API, including the access token in the `Authorization: Bearer` header. - **Snowflake token validation:** The Snowflake service receives the request and validates the JWT against the policy defined in the Snowflake security integration object. ### Role-Based Access Control Because it uses the Cortex Agents API under a specific user role, the Teams integration executes Cortex Agents requests with the exact privileges of the user's designated Snowflake role. The agent inherits all existing data governance controls, including: - **Role-Based Access Control:** The agent can only access databases, schemas, tables, and warehouses that the user's role permits them to use. - **Data masking policies:** The agent respects dynamic data masking policies, granting access only when allowed by the user's role. - **Row-Level access policies:** The agent enforces row-level security policies. The agent cannot bypass any existing Snowflake security controls, and users cannot access data that they are not already authorized to see. ### Network policies The integration supports Snowflake [network policies](/user-guide/network-policies) by forwarding the client IP address received from Microsoft to Snowflake for policy enforcement. 
Network policies allow administrators to control inbound access to the Snowflake service by restricting connections based on IP addresses and other network identifiers. The Cortex Agents integration for Microsoft Teams and Microsoft 365 Copilot does not create, modify, or activate any network policies on your Snowflake account; it only respects the network policies that exist in your Snowflake instance. Network policy configuration is entirely under the control of your Snowflake account administrators. When a user signs in to the Cortex Agents bot, Microsoft issues a token that includes an `ipaddr` claim representing the user's IP address at the time of sign-in. The integration forwards this IP address to Snowflake with each request, allowing Snowflake to enforce any network policies that rely on client IP information. Microsoft might periodically issue additional tokens with the same IP address for the duration of the user's session. The IP address claim in the token is updated only when a user completely signs out and back in within the bot. The IP address used for network policy enforcement reflects the user's address at the time of Microsoft sign-in and does not update if the user changes their IP address (for example, by connecting to a different network or by connecting to or disconnecting from a VPN) during their session with the bot, unless otherwise controlled by your Microsoft tenant configuration. Snowflake continues enforcing network policies against the original IP address until the user explicitly signs out of the bot and signs back in. In Snowsight, a client IP change typically invalidates the session immediately when network policies are enabled. In the Microsoft Teams and Microsoft 365 Copilot integration, session persistence and IP refresh behavior are controlled by Microsoft. ## Current limitations
OAuth identity provider must be Entra ID
The integration exclusively supports Microsoft Entra ID as the identity provider for authentication and requires a direct one-to-one mapping between Entra ID users and Snowflake users. Organizations that use another primary IdP (for example, Okta or another SAML/OIDC provider) can enable this integration by configuring standard identity federation between that provider and Microsoft Entra ID. In this federated model, the primary IdP handles the user's sign-in, after which Entra ID issues the final token required by the integration.
Default user role reliance
The integration's functionality is tied to each user's default Snowflake role due to an architectural constraint in the Cortex Agents API, which determines session permissions based on the role context established during authentication. Therefore, the user's default role must be granted all necessary privileges on the underlying objects for the agent to function correctly. While Snowflake's [secondary roles](#label-access-control-role-enforcement) feature can help to broaden data access, the primary execution context is governed by the user's default role.
## Troubleshooting

If you encounter issues with the Cortex Agents integration for Microsoft Teams, check the following sections for possible solutions.

### Privilege and access issues

The user's default role must have the required privileges to access the objects used or accessed by the agent. Error messages caused by access issues typically include the phrase "database object does not exist or not authorized." Troubleshooting such issues involves checking that the user's default role is set to a role that has the required privileges.

#### Default role setting

The first step in troubleshooting access issues is to check the user's default role setting. To verify this setting, use the DESCRIBE USER command. Check the DEFAULT_ROLE property in the output. If the user's default role is incorrect, change it using the ALTER USER command.

```sql
ALTER USER SET DEFAULT_ROLE = '';
```

If changing the user's primary DEFAULT_ROLE is not feasible, you can use Snowflake's secondary roles mechanism. A user can perform actions using the combined privileges of their primary and active secondary roles. This lets you grant an additional, integration-specific role to the user without altering their primary role. To add a secondary role for the Cortex Agents integration, use SQL commands like the following.

```sql
GRANT ROLE TO USER ;
ALTER USER SET DEFAULT_SECONDARY_ROLES = ('ALL');
```

#### Required permissions

Make sure the role under which the integration will run (each user's default role or permitted secondary roles) has the grants described in the [access control requirements section](#label-cortex-agents-access-control).

### Security integration issues

A Snowflake security integration connects the Microsoft Entra ID tenant to the Snowflake account. The issues in this section are related to the security integration.
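The error codes covered in the rest of this section can be collected into a small lookup for support scripts or runbooks. The following Python sketch is illustrative only: the names `SECURITY_INTEGRATION_ERRORS` and `triage` are hypothetical, and the summaries are condensed from the headings and guidance in this section rather than taken from any Snowflake API.

```python
# Error codes and summaries condensed from this troubleshooting section.
SECURITY_INTEGRATION_ERRORS = {
    390303: (
        "Invalid OAuth access token",
        "Check EXTERNAL_OAUTH_ISSUER, EXTERNAL_OAUTH_JWS_KEYS_URL, and "
        "EXTERNAL_OAUTH_AUDIENCE_LIST, especially the tenant ID in the URLs.",
    ),
    390304: (
        "Incorrect username or password",
        "Confirm the Entra ID identity maps to exactly one Snowflake user.",
    ),
    390317: (
        "Role not listed in the access token or was filtered out",
        "Set EXTERNAL_OAUTH_ANY_ROLE_MODE to ENABLE or ENABLE_FOR_LOGIN.",
    ),
    390186: (
        "Role specified in the connect string is not granted to this user",
        "Check EXTERNAL_OAUTH_ALLOWED_ROLES_LIST and "
        "EXTERNAL_OAUTH_BLOCKED_ROLES_LIST against the user's default role.",
    ),
}


def triage(code: int) -> str:
    """Return a one-line hint for an error code (hypothetical helper)."""
    entry = SECURITY_INTEGRATION_ERRORS.get(code)
    if entry is None:
        return f"{code}: not covered by this section"
    title, hint = entry
    return f"{code}: {title}. {hint}"


print(triage(390317))
```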
#### Invalid OAuth access token (error code 390303) This error can indicate that one or more property values in the security integration are incorrect, preventing Snowflake from validating the access token received from Entra ID. To rectify this, check the following fields in the security integration. In particular, make sure the tenant ID is correct in the URLs. - **EXTERNAL_OAUTH_ISSUER:** This must be set to the correct Entra ID issuer URL, which is in the format https://login.microsoftonline.com/tenant-id/v2.0, where tenant-id is your organization's Microsoft tenant ID. - **EXTERNAL_OAUTH_JWS_KEYS_URL:** This must be set to the correct JWS keys URL, which is in the format https://login.microsoftonline.com/tenant-id/discovery/v2.0/keys, where tenant-id is your organization's Microsoft tenant ID. - **EXTERNAL_OAUTH_AUDIENCE_LIST:** This must include the correct audience for the Cortex Agents Bot OAuth Resource application, which is the application ID `5a840489-78db-4a42-8772-47be9d833efe`. Update any incorrect values using the ALTER SECURITY INTEGRATION command. #### Incorrect username or password (error code 390304) This error message points to a mismatch between the user identifier sent by Entra ID and the corresponding user's record in Snowflake, usually because the Entra ID user identity does not map to exactly one Snowflake user. This can happen when the Snowflake user does not exist, when the mapped UPN or email address is incorrect, or when the mapping resolves to multiple Snowflake users (for example, if the mapping is performed using email address and multiple users share the same address). The error message includes the UPN and email of the user attempting to log in. Use this information to verify the affected user's configuration using the DESCRIBE USER command. Make sure the user's NAME or EMAIL property matches the value of the same property in Entra ID for the corresponding user. 
When using email address mapping, each user in the Snowflake account that will use the integration must have a unique email address. #### Role not listed in the access token or was filtered out (error code 390317) This error occurs when Snowflake cannot assign a role to the user based on the information in the OAuth access token. The access token is configured with the `session:role-any` scope, which allows the user to assume any of their assigned roles in Snowflake. However, the security integration must be explicitly configured to permit this behavior. Use the DESCRIBE SECURITY INTEGRATION command to check the value of the EXTERNAL_OAUTH_ANY_ROLE_MODE property, then change it to `ENABLE` or `ENABLE_FOR_LOGIN`. ```sql DESCRIBE SECURITY INTEGRATION entra_id_cortex_agents_integration; ALTER SECURITY INTEGRATION entra_id_cortex_agents_integration SET EXTERNAL_OAUTH_ANY_ROLE_MODE = 'ENABLE'; ``` #### Role specified in the connect string is not granted to this user (error code 390186) This error occurs when Snowflake security integration doesn't allow the user's default role to use the security integration. To resolve this, check the following properties in the output of DESCRIBE SECURITY INTEGRATION: - EXTERNAL_OAUTH_ALLOWED_ROLES_LIST: If the parameter is enabled, verify that it contains the user's default role. - EXTERNAL_OAUTH_BLOCKED_ROLES_LIST: If the parameter is enabled, verify that it does not contain the user's default role. ### Network policy issues If a user is blocked by a network policy when using the Cortex Agents integration for Microsoft Teams or Microsoft 365 Copilot, try the following steps: 1. **Verify that the user's IP address is allowlisted.** Confirm that the user's current IP address is included in the account's network policy. A simple way to test this is to have the user log in to their Snowflake account directly at [Snowflake](https://app.snowflake.com/). If the user can log in successfully, their IP address is allowlisted. 2. 
**Check for IPv6 addresses.** If you encounter an IPv6 address in an error related to a network policy, this indicates that Microsoft is sending an IPv6 address as a claim within the authentication token. To allow IPv6 addresses, create a network rule with `TYPE = IPV6` and add it to your network policy. The account parameter `ENABLE_IPV6_NETWORK_RULES` must be set to `TRUE`. For more information, see [](/user-guide/network-rules). 3. **Refresh the Entra ID token.** The bot may be using a token with an outdated IP address. To force a token refresh, have the user type `/logout` in the chat window, then type `/login` and sign in to Microsoft again. --- title: Cortex Agents REST API source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-rest-api.md section: Snowflake Cortex (AI & ML) --- # Cortex Agents REST API This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/cortex-agents) Requests to the Cortex Agent REST API time out after 15 minutes. You can use the Cortex Agent REST API to create, manage, and interact with Cortex Agent Objects in your Snowflake account. ## Create Cortex Agent `POST /api/v2/databases/{database}/schemas/{schema}/agents` Creates a new Cortex Agent Object with the specified attributes and specification. ### Request #### Path parameters
#### Query parameters
#### Request headers
#### Request body
**Example** ```json { "name": "MY_AGENT", "comment": "An agent to answer questions about all my data", "profile": { "display_name": "My Agent" }, "models": { "orchestration": "claude-4-sonnet" }, "instructions": { "response": "You will respond in a friendly but concise manner", "orchestration": "For any query related to revenue we should use Analyst; For all policy questions we should use Search" }, "orchestration": { "budget": { "seconds": 30, "tokens": 16000 } }, "tools": [ { "tool_spec": { "type": "generic", "name": "get_revenue", "description": "Fetch the delivery revenue for a location.", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } } }, "required": [ "location" ] } } ], "tool_resources": { "get_revenue": { "type": "function", "execution_environment": { "type": "warehouse", "warehouse": "MY_WH" }, "identifier": "DB.SCHEMA.UDF" } } } ``` ### Response A successful response returns a JSON object with details about the status of Cortex Agent creation. #### Response body ```json {"status": "Agent xxxx successfully created."} ``` ## Describe Cortex Agent `GET /api/v2/databases/{database}/schemas/{schema}/agents/{name}` Describes a Cortex Agent. ### Request #### Path parameters
#### Request headers
### Response A successful response returns a JSON object describing the Cortex Agent. #### Response headers
#### Response body The response body contains the details of the Cortex Agent. ```json { "agent_spec": "{\"models\":{\"orchestration\":\"llama3.1-70B\"},\"experimental\":{\"foo\":\"bar\",\"nested\":{\"key\":\"value\"}},\"orchestration\":{\"budget\":{\"seconds\":30,\"tokens\":16000}},\"instructions\":{\"response\":\"You will respond in a friendly but concise manner\",\"orchestration\":\"For any revenue question use Analyst; for policy use Search\",\"sample_questions\":[{\"question\":\"question 1\"},{\"question\":\"question 2\"},{\"question\":\"question 3\"}]},\"tools\":[{\"tool_spec\":{\"type\":\"cortex_analyst_text_to_sql\",\"name\":\"Analyst1\",\"description\":\"test\"}},{\"tool_spec\":{\"type\":\"cortex_analyst_sql_exec\",\"name\":\"SQL_exec1\"}},{\"tool_spec\":{\"type\":\"cortex_search\",\"name\":\"Search1\"}},{\"tool_spec\":{\"type\":\"web_search\",\"name\":\"web_search_1\"}},{\"tool_spec\":{\"type\":\"generic\",\"name\":\"get_weather\",\"input_schema\":{\"type\":\"object\",\"properties\":{\"location\":{\"type\":\"string\",\"description\":\"The city and state\"}},\"required\":[\"Location\"]}}}],\"tool_unable_to_answer\":\"I don't know the answer to that\",\"tool_resources\":{\"Analyst1\":{\"semantic_model_file\":\"stage1\"},\"Analyst2\":{\"semantic_view\":\"db.schema.semantic_view\"},\"Search1\":{\"name\":\"db.schema.service_name\",\"Max_results\":\"5\",\"filter\":{\"@eq\":{\"region\":\"North America\"}},\"Title_column\":\"\",\"ID_column\":\"\"},\"SQL_exec1\":{\"Name\":\"my_warehouse\",\"Timeout\":\"30\",\"AutoExecute\":\"true\"},\"web_search\":{\"name\":\"web_search_1\",\"Function\":\"db/schema/search_web\"}}}", "name": "MY_AGENT1", "database_name": "TEST_DATABASE", "schema_name": "TEST_SCHEMA", "owner": "ACCOUNTADMIN", "created_on": "1967-06-23T07:00:00.123+00:00" } ``` ## Update Cortex Agent `PUT /api/v2/databases/{database}/schemas/{schema}/agents/{name}` Updates an existing Cortex Agent with the specified attributes and specification. 
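As a rough illustration of how a client might assemble this call, the following Python sketch builds the `PUT` request with the standard library and stops short of sending it. The function name, the account URL, and the token value are hypothetical, and the use of an `Authorization: Bearer` header is an assumption here, because this topic does not list the required request headers.

```python
import json
import urllib.request
from urllib.parse import quote


def build_update_agent_request(account_url, database, schema, name, body, token):
    """Build, but do not send, a PUT request for the update endpoint (hypothetical helper)."""
    # Percent-encode each path component so unusual identifiers stay valid in the URL.
    path = "/api/v2/databases/{}/schemas/{}/agents/{}".format(
        quote(database, safe=""), quote(schema, safe=""), quote(name, safe="")
    )
    return urllib.request.Request(
        url=account_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            # Assumption: bearer-token authentication.
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Hypothetical account URL, object names, and token.
req = build_update_agent_request(
    "https://myorg-myaccount.snowflakecomputing.com",
    "MY_DB",
    "MY_SCHEMA",
    "MY_AGENT",
    {"comment": "An agent to answer questions about all my data",
     "profile": {"display_name": "My Agent"}},
    "example-token",
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the request; whether a partial body like this one is accepted is not stated in this topic.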
### Request #### Path parameters
#### Request headers
#### Request body
**Example** ```json { "comment": "An agent to answer questions about all my data", "profile": { "display_name": "My Agent" }, "models": { "orchestration": "claude-4-sonnet" }, "instructions": { "response": "You will respond in a friendly but concise manner", "orchestration": "For any query related to revenue we should use Analyst; For all policy questions we should use Search" }, "orchestration": { "budget": { "seconds": 30, "tokens": 16000 } }, "tools": [ { "tool_spec": { "type": "generic", "name": "get_revenue", "description": "Fetch the delivery revenue for a location.", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } } }, "required": [ "location" ] } } ], "tool_resources": { "get_revenue": { "type": "function", "execution_environment": { "type": "warehouse", "warehouse": "MY_WH" }, "identifier": "DB.SCHEMA.UDF" } } } ``` ### Response A successful response returns a JSON object with details about the status of Cortex Agent update. #### Response body ```json {"status": "Agent xxxx successfully updated."} ``` ## List Cortex Agents `GET /api/v2/databases/{database}/schemas/{schema}/agents` Lists the Cortex Agents under the specified database and schema. ### Request #### Path parameters
#### Query parameters
#### Request headers
### Response A successful response returns a JSON array of Cortex Agent resources. #### Response headers
#### Response body ```json [ { "name": "my_agent", "database": "TEST_DB", "schema": "TEST_SCHEMA", "created_on": "2024-06-01T12:00:00Z", "owner": "ACCOUNTADMIN", "comment": "Sample agent", "profile": {"display_name": "My Agent", "avatar": null, "color": null} }, { "name": "another_agent", "database": "TEST_DB", "schema": "TEST_SCHEMA", "created_on": "2024-06-02T08:30:00Z", "owner": "SYSADMIN", "comment": "", "profile": {"display_name": "Another Agent", "avatar": null, "color": null} } ] ``` ## Delete Cortex Agent `DELETE /api/v2/databases/{database}/schemas/{schema}/agents/{name}` Deletes a Cortex Agent with the specified name. If the `ifExists` parameter is set to `true`, the operation succeeds even if the agent does not exist. Otherwise, the operation fails if the agent cannot be deleted. ### Request #### Path parameters
#### Query parameters
#### Request headers
### Response A successful response returns a confirmation message. #### Response body ```json { "status": "Request successfully completed" } ``` ## Schemas ## `AgentInstructions`
**Example** ```json { "response": "You will respond in a friendly but concise manner", "orchestration": "For any query related to revenue we should use Analyst; For all policy questions we should use Search" } ``` ## `AgentProfile` The profile information for a Data Cortex agent.
**Example** ```json { "display_name": "My Agent" } ``` ## `BudgetConfig`
**Example** ```json { "seconds": 30, "tokens": 16000 } ``` ## `ExecutionEnvironment` Configuration for server-executed tools.
**Example** ```json { "type": "warehouse", "warehouse": "MY_WAREHOUSE", "query_timeout": 60 } ``` ## `ModelConfig`
**Example** ```json { "orchestration": "claude-4-sonnet" } ``` ## `OrchestrationConfig`
**Example** ```json { "budget": { "seconds": 30, "tokens": 16000 } } ``` ## `Tool` Defines a tool that can be used by the agent. Tools provide specific capabilities like data analysis, search, or generic functions.
**Example** ```json { "tool_spec": { "type": "generic", "name": "get_revenue", "description": "Fetch the delivery revenue for a location.", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } } }, "required": [ "location" ] } } ``` ## `ToolInputSchema`
**Example** ```json { "type": "object", "description": "Input for my custom tool", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } }, "items": {}, "required": [ "location" ] } ``` ## `ToolResource`
**cortex_analyst_text_to_sql:** Configuration for text-to-SQL analysis tool. Provides parameters for SQL query generation and execution. Exactly one of semantic_model_file or semantic_view must be provided.
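The "exactly one of" rule above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `validate_analyst_resource` is hypothetical, and it assumes that an empty or missing value counts as "not provided".

```python
def validate_analyst_resource(resource: dict) -> str:
    """Return the name of the semantic source in use.

    Raises ValueError unless exactly one of `semantic_model_file`
    or `semantic_view` is set (empty values count as unset).
    """
    candidates = ("semantic_model_file", "semantic_view")
    provided = [key for key in candidates if resource.get(key)]
    if len(provided) != 1:
        raise ValueError(
            "Exactly one of semantic_model_file or semantic_view "
            f"must be provided, got: {provided or 'none'}"
        )
    return provided[0]
```

The schema example for this tool resource lists both fields side by side; under the rule stated above, an actual resource would presumably set only one of them.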
**Example** ```json { "semantic_model_file": "@db.schema.stage/semantic_model.yaml", "semantic_view": "db.schema.semantic_view", "execution_environment": { "type": "warehouse", "warehouse": "MY_WAREHOUSE", "query_timeout": 60 } } ``` **cortex_search:** Configuration for search functionality. Defines how document search and retrieval should be performed.
**Example** ```json { "search_service": "database.schema.service_name", "title_column": "account_name", "id_column": "account_id", "filter": { "@eq": { "": "" } } } ``` **generic:**
**Example** ```json { "type": "function", "execution_environment": { "type": "warehouse", "warehouse": "MY_WAREHOUSE", "query_timeout": 60 }, "identifier": "MY_DB.MY_SCHEMA.MY_UDF" } ``` **web_search:** Configuration for web search functionality.
**Example** ```json { "max_results": 20 } ``` ## `ToolSpec` Specification of the tool's type, configuration, and input requirements.
**Example** ```json { "type": "generic", "name": "get_weather", "description": "lorem ipsum", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } }, "required": [ "location" ] } } ``` --- title: Cortex Agents Run API source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-run.md section: Snowflake Cortex (AI & ML) --- # Cortex Agents Run API This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/cortex-agents) - [](/sql-reference/functions/data_agent_run-snowflake-cortex) Requests to the Cortex Agent REST API time out after 15 minutes. There are two methods to interact with an Agent: - Build an agent object and reference this agent object in a request to the `agent:run` API. - Call `agent:run` directly without an agent object. You provide the configuration in the request body of `agent:run`. `agent:run` supports **streaming responses by default**. To disable streaming and receive a single JSON response, set `stream` to `false`. You can also run agents using SQL with the [DATA_AGENT_RUN](/sql-reference/functions/data_agent_run-snowflake-cortex) function. The SQL function returns a non-streaming JSON response and doesn't require a REST client. For most use cases, Snowflake recommends the REST API. ## Agent run request with agent object `POST /api/v2/databases/{database}/schemas/{schema}/agents/{name}:run` Sends a user query to the agent object and returns its response. By default, the API streams responses as server-sent events (SSE). To receive a single JSON response, set `stream` to `false` in the request body. You can't set, update, or overwrite the `models`, `instructions`, and `orchestration` fields using this request. To update these fields, you must use [](#label-snowflake-agents-rest-api-update). ### Path parameters
### Request headers
### Request body
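As a rough sketch, the request for this endpoint could be assembled in Python along these lines. The helper name `build_agent_run_request` and the `Content-Type` header are assumptions for illustration; the account base URL and authentication headers are omitted because they depend on your setup. The `Accept: application/json` header for non-streaming calls follows the non-streaming response section of this document.

```python
def build_agent_run_request(database, schema, name, question, stream=True):
    """Build (path, headers, body) for a run request against an agent object.

    Hypothetical helper: it only assembles data and sends nothing.
    """
    path = f"/api/v2/databases/{database}/schemas/{schema}/agents/{name}:run"
    body = {
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ]
    }
    # Content-Type is an assumption (the body is JSON).
    headers = {"Content-Type": "application/json"}
    if not stream:
        # `stream` defaults to true when omitted, so set it only to disable streaming.
        body["stream"] = False
        headers["Accept"] = "application/json"
    return path, headers, body
```

Leaving `stream` out of the body for the streaming case relies on the documented default of `true`.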
**Example** ```json { "thread_id": 0, "parent_message_id": 0, "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What is the total revenue for 2023?" } ], "status": "completed", "error": { "code": "399504", "message": "Error during execution" } } ], "stream": false, "tool_choice": { "type": "auto", "name": [ "analyst_tool", "search_tool" ] } } ``` The request body supports an optional `stream` boolean field: - If `stream` is omitted, it defaults to `true` and the response is streamed as SSE events. - If `stream` is `false`, the API returns a single JSON object (see [](#label-snowflake-agents-run-non-streaming-response)). ## Agent run without an agent object `POST /api/v2/cortex/agent:run` Sends a user query to the Cortex Agents service provided in the request body and returns its response. Interacts with the agent without creating an agent object. Before September 1st, 2025, the request and response schemas for the `agent:run` API were different from the schema listed in this document. Previously, the orchestration was static and the same sequence of tools was used to generate an answer. `agent:run` now has an updated schema for both the request and response. In addition, the API now dynamically orchestrates and iterates to arrive at the final response. We recommend using the schema described in this document for an improved end-user experience. To use the legacy schema and behavior, use the following schema: ```json { "model": "claude-4-sonnet", "messages": [ {"role":"user", "content": [] } ] } ``` ### Request headers
### Request body
**Example** ```json { "thread_id": 0, "parent_message_id": 0, "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What is the total revenue for 2023?" } ], "status": "completed", "error": { "code": "399504", "message": "Error during execution" } } ], "stream": false, "tool_choice": { "type": "auto", "name": [ "analyst_tool", "search_tool" ] }, "models": { "orchestration": "claude-4-sonnet" }, "instructions": { "response": "You will respond in a friendly but concise manner", "orchestration": "For any query related to revenue we should use Analyst; For all policy questions we should use Search" }, "orchestration": { "budget": { "seconds": 30, "tokens": 16000 } }, "tools": [ { "tool_spec": { "type": "generic", "name": "get_revenue", "description": "Fetch the delivery revenue for a location.", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } } }, "required": [ "location" ] } } ], "tool_resources": { "get_revenue": { "type": "function", "execution_environment": { "type": "warehouse", "warehouse": "MY_WH" }, "identifier": "DB.SCHEMA.UDF" } } } ``` The request body supports an optional `stream` boolean field: - If `stream` is omitted, it defaults to `true` and the response is streamed as SSE events. - If `stream` is `false`, the API returns a single JSON object (see [](#label-snowflake-agents-run-non-streaming-response)). ## Streaming responses The `agent:run` API provides streaming responses. The server streams back events. This allows you to display responses in your application, token-by-token, as they are generated by the Agent. Each event streamed in the API response has a strictly typed schema. You can find a list of all of the events in the following section and select to which ones you'd like to subscribe. The last event sent by the API is a `response` event. This event contains the entire agent output. 
You can use this as the agent's final response. For any non-streaming clients, you can subscribe to this event because it is the logical aggregation of all prior events. If you don't want to use streaming responses, wait for the `response` event and ignore all prior events.

The majority of the other events streamed can be split into two categories: *Delta* and *Content Items*. *Delta* events represent a single token generated by the Agent. By listening to these events, you can create a typewriter effect. The main delta events are *response.thinking.delta*, which represents a reasoning token, and *response.text.delta*, which represents an answer token. *Content Item* events represent elements from the *content* array in the final agent response.

Make sure your application can handle unknown event types.

**Example Response**

```
event: response.status
data: {"message":"Planning the next steps","status":"planning"}

event: response.thinking.delta
data: {"content_index":0,"text":"\nThe user is asking for a"}

event: response.thinking.delta
data: {"content_index":0,"text":" chart showing the"}

...
...
...

event: response.status
data: {"message":"Reviewing the results","status":"reasoning_agent_stop"}

event: response.status
data: {"message":"Forming the answer","status":"proceeding_to_answer"}
```

## `response`

Event streamed when the final response is available. This is the last event emitted; it represents the aggregation of all other events previously streamed.
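A client that only needs the final answer can parse the stream and keep the last `response` event. The sketch below assumes the stream arrives as plain `event:` / `data:` lines with JSON payloads, as in the example response above; the function names `parse_sse` and `final_response` are hypothetical, and the parser is deliberately agnostic about event types so unknown ones pass through untouched.

```python
import json


def parse_sse(raw: str):
    """Parse SSE text into a list of (event_type, payload) tuples."""
    events = []
    event_type, data_lines = None, []

    def flush():
        nonlocal event_type, data_lines
        if data_lines:
            events.append((event_type, json.loads("\n".join(data_lines))))
        event_type, data_lines = None, []

    for line in raw.splitlines():
        if line.startswith("event:"):
            flush()  # a new event starts; emit any pending one first
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Whitespace around a JSON payload is insignificant, so strip it.
            data_lines.append(line[len("data:"):].strip())
        elif not line.strip():
            flush()  # a blank line terminates an event
    flush()
    return events


def final_response(events):
    """Return the payload of the last `response` event, or None."""
    for event_type, payload in reversed(events):
        if event_type == "response":
            return payload
    return None
```

Returning `None` when no `response` event is present lets the caller decide how to treat a stream that ended early, for example after an `error` event.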
**Example** ```json { "role": "assistant", "content": [ { "type": "asset" } ], "warnings": [ { "message": "Unable to fetch tools from MCP server 'foo'. Response quality may be degraded.", "code": "003001" } ], "metadata": { "usage": { "tokens_consumed": [ { "model_name": "llama3.1-70b", "input_tokens": { "total": 175, "cache_read": 50, "cache_write": 25, "uncached": 100 }, "output_tokens": { "total": 75 }, "context_window": 128000 } ] }, "run_id": "4264-83472", "thread_id": 4264, "user_message_id": 83472, "assistant_message_id": 83473 }, "status": "completed" } ``` ## `response.text` An event streamed when a text content block is done streaming, including all the aggregated deltas for a particular content index.
**Example** ```json { "content_index": 0, "text": "Lorem ipsum dolor...", "annotations": [ { "type": "cortex_search_citation", "index": 0, "search_result_id": "cs_61987ff6-6d56-4695-83c0-1e7cfed818c7", "doc_id": "4ac085cb-82d0-4eb4-94f3-2672aa0599a2", "doc_title": "Earnings Report", "text": "The revenue for 2025 was..." } ], "is_elicitation": false } ``` ## `response.text.delta` Event streamed when a new output text delta is generated.
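To illustrate how text deltas might be consumed, the sketch below concatenates `response.text.delta` payloads per `content_index`. It assumes the events have already been decoded into `(event_type, payload)` tuples by your SSE client; the function name `accumulate_text` is hypothetical.

```python
from collections import defaultdict


def accumulate_text(events):
    """Concatenate `response.text.delta` payloads by content index.

    `events` is an iterable of (event_type, payload) tuples; other
    event types are ignored.
    """
    buffers = defaultdict(str)
    for event_type, payload in events:
        if event_type == "response.text.delta":
            buffers[payload["content_index"]] += payload["text"]
    return dict(buffers)
```

For a typewriter effect, a UI would render each delta as it arrives instead of waiting for the full buffer; the accumulation logic stays the same.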
**Example** ```json { "content_index": 0, "text": "Hello", "is_elicitation": false } ``` ## `response.text.annotation` Event streamed when an annotation is added to a text content.
**Example** ```json { "content_index": 0, "annotation_index": 0, "annotation": { "type": "cortex_search_citation", "index": 0, "search_result_id": "cs_61987ff6-6d56-4695-83c0-1e7cfed818c7", "doc_id": "4ac085cb-82d0-4eb4-94f3-2672aa0599a2", "doc_title": "Earnings Report", "text": "The revenue for 2025 was..." } } ``` ## `response.thinking` An event streamed when a thinking content block is done streaming, including all the aggregated deltas for a particular content index.
**Example** ```json { "content_index": 0, "text": "To answer your question I must...", "signature": "lorem ipsum" } ``` ## `response.thinking.delta` Event streamed when a thinking delta is generated.
**Example** ```json { "content_index": 0, "text": "lorem ipsum", "signature": "lorem ipsum" } ``` ## `response.tool_use` An event streamed when the agent requests a tool use.
**Example** ```json { "content_index": 0, "tool_use_id": "toolu_123", "type": "cortex_analyst_text_to_sql", "name": "my_cortex_analyst_semantic_view", "input": { "location": "San Francisco, CA" }, "client_side_execute": "true", "permission": { "options": [ "Allow Once", "Deny" ] } } ``` ## `response.tool_result` Event streamed when a tool finishes executing, including the tool result.
**Example** ```json { "content_index": 0, "tool_use_id": "toolu_123", "type": "cortex_analyst_text_to_sql", "name": "my_cortex_analyst_semantic_view", "content": [ { "type": "json", "json": { "answer": 42 } } ], "status": "success" } ``` ## `response.tool_result.status` Status update for a specific tool use.
**Example** ```json { "tool_use_id": "toolu_123", "tool_type": "cortex_analyst_text_to_sql", "status": "Executing SQL", "message": "Executing query 'SELECT * FROM my_table'", "details": {} } ``` ## `response.tool_result.analyst.delta` A delta event streamed for the Cortex Analyst tool execution.
**Example** ```json { "content_index": 0, "tool_use_id": "toolu_123", "tool_type": "cortex_analyst_text_to_sql", "tool_name": "my_cortex_analyst_semantic_view", "delta": { "text": "The...", "think": "Thinking...", "sql": "SELECT...", "sql_explanation": "This...", "query_id": "707787a0-a684-4ead-adb0-3c3b62b043d9", "verified_query_used": false, "result_set": { "statementHandle": "707787a0-a684-4ead-adb0-3c3b62b043d9", "resultSetMetaData": { "partition": 0, "numRows": 0, "format": "jsonv2", "rowType": [ { "name": "my_column", "type": "VARCHAR", "length": 0, "precision": 0, "scale": 0, "nullable": false } ] }, "data": [ [ "row1 col1", "row1 col2" ], [ "row2 col1", "row2 col2" ] ] }, "suggestions": { "index": 0, "delta": "What..." } } } ``` ## `response.table` An event streamed when a table content block is added.
**Example** ```json { "content_index": 0, "tool_use_id": "toolu_123", "query_id": "6ac75378-6337-48a6-80ab-6de48dd680eb", "result_set": { "statementHandle": "707787a0-a684-4ead-adb0-3c3b62b043d9", "resultSetMetaData": { "partition": 0, "numRows": 0, "format": "jsonv2", "rowType": [ { "name": "my_column", "type": "VARCHAR", "length": 0, "precision": 0, "scale": 0, "nullable": false } ] }, "data": [ [ "row1 col1", "row1 col2" ], [ "row2 col1", "row2 col2" ] ] }, "title": "Revenue by Month" } ``` ## `response.chart` An event streamed when a chart content block is added.
**Example** ```json { "content_index": 0, "tool_use_id": "toolu_123", "chart_spec": "{\"$schema\":\"https://vega.github.io/schema/vega-lite/v5.json\",\"data\":{...},\"mark\":\"bar\"}" } ``` ## `response.status` Status update for the agent execution.
**Example** ```json { "status": "executing_tool", "message": "Executing tool `my_analyst_tool`" } ``` ## `response.warning` Sent when a non-fatal warning occurs. The stream continues after this event.
**Example** ```json { "message": "Unable to fetch tools from MCP server 'foo'. Response quality may be degraded.", "code": "003001" } ``` ## `error` Sent when a fatal error is encountered.
**Example** ```json { "code": "399504", "error_code": "lorem ipsum", "message": "Error during execution", "request_id": "61987ff6-6d56-4695-83c0-1e7cfed818c7" } ``` ## `metadata` Metadata about the request. This event is sent when a message is added to the thread. It is useful for getting the `parent_message_id` to use in following requests to the Agents API.
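One possible way to use this event is to remember the message ID it carries and pass it as `parent_message_id` in the next request. The sketch below is hedged: it assumes events are available as `(event_type, payload)` tuples, that the payload has the shape shown in the example for this event, and that the ID of the most recent assistant message is the one to continue from. The function names are hypothetical.

```python
def message_id_for(events, role):
    """Return the message_id of the last `metadata` event for `role`, or None."""
    found = None
    for event_type, payload in events:
        if event_type != "metadata":
            continue
        meta = payload.get("metadata", {})
        if meta.get("role") == role:
            found = meta.get("message_id")
    return found


def follow_up_body(thread_id, parent_message_id, question):
    """Build a request body that continues an existing thread."""
    return {
        "thread_id": thread_id,
        "parent_message_id": parent_message_id,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ],
    }
```

Which role's message ID should serve as the parent is an assumption here; check it against the threads documentation for your account before relying on it.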
**Example** ```json { "metadata": { "role": "user", "message_id": 83472, "run_id": "4264-83472" } } ``` ## Schemas ## `AgentInstructions`
**Example** ```json { "response": "You will respond in a friendly but concise manner", "orchestration": "For any query related to revenue we should use Analyst; For all policy questions we should use Search" } ``` ## `Annotation`
**cortex_search_citation:**
**Example** ```json { "type": "cortex_search_citation", "index": 0, "search_result_id": "cs_61987ff6-6d56-4695-83c0-1e7cfed818c7", "doc_id": "4ac085cb-82d0-4eb4-94f3-2672aa0599a2", "doc_title": "Earnings Report", "text": "The revenue for 2025 was..." } ``` ## `BudgetConfig`
**Example** ```json { "seconds": 30, "tokens": 16000 } ``` ## `ChartContent`
**Example** ```json { "tool_use_id": "toolu_123", "chart_spec": "{\"$schema\":\"https://vega.github.io/schema/vega-lite/v5.json\",\"data\":{...},\"mark\":\"bar\"}" } ``` ## `CortexAnalystSuggestionDelta`
**Example** ```json { "index": 0, "delta": "What..." } ``` ## `CortexAnalystToolResultDelta`
**Example** ```json { "text": "The...", "think": "Thinking...", "sql": "SELECT...", "sql_explanation": "This...", "query_id": "707787a0-a684-4ead-adb0-3c3b62b043d9", "verified_query_used": false, "result_set": { "statementHandle": "707787a0-a684-4ead-adb0-3c3b62b043d9", "resultSetMetaData": { "partition": 0, "numRows": 0, "format": "jsonv2", "rowType": [ { "name": "my_column", "type": "VARCHAR", "length": 0, "precision": 0, "scale": 0, "nullable": false } ] }, "data": [ [ "row1 col1", "row1 col2" ], [ "row2 col1", "row2 col2" ] ] }, "suggestions": { "index": 0, "delta": "What..." } } ``` ## `ExecutionEnvironment` Configuration for server-executed tools.
**Example** ```json { "type": "warehouse", "warehouse": "MY_WAREHOUSE", "query_timeout": 60 } ``` ## `InputTokens` Input token breakdown by cache usage.
**Example** ```json { "total": 175, "cache_read": 50, "cache_write": 25, "uncached": 100 } ``` ## `Message` Represents a single message in the conversation. Can be either from the user or the assistant.
**Example** ```json { "role": "user", "content": [ { "type": "text", "text": "What is the total revenue for 2023?" } ], "status": "completed", "error": { "code": "399504", "message": "Error during execution" } } ``` ## `MessageContentItem`
**chart:**
**Example** ```json { "type": "chart", "chart": { "tool_use_id": "toolu_123", "chart_spec": "{\"$schema\":\"https://vega.github.io/schema/vega-lite/v5.json\",\"data\":{...},\"mark\":\"bar\"}" } } ``` **permission_decision:** A user's decision to grant or deny permission for a tool execution that had permission options. Sent by the client in the next request after receiving a tool_use event with non-empty permission options.
**Example** ```json { "type": "permission_decision", "permission_decision": { "tool_use_id": "toolu_abc123", "decision": "Allow Once", "reason": "I don't want to modify production config files" } } ``` **table:**
**Example** ```json { "type": "table", "table": { "tool_use_id": "toolu_123", "query_id": "6ac75378-6337-48a6-80ab-6de48dd680eb", "result_set": { "statementHandle": "707787a0-a684-4ead-adb0-3c3b62b043d9", "resultSetMetaData": { "partition": 0, "numRows": 0, "format": "jsonv2", "rowType": [ { "name": "my_column", "type": "VARCHAR", "length": 0, "precision": 0, "scale": 0, "nullable": false } ] }, "data": [ [ "row1 col1", "row1 col2" ], [ "row2 col1", "row2 col2" ] ] }, "title": "Revenue by Month" } } ``` **text:**
**Example** ```json { "text": "Lorem ipsum dolor...", "annotations": [ { "type": "cortex_search_citation", "index": 0, "search_result_id": "cs_61987ff6-6d56-4695-83c0-1e7cfed818c7", "doc_id": "4ac085cb-82d0-4eb4-94f3-2672aa0599a2", "doc_title": "Earnings Report", "text": "The revenue for 2025 was..." } ], "is_elicitation": false, "type": "text" } ``` **thinking:**
**Example** ```json { "type": "thinking", "thinking": { "text": "To answer your question I must...", "signature": "lorem ipsum" } } ``` **tool_result:**
**Example** ```json { "type": "tool_result", "tool_result": { "tool_use_id": "toolu_123", "type": "cortex_analyst_text_to_sql", "name": "my_cortex_analyst_semantic_view", "content": [ { "type": "json", "json": { "answer": 42 } } ], "status": "success" } } ``` **tool_use:**
**Example** ```json { "type": "tool_use", "tool_use": { "tool_use_id": "toolu_123", "type": "cortex_analyst_text_to_sql", "name": "my_cortex_analyst_semantic_view", "input": { "location": "San Francisco, CA" }, "client_side_execute": "true", "permission": { "options": [ "Allow Once", "Deny" ] } } } ``` ## `MessageError` Error details associated with a message that terminated with an error.
**Example** ```json { "code": "399504", "message": "Error during execution" } ``` ## `Metadata`
**Example** ```json { "role": "user", "message_id": 83472, "run_id": "4264-83472" } ``` ## `ModelConfig`
**Example** ```json { "orchestration": "claude-4-sonnet" } ``` ## `OrchestrationConfig`
**Example** ```json { "budget": { "seconds": 30, "tokens": 16000 } } ``` ## `OutputTokens` Output token details.
**Example** ```json { "total": 75 } ``` ## `PermissionDecision` Contains the user's decision on whether to allow a specific tool execution. The decision field must match one of the options from the tool_use event's permission.options list (e.g. \"Allow Once\" to approve, \"Deny\" to deny). If denied, an optional reason can be provided which will be shown to the LLM.
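The matching rule described above can be enforced on the client when building the `permission_decision` content item. This is a minimal sketch under stated assumptions: the function name `build_permission_decision` is hypothetical, and `tool_use` is assumed to be the decoded payload of a `tool_use` event that includes a `permission.options` list.

```python
def build_permission_decision(tool_use, decision, reason=None):
    """Build a `permission_decision` content item for a tool_use payload.

    Raises ValueError if `decision` is not one of the options offered
    in the tool_use event's permission.options list.
    """
    options = tool_use.get("permission", {}).get("options", [])
    if decision not in options:
        raise ValueError(f"decision {decision!r} is not one of {options}")
    item = {"tool_use_id": tool_use["tool_use_id"], "decision": decision}
    if reason is not None:
        # The reason is optional and is intended for denials.
        item["reason"] = reason
    return {"type": "permission_decision", "permission_decision": item}
```

The returned dictionary is meant to be placed in the `content` array of the user message sent in the next request.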
**Example** ```json { "tool_use_id": "toolu_abc123", "decision": "Allow Once", "reason": "I don't want to modify production config files" } ``` ## `ResponseMetadata` Metadata about the response, including usage information.
**Example** ```json { "usage": { "tokens_consumed": [ { "model_name": "llama3.1-70b", "input_tokens": { "total": 175, "cache_read": 50, "cache_write": 25, "uncached": 100 }, "output_tokens": { "total": 75 }, "context_window": 128000 } ] }, "run_id": "4264-83472", "thread_id": 4264, "user_message_id": 83472, "assistant_message_id": 83473 } ``` ## `ResultSet`
**Example** ```json { "statementHandle": "707787a0-a684-4ead-adb0-3c3b62b043d9", "resultSetMetaData": { "partition": 0, "numRows": 0, "format": "jsonv2", "rowType": [ { "name": "my_column", "type": "VARCHAR", "length": 0, "precision": 0, "scale": 0, "nullable": false } ] }, "data": [ [ "row1 col1", "row1 col2" ], [ "row2 col1", "row2 col2" ] ] } ``` ## `ResultSetMetaData`
**Example** ```json { "partition": 0, "numRows": 0, "format": "jsonv2", "rowType": [ { "name": "my_column", "type": "VARCHAR", "length": 0, "precision": 0, "scale": 0, "nullable": false } ] } ``` ## `RowType`
**Example** ```json { "name": "my_column", "type": "VARCHAR", "length": 0, "precision": 0, "scale": 0, "nullable": false } ``` ## `TableContent`
**Example** ```json { "tool_use_id": "toolu_123", "query_id": "6ac75378-6337-48a6-80ab-6de48dd680eb", "result_set": { "statementHandle": "707787a0-a684-4ead-adb0-3c3b62b043d9", "resultSetMetaData": { "partition": 0, "numRows": 0, "format": "jsonv2", "rowType": [ { "name": "my_column", "type": "VARCHAR", "length": 0, "precision": 0, "scale": 0, "nullable": false } ] }, "data": [ [ "row1 col1", "row1 col2" ], [ "row2 col1", "row2 col2" ] ] }, "title": "Revenue by Month" } ``` ## `ThinkingContent`
**Example** ```json { "text": "To answer your question I must...", "signature": "lorem ipsum" } ``` ## `TokensConsumed` Token consumption for a specific model.
**Example** ```json { "model_name": "llama3.1-70b", "input_tokens": { "total": 175, "cache_read": 50, "cache_write": 25, "uncached": 100 }, "output_tokens": { "total": 75 }, "context_window": 128000 } ``` ## `Tool` Defines a tool that can be used by the agent. Tools provide specific capabilities like data analysis, search, or generic functions.
**Example** ```json { "tool_spec": { "type": "generic", "name": "get_revenue", "description": "Fetch the delivery revenue for a location.", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } } }, "required": [ "location" ] } } ``` ## `ToolChoice`
**Example** ```json { "type": "auto", "name": [ "analyst_tool", "search_tool" ] } ``` ## `ToolInputSchema`
**Example** ```json { "type": "object", "description": "Input for my custom tool", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } }, "items": {}, "required": [ "location" ] } ``` ## `ToolResource`
**cortex_analyst_text_to_sql:** Configuration for text-to-SQL analysis tool. Provides parameters for SQL query generation and execution. Exactly one of semantic_model_file or semantic_view must be provided.
**Example** ```json { "semantic_model_file": "@db.schema.stage/semantic_model.yaml", "semantic_view": "db.schema.semantic_view", "execution_environment": { "type": "warehouse", "warehouse": "MY_WAREHOUSE", "query_timeout": 60 } } ``` **cortex_search:** Configuration for search functionality. Defines how document search and retrieval should be performed.
**Example** ```json { "search_service": "database.schema.service_name", "title_column": "account_name", "id_column": "account_id", "filter": { "@eq": { "": "" } } } ``` **generic:**
**Example** ```json { "type": "function", "execution_environment": { "type": "warehouse", "warehouse": "MY_WAREHOUSE", "query_timeout": 60 }, "identifier": "MY_DB.MY_SCHEMA.MY_UDF" } ``` **web_search:** Configuration for web search functionality.
**Example** ```json { "max_results": 20 } ``` ## `ToolResult`
**Example** ```json { "tool_use_id": "toolu_123", "type": "cortex_analyst_text_to_sql", "name": "my_cortex_analyst_semantic_view", "content": [ { "type": "json", "json": { "answer": 42 } } ], "status": "success" } ``` ## `ToolResultContent`
**json:**
**Example** ```json { "type": "json", "json": { "answer": 42 } } ``` **text:**
**Example** ```json { "type": "text", "text": "The answer is 42" } ``` ## `ToolSpec` Specification of the tool's type, configuration, and input requirements.
**Example** ```json { "type": "generic", "name": "get_weather", "description": "lorem ipsum", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } }, "required": [ "location" ] } } ``` ## `ToolUse`
**Example** ```json { "tool_use_id": "toolu_123", "type": "cortex_analyst_text_to_sql", "name": "my_cortex_analyst_semantic_view", "input": { "location": "San Francisco, CA" }, "client_side_execute": "true", "permission": { "options": [ "Allow Once", "Deny" ] } } ``` ## `ToolUsePermission` Permission metadata for a tool use. A non-empty options list means the client must prompt the user for approval before the tool is executed.
**Example** ```json { "options": [ "Allow Once", "Deny" ] } ``` ## `UsageMetadata` Usage information for this request.
**Example** ```json { "tokens_consumed": [ { "model_name": "llama3.1-70b", "input_tokens": { "total": 175, "cache_read": 50, "cache_write": 25, "uncached": 100 }, "output_tokens": { "total": 75 }, "context_window": 128000 } ] } ``` ## `Warning`
**Example** ```json { "message": "Unable to fetch tools from MCP server 'foo'. Response quality may be degraded.", "code": "003001" } ``` ## Non-streaming response (stream: false) To receive a **single non-streaming JSON response**, set `stream` to `false` in the request body and set the request `Accept` header to `application/json`. The response body is the same object as the `response` event payload in streaming mode (that is, it corresponds to the JSON returned in the SSE `response` event's `data` field). **Example response** ```json { "role": "assistant", "content": [ { "thinking": { "text": "\nThe user is asking about types of products...\n" }, "type": "thinking" }, { "tool_use": { "client_side_execute": false, "input": { "has_time_column": false, "need_future_forecasting_data": false, "original_query": "what are some types of products?", "previous_related_tool_result_id": "", "query": "What are the different types or categories of products?" }, "name": "semantic_view_a", "tool_use_id": "", "type": "cortex_analyst_text_to_sql" }, "type": "tool_use" }, { "tool_result": { "content": [ { "json": { "query_id": "", "result_set": { "data": [ ["Electronics", "3", "3"], ["Furniture", "2", "2"] ], "resultSetMetaData": { "format": "jsonv2", "numRows": 2, "partition": 0 }, "statementHandle": "" }, "sql": "WITH __table_a AS (...) SELECT ...", "text": "The question is clear and I can answer it with the following SQL." 
}, "type": "json" } ], "name": "semantic_view_a", "status": "success", "tool_use_id": "", "type": "cortex_analyst_text_to_sql" }, "type": "tool_result" }, { "text": "Based on the data available, there are 2 main types of products...", "type": "text" } ] } ``` --- title: Cortex Agents shared resource budgets source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-shared-budgets.md section: Snowflake Cortex (AI & ML) --- # Cortex Agents shared resource budgets - [](/user-guide/snowflake-cortex/cortex-agents) - [](/user-guide/snowflake-cortex/cortex-agents-manage) - [](/user-guide/budgets/budget-shared-resources) A shared resource-level budget lets you set a credit spending limit for a subset of users on a %cortex-agent% object. Unlike a resource-level budget — which applies to the entire object and is typically aggregated use across the entire account — a shared resource-level budget targets a specific group of users (for example, a team) identified by a tag. This allows multiple teams to share the same %cortex-agent% object while maintaining independent spending limits per team. To use a shared resource-level budget, you create a budget, add the %cortex-agent% object as a "shared resource", associate a user tag with the budget, and then apply that tag to users. Snowflake tracks credit consumption per tagged user group and evaluates spending against the budget independently. When a user is subject to multiple budgets, each budget is evaluated independently and the user is stopped by whichever threshold is reached first. ## How shared resource-level budgets work Shared resource-level budgets use a user-tag-based attribution model. Instead of tagging the %cortex-agent% object itself, you tag the users who access it. Snowflake tracks credit consumption for each tagged user group against the associated budget. The enforcement flow is: 1. You create a budget and set a monthly spending limit in credits for the user group. 2. 
You add the %cortex-agent% object as a shared resource on the budget. 3. You create a tag and associate it with the budget using `SET_USER_TAGS`. 4. You apply the tag to individual users who belong to the group. 5. Snowflake tracks credit consumption for the tagged users on the shared resource. 6. When spending reaches a configured threshold, Snowflake executes the stored procedure you defined for that threshold. Shared resources don't have tags applied at the object level. The tag is applied to users, not to the %cortex-agent% object. This is different from resource-level budgets, where the tag is applied directly to the object. ## Set up a shared resource-level budget Follow these steps to create a budget for a subset of users on a shared %cortex-agent% object. ### Step 1: Create a budget Create a budget object in the schema where you manage budgets: ```sql -- Create a budget object for the user group USE SCHEMA budgets_db.budgets_schema; CREATE SNOWFLAKE.CORE.BUDGET my_budget(); ``` ### Step 2: Set the spending limit Set the monthly credit spending limit for the entire user group: ```sql -- Set a 500-credit monthly spending limit for the user group CALL my_budget!SET_SPENDING_LIMIT(500); ``` ### Step 3: Create a cost center tag Create a tag to identify the cost center associated with the user group: ```sql -- Create a tag with allowed cost center values CREATE TAG cost_mgmt_db.tags.cost_center ALLOWED_VALUES 'finance_dept', 'marketing_dept', 'hr_dept' COMMENT = 'cost_center tag'; ``` ### Step 4: Add the %cortex-agent% object as a shared resource Add the %cortex-agent% object to the budget as a shared resource. 
This tells Snowflake which resource to track spending against for the tagged users: Add all objects as a shared resource on the budget ```sql CALL budgets_db.budgets_schema.my_budget!ADD_SHARED_RESOURCE( 'CORTEX AGENT' ); ``` Alternatively, you can add the specific object as a shared resource on the budget ```sql CALL budgets_db.budgets_schema.my_budget!ADD_SHARED_RESOURCE( 'CORTEX AGENT', (SELECT SYSTEM$REFERENCE('CORTEX AGENT', 'myagent')) ); ``` ### Step 5: Associate the user tag with the budget Add the user tag to the budget so that Snowflake tracks spending for tagged users against this budget. In the example below, the finance team is limited to 500 credits per month. ```sql -- Associate the cost center tag with the budget as a user tag CALL budgets_db.budgets_schema.my_budget!SET_USER_TAGS( [ [(SELECT SYSTEM$REFERENCE('TAG', 'cost_mgmt_db.tags.cost_center', 'SESSION', 'APPLYBUDGET')), 'finance_dept'] ], 'UNION'); ``` ### Step 6: Tag the users Apply the tag to each user who belongs to the group. Snowflake tracks credit consumption for these users against the shared resource budget: ```sql -- Tag users who belong to the finance department ALTER USER IF EXISTS "user_1" SET TAG cost_center = 'finance_dept'; ALTER USER IF EXISTS "user_2" SET TAG cost_center = 'finance_dept'; ALTER USER IF EXISTS "user_3" SET TAG cost_center = 'finance_dept'; ``` After you complete these steps, Snowflake tracks credit consumption for `user_1`, `user_2`, and `user_3` on the %cortex-agent% object against the `my_budget` budget with a 500-credit monthly limit. You can set up multiple budgets — one for HR, another for marketing — each with its own spending limit and cost center tag. ## Configure threshold actions You can attach stored procedures that execute when spending by the user group reaches specific thresholds. Thresholds are expressed as a percentage of the spending limit and apply to the monthly budget period. 
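Because thresholds are percentages of the spending limit, it can help to translate them into credits when planning actions. The helper below is an illustrative sketch only: the name `threshold_credits` is hypothetical, the 1000% upper bound is taken from this document, and the requirement that a threshold be greater than zero is an assumption.

```python
def threshold_credits(spending_limit, threshold_percent):
    """Return the credit amount at which a percentage threshold is reached."""
    if not 0 < threshold_percent <= 1000:
        raise ValueError("threshold_percent must be in the range (0, 1000]")
    return spending_limit * threshold_percent / 100


# With the 500-credit monthly limit used in this guide:
for percent in (80, 100, 200):
    print(f"{percent}% -> {threshold_credits(500, percent)} credits")
```

With a 500-credit limit, an 80% threshold corresponds to 400 credits and a 200% threshold to 1000 credits.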
### Set custom actions at thresholds

Configure stored procedures to run at different spending thresholds for the user group:

```sql
-- Alert at 80% of the user group budget
CALL budgets_db.budgets_schema.my_budget!ADD_CUSTOM_ACTION(
  SYSTEM$REFERENCE(
    'PROCEDURE',
    'budgets_db.budgets_schema.sp_budget_alert(string, string, number)'
  ),
  ARRAY_CONSTRUCT('CA_1', 'finance_dept', 80),
  'ACTUAL',
  80
);

-- Block access at 100% of the user group budget
CALL budgets_db.budgets_schema.my_budget!ADD_CUSTOM_ACTION(
  SYSTEM$REFERENCE(
    'PROCEDURE',
    'budgets_db.budgets_schema.sp_revoke_group_access(string)'
  ),
  ARRAY_CONSTRUCT('finance_dept'),
  'ACTUAL',
  100
);
```

### Review configured actions

List all custom actions configured on a budget:

```sql
-- View all custom actions on the budget
CALL budgets_db.budgets_schema.my_budget!GET_CUSTOM_ACTIONS();
```

### Example: Stored procedure to revoke access for a user group

The following stored procedure revokes a role from a group of users to block their access to the Cortex Agent object. Snowflake recommends creating a dedicated role per user group to support this pattern:

```sql
-- Create a stored procedure that revokes access for a user group
CREATE OR REPLACE PROCEDURE budgets_db.budgets_schema.sp_revoke_group_access(
  dept_name STRING
)
RETURNS STRING
LANGUAGE SQL
AS
BEGIN
  EXECUTE IMMEDIATE 'REVOKE ROLE ca_' || dept_name || '_role FROM ROLE ' || dept_name || '_role';
  RETURN 'Access revoked for group ' || dept_name;
END;
```

When the user group's spending reaches 100% of the budget, Snowflake calls this stored procedure. The procedure revokes the dedicated role, which removes access to the Cortex Agent object for all users in that group.

## Revoke and reinstate access

In some cases you need to reinstate access for a user group after a budget breach, for example for specific users during peak season. You can configure thresholds beyond 100% (up to 1000%) to handle these exception scenarios.
### Budget breach and access revocation

When spending by the user group reaches the 100% threshold, the configured stored procedure executes and revokes access for that group:

```sql
-- At 100%: access is revoked automatically for the user group
-- Users tagged with finance_dept can no longer access the Cortex Agent object
```

### Reinstate access for exceptions

Configure a threshold beyond 100% with a stored procedure that reinstates access. This allows you to raise the effective budget for exception periods:

```sql
-- You have to add grants for this procedure
GRANT USAGE ON DATABASE budgets_db TO APPLICATION SNOWFLAKE;
GRANT USAGE ON SCHEMA budgets_db.budgets_schema TO APPLICATION SNOWFLAKE;
GRANT USAGE ON PROCEDURE budgets_db.budgets_schema.sp_revoke_ca_access(STRING, STRING) TO APPLICATION SNOWFLAKE;

-- Issue a reinstatement for a subset of users
-- (replace the placeholder argument values: ca_name, power_user_role)
CALL budgets_db.budgets_schema.sp_reinstate_group_access('<ca_name>', '<power_user_role>');

-- Set another threshold at 200% as a hard stop
CALL budgets_db.budgets_schema.my_budget!ADD_CUSTOM_ACTION(
  SYSTEM$REFERENCE(
    'PROCEDURE',
    'budgets_db.budgets_schema.sp_revoke_group_access(string)'
  ),
  ARRAY_CONSTRUCT('finance_dept'),
  'ACTUAL',
  200
);
```

The reinstatement stored procedure grants the role back to the user group:

```sql
-- Create a stored procedure that reinstates access for a user group
CREATE OR REPLACE PROCEDURE budgets_db.budgets_schema.sp_reinstate_group_access(
  ca_name STRING,
  power_user_role STRING
)
RETURNS STRING
LANGUAGE SQL
AS
BEGIN
  EXECUTE IMMEDIATE 'GRANT ROLE ca_' || ca_name || '_role TO ROLE ' || power_user_role || '_role';
  RETURN 'Access reinstated for group ' || power_user_role;
END;
```

In this example:

- At 100%, access is revoked automatically for the finance department users.
- An admin reinstates access for certain power users (as indicated by `power_user_role`).
- At 200%, the revocation procedure runs again as a hard stop.

You can configure thresholds at any percentage up to 1000%.
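To see how the 100% and 200% thresholds relate to actual spending, the following sketch compares a hypothetical month-to-date spend of 620 credits against both thresholds of a 500-credit limit. The spend figure and column names are assumptions for illustration. The budget performs the real evaluation, and this query doesn't read any budget data.

```sql
-- Illustrative only: which thresholds has a hypothetical spend reached?
-- 620 is an assumed month-to-date credit figure; 500 is the spending limit.
SELECT
  threshold_pct,
  500 * threshold_pct / 100 AS threshold_credits,
  620 >= 500 * threshold_pct / 100 AS threshold_reached
FROM (VALUES (100), (200)) AS t(threshold_pct)
ORDER BY threshold_pct;
```

Under these assumptions the 100% threshold (500 credits) has been reached and the 200% hard stop (1,000 credits) has not, which leaves reinstated users 380 credits of headroom before the second revocation.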
## Monitor usage

View credit consumption per user on the shared Cortex Agent object using the budget's usage reporting method:

```sql
-- View usage for the current month
CALL budgets_db.budgets_schema.my_budget!GET_SERVICE_TYPE_USAGE_V2(
  '2026-02',
  '2026-03'
);
```

The output includes the following columns:

| Column          | Description                                                                       |
| --------------- | --------------------------------------------------------------------------------- |
| SERVICE_TYPE    | The service category (CORTEX_AGENTS)                                              |
| ENTITY_TYPE     | The object type (CORTEX AGENT)                                                    |
| ENTITY_ID       | The internal identifier of the Cortex Agent object                                |
| NAME            | The name of the Cortex Agent object                                               |
| CREDITS_USED    | The total credits consumed during the specified period (sum of compute and cloud) |
| CREDITS_COMPUTE | The number of compute credits used                                                |

## Budget enforcement latency

Budget calculations and threshold enforcement are conducted periodically:

1. Snowflake calculates credit consumption for the tagged user group on the shared resource.
2. The system evaluates spending against all configured thresholds.
3. If a threshold is reached, the associated stored procedure executes.
4. Usage dashboards update with the latest figures.

If the low latency budget is enabled, budgets are enforced within two hours after the budget is exceeded. Otherwise, it may take up to eight hours after the budget is exceeded for enforcement. To reduce the refresh interval, you can trigger budget execution more frequently, such as every 60 minutes.

There is an inherent delay between when credits are consumed and when the budget system detects the threshold breach. During the enforcement interval, spending can exceed the configured threshold before the action is executed. Plan your thresholds accordingly. For example, set an alert at 80% to give you time to respond before the 100% action is triggered.
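As a rough way to size that margin, the following sketch estimates how far spending could run past a threshold before enforcement. It assumes a steady burn rate of 5 credits per hour, which is a hypothetical figure, and uses the two-hour and eight-hour latencies described above with a 500-credit limit.

```sql
-- Illustrative only: estimate possible overshoot before enforcement,
-- assuming a steady burn rate of 5 credits per hour (hypothetical)
-- and a 500-credit monthly spending limit.
SELECT
  latency_hours,
  5 * latency_hours AS possible_overshoot_credits,
  ROUND(100 * 5 * latency_hours / 500, 1) AS overshoot_pct_of_limit
FROM (VALUES (2), (8)) AS t(latency_hours)
ORDER BY latency_hours;
```

Under these assumptions, spending could overshoot by about 10 credits (2% of the limit) with a two-hour latency and about 40 credits (8%) with an eight-hour latency. Real consumption is rarely steady, so treat the result as a planning estimate only.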
## Budget precedence

When a user is subject to multiple budgets (for example, both a resource-level budget on the Cortex Agent object and a shared resource-level budget for the user's team), each budget is evaluated independently. The user's usage is limited by whichever threshold is reached first.

For example, consider the following configuration:

- A resource-level budget on `CA_1` with a 1,000-credit limit and a block action at 100%.
- A shared resource-level budget for the finance department on Cortex Agent with a 500-credit limit and a block action at 100%.

If the finance department reaches 500 credits (100% of their shared budget) before the overall Cortex Agent object reaches 1,000 credits, the shared budget's block action triggers first and revokes access for the finance department. Other teams that use `CA_1` continue to have access until the resource-level budget threshold is reached.

Conversely, if the overall Cortex Agent object reaches 1,000 credits before the finance department reaches 500 credits, the resource-level budget's block action triggers and revokes access for all users, including the finance department, even though their team budget hasn't been exhausted.

## Limitations

The following limitations apply to shared resource-level budgets for Cortex Agent:

- **No tags on shared resources**: Shared resources don't have tags applied at the object level. Tags are applied to users only. This is different from resource-level budgets, where the tag is applied to the Cortex Agent object.
- **Individual user tagging**: You must tag each user individually. There is no bulk operation to tag all users in a role or group.
- **Enforcement latency**: Budget enforcement runs on a periodic cycle and may take up to eight hours to enforce the budget after the budget is exceeded. Spending can exceed a threshold during the interval before the action triggers.
- **Role-based access revocation**: To revoke access for a user group at a threshold, you must create a dedicated role for the group. Direct block actions on individual users aren't yet supported.
- **Monthly period**: Budgets operate on a monthly cycle. You can't configure custom budget periods.
- **Tag latency**: When you change a tag on a user, it can take up to eight hours for the change to be reflected in budgets that use tags.

---
title: Cortex Agents tutorials
source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-tutorials.md
section: Snowflake Cortex (AI & ML)
---

# Cortex Agents tutorials

Use Cortex Agents to get insights from both structured and unstructured data sources. You can use the following tutorials to help you get started with Cortex Agents:

- [Getting Started with Cortex Agents](https://quickstarts.snowflake.com/guide/getting_started_with_cortex_agents/index.html?index=../..index#0)
- [Getting Started with Snowflake Cortex Agents API and React](https://quickstarts.snowflake.com/guide/getting_started_with_snowflake_agents_api_and_react/index.html?index=../..index#0)
- [Getting Started with Cortex Agents and Slack](https://quickstarts.snowflake.com/guide/integrate_snowflake_cortex_agents_with_slack/index.html#0)
- [Getting Started with Cortex Agents for Microsoft Teams and Microsoft 365 Copilot](https://quickstarts.snowflake.com/guide/getting_started_with_the_microsoft_teams_and_365_copilot_cortex_app)
- [Best Practices to Building Cortex Agents](https://www.snowflake.com/en/developers/guides/best-practices-to-building-cortex-agents/)
- [Best Practices for Evaluating Cortex Agents](https://www.snowflake.com/en/developers/guides/best-practices-for-evaluating-cortex-agents/)

For more information about Cortex Agents, see [Cortex Agents](/user-guide/snowflake-cortex/cortex-agents).
--- title: Cortex AI Function Studio source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-function-studio.md section: Snowflake Cortex (AI & ML) --- # Cortex AI Function Studio Available to accounts in [select regions](#label-cortex_llm_availability). This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/aisql) - [](/user-guide/cortex-code/cortex-code) - [Managing Cortex AI Function costs with Account Usage](/user-guide/snowflake-cortex/ai-func-cost-management) Cortex AI Function Studio features a Cortex Code Skill for creating, evaluating, and optimizing production-ready Cortex AI Functions for unstructured data workflows. It provides a structured development lifecycle that automates prompt engineering, model selection, evaluation, and optimization. Cortex AI Function Studio provides two primary interfaces for authoring, evaluating, and optimizing AI Functions: - **Cortex Code CLI:** A command-line experience built for AI and Data Engineers, supporting scriptable workflows, agentic task definition, and rapid iteration within development environments. - **Snowsight AI Studio (Guided):** A native Snowflake UI built for Analysts and Data Scientists that provides a guided, no-code experience for creating, benchmarking, optimizing, and deploying AI Functions. ## Create Getting started is as simple as prompting Cortex Code in Snowsight or the Cortex Code CLI: ```text /cortex-ai-function-studio ``` This command initiates the AI Function Studio workflow. You can also enter with a direct request (for example, "summarize themes from my PDF documents" or "build a function to classify my support tickets") and the skill will route you directly to the relevant workflow without showing the menu. ```text Welcome to the Cortex AI Function Studio — your one-stop shop for AI-powered analytics on unstructured data in Snowflake. 
I can help you work with Snowflake's AI functions — whether you want to use a built-in function (AI_CLASSIFY, AI_EXTRACT, AI_FILTER, AI_TRANSLATE, etc.) for immediate results, or build a custom AI function tailored to your domain. For custom functions, the intended workflow is create → evaluate → optimize. During creation, you choose how to build: Direct (simple AI_COMPLETE call) or Agent Research (I research and propose approaches with SQL pre/post-processing — you can also specify your own strategy). After building, evaluate against labeled data, then optimize with automated function body optimization and model selection. What would you like to do? 1. Create — Build a new custom AI function 2. Evaluate — Test an existing AI function's performance 3. Optimize — Tune prompts and compare models for better accuracy 4. Demo — Interactive walkthrough with example use cases 5. Check Status — Check on an async evaluation or optimization job 6. Built-in AI Functions — Use a native Snowflake AI function (no setup, immediate SQL) ``` **Define task:** Users specify the AI function's objective, including the task description, expected inputs, and desired output format (for example: summaries, structured JSON, classifications, or generated answers). AI Function Studio supports multimodal workflows based on model availability, including text, document, and image inputs. The AI Function Studio automatically selects a model for the task, though you can override the selection. In this example, because the staged files are PDFs, the system infers that a multimodal, document-capable model is required. ```text Now I have all the context I need. Your PDFs are in @my_docs stage — these are document files, so I need a model that supports PDFs. Per the multimodal reference, the best document models are: gemini-2.5-flash > gemini-3.1-pro > claude-sonnet-4-5. ``` The system and user prompts used in your Custom AI Function are fully transparent. 
At this stage, the prompt has not yet been evaluated or optimized against your test data. That evaluation occurs after the function is created.

As part of the creation workflow, AI Function Studio automatically generates and runs smoke tests to validate the function behavior. For example, smoke tests can automatically validate that the function returns outputs in the expected structure.

Once the function is registered, it can be used like any other Cortex AI Function. Custom AI Functions created using Cortex Code in Snowsight or via the Cortex Code CLI are visible in the Snowsight **AI & ML** » **AI Functions** page.

You can also list the Custom AI Functions you have created by querying the `SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES` view for their associated built-in tag:

```sql
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
WHERE TAG_NAME = 'CUSTOM_AI_FUNCTION_UDF_TAG'
  AND DOMAIN = 'FUNCTION';
```

## Evaluate

Evaluation and optimization are optional steps in the AI Function Studio workflow. After the function is created, the Studio guides you through available next steps such as testing, evaluation, and optimization. These workflows can also be revisited later at any time.

AI Function Studio benchmarks candidate function configurations against representative datasets to measure accuracy, consistency, and overall performance. Depending on the data available, there are three evaluation paths:

- **Labeled Dataset (Ground Truth):** If you already have a dataset with known expected outputs, AI Function Studio uses it as the evaluation baseline to calculate accuracy and quality metrics.
- **Label Generation:** If you have input data but no labeled outputs, AI Function Studio can automatically generate evaluation labels using a state-of-the-art reasoning model. By default, the system selects the most capable available model for label generation and can recommend alternatives when needed.
- **Synthetic Dataset Generation:** If no evaluation dataset exists, AI Function Studio can generate synthetic evaluation data based on the task definition. The system creates representative examples and expected outputs to bootstrap benchmarking and optimization workflows. Once the evaluation dataset is prepared, AI Function Studio applies configurable evaluation metrics to compare candidate prompts, models, and function configurations. You can select the evaluation strategy and metrics that best align with your use case. For example, AI Function Studio recommends using LLM as a judge for a document summarization task: ```text For context, since your function identifies themes from PDF documents (an open-ended task), llm_judge is the recommended metric — it uses an LLM to judge whether the predicted themes are correct/complete compared to the expected output. The other metrics (exact_match, fuzzy_match, etc.) are better suited for classification or fixed-answer tasks. ``` After the evaluation completes, AI Function Studio generates a detailed results summary, including insights into low-scoring records to support human-in-the-loop review and analysis. ## Optimize AI Function Studio includes a managed AI optimization engine that automatically improves function quality using advanced optimization techniques such as the Genetic-Pareto Algorithm. Rather than relying on manual prompt engineering, the optimizer systematically explores and evaluates alternative prompts, models, and workflow strategies to improve accuracy and overall performance. ### Prompt iterations AI Function Studio supports multiple optimization budgets that control how extensively the system searches for improvements to your AI Function. Higher budgets explore a broader range of prompt, model, and workflow variations to maximize quality: ### Model selection You can choose how many models available in your account AI Function Studio should evaluate and optimize against your baseline function. 
Selecting more models increases evaluation time and overall compute cost, since each prompt iteration is executed independently for every selected model. In addition, operating on multimodal files (including but not limited to PDF, MP3, or MP4 files) further increases evaluation time. For example, using the **medium** optimization budget (~12 iterations) with **6 selected models** results in each model processing approximately **6-7 evaluation records across all 12 optimization iterations**. This enables the system to benchmark multiple prompt and workflow variations across different model families and cost/performance tiers. After optimization completes, AI Function Studio generates a comparative analysis showing how different models and prompt optimization strategies performed against the evaluation dataset. This allows teams to quantify quality improvements, compare model tradeoffs, and identify the best-performing configuration for their specific workload. In the example below, iterative optimization improved overall summarization quality across multiple models, with Gemini 2.5 Flash achieving the strongest performance. AI Function Studio can also help explain *why* certain models outperform others for a specific workload. By analyzing evaluation results, model characteristics, and document patterns, the system provides actionable insights into quality differences across models. ```text Yes, there are several reasons gemini-2.5-flash outperforms the other models on your German-language PDF magazines: Massive context window (1M tokens) — Your PDFs are 4-13 MB multi-page magazine documents. Gemini can process the entire document at once, while Claude models have a 200K token context that may truncate longer documents, losing themes from later pages. Strong multilingual capabilities — Gemini models have excellent German language understanding. 
Your fluter magazines are entirely in German, and Gemini handles reading German text in PDF layout (columns, sidebars, captions) very well. Native document processing — Gemini's document handling (up to 1,000 pages for flash, 3,000 for pro) is more mature for PDFs. It natively parses the visual layout, whereas Claude models may struggle with complex magazine layouts with mixed text, images, and sidebars. Consistency — For the theme_overlap metric, what matters is producing the same themes across runs. Gemini-2.5-flash appears more deterministic in its theme extraction from these specific documents. The reason gemini-3.1-pro scored lower (13.8%) despite also being Gemini is unclear — it may be that the Pro model's larger capacity leads to more diverse/creative theme interpretations that diverge more from the expected baseline. ``` Once an optimized configuration is selected, the AI Function can be deployed with a single click. As new models become available, teams can rerun optimization workflows independently without rebuilding the entire function pipeline end-to-end. ## Evaluation metrics guidelines Choose the evaluation metric that best matches your task type: ## Built-in AI functions In addition to custom AI functions, AI Function Studio can help you use Snowflake's built-in Cortex AI functions directly. If your task maps to a built-in function, you can get immediate results with no setup — just SQL. Supported built-in functions: AI_CLASSIFY, AI_FILTER, AI_EXTRACT, AI_COMPLETE, AI_PARSE_DOCUMENT, AI_SUMMARIZE_AGG, AI_AGG, AI_SENTIMENT, AI_TRANSLATE, AI_EMBED, AI_SIMILARITY, AI_REDACT, AI_TRANSCRIBE. AI Function Studio looks up the latest Snowflake documentation for the correct syntax and helps you construct queries against your data. If accuracy on a built-in function isn't sufficient or you need more control over cost/quality (model selection, prompt optimization), you can escalate to a custom AI function. 
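As a minimal sketch of the "just SQL" path, the following query classifies two sample support tickets with AI_CLASSIFY. The ticket text and category names are made up for illustration, and the `:labels` path assumes the function returns an object containing a `labels` array. Check the AI_CLASSIFY reference for the current signature and return format before relying on it.

```sql
-- Illustrative only: classify sample text with a built-in function.
-- The inline rows and category names are hypothetical.
SELECT
  ticket_text,
  AI_CLASSIFY(
    ticket_text,
    ['billing', 'login issue', 'feature request']
  ):labels[0]::STRING AS category
FROM (VALUES
  ('I was charged twice for my subscription this month.'),
  ('I cannot sign in after resetting my password.')
) AS t(ticket_text);
```

If the categories returned for your own data aren't accurate enough, that is the point at which a custom AI function with evaluation and optimization may be worth the extra setup.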
## Known limitations

- **Audio and video modalities are not yet supported.** AI Function Studio currently supports text, document, and image inputs only. Support for audio and video inputs is planned for a future release.

## Cost considerations

- **Development phase:** Authoring, evaluation, and optimization are billed in two parts:
  - The tokens processed by the models used during the experimentation process.
  - [Cortex Code usage](/user-guide/cortex-code/cortex-code).
- **Production phase:** Once registered, a Custom AI Function is billed according to the underlying models it uses. There is no additional surcharge for the function abstraction itself. To monitor and control costs, we recommend:
  - Using the `SNOWFLAKE.ACCOUNT_USAGE.CORTEX_AI_FUNCTIONS_USAGE_HISTORY` view and associated examples in [Managing Cortex AI Function costs with Account Usage](/user-guide/snowflake-cortex/ai-func-cost-management).
- **Cost/quality tradeoffs:** During optimization, AI Function Studio evaluates multiple models across different cost and performance tiers. This allows teams to select configurations that balance accuracy requirements against per-token costs, for example by using a smaller model that achieves acceptable accuracy at significantly lower cost.

To get the number of tokens consumed by your custom AI function, issue the following query:

```sql
SELECT
  m.value:key:CUSTOM_AI_FUNCTION_NAME::STRING AS func_name,
  m.value:key:metric::STRING AS metric_type,
  SUM(m.value:value::NUMBER) AS token_number
FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_AI_FUNCTIONS_USAGE_HISTORY c,
  LATERAL FLATTEN(input => c.METRICS) m
WHERE c.START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  AND func_name ILIKE '%%'
GROUP BY 1, 2
ORDER BY func_name DESC;
```

---
title: Cortex AI Functions: Audio
source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-audio.md
section: Snowflake Cortex (AI & ML)
---

# Cortex AI Functions: Audio

This feature is not available in the People's Republic of China.
- [AI_TRANSCRIBE](/sql-reference/functions/ai_transcribe)

Cortex AI Audio provides advanced LLM-powered audio processing capabilities, including:

- **Transcription:** Convert spoken language to text.
- **Speaker identification:** Determine who is speaking in each part of a multi-speaker audio file.
- **Timestamp extraction:** Identify the timestamp of each spoken word.

These capabilities are available through the [AI_TRANSCRIBE](#label-cortex-ai-audio-ai-transcribe) function. Because AI_TRANSCRIBE is managed and hosted inside Snowflake, you can easily integrate audio processing into your data workflows without onerous setup or infrastructure management. The AI_TRANSCRIBE function also processes audio tracks in video files.

## AI_TRANSCRIBE

[AI_TRANSCRIBE](/sql-reference/functions/ai_transcribe) is a fully managed SQL function that transcribes audio and video files stored in a stage, extracting text, timestamps, and speaker information. See [](#label-cortex-llm-media-files) for information on creating a stage suitable for storing files for processing by AI_TRANSCRIBE.

Under the hood, AI_TRANSCRIBE orchestrates optimized AI models for transcription and speaker diarization, processing audio files of up to two hours in length. AI_TRANSCRIBE is horizontally scalable, allowing efficient batch processing by processing multiple files at the same time. Audio can be processed directly from object storage to avoid unnecessary data movement.

By default, AI_TRANSCRIBE converts audio files to clean, readable text. You can also specify a timestamp granularity to extract timestamps for each word or change of speaker. Word-level timestamps are useful for applications such as subtitles or for letting users jump to specific parts of the audio by clicking words in the transcript. Speaker-level timestamps are useful for understanding who said what in meetings, interviews, or phone calls.
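As a sketch of how word-level output could feed a subtitle track or clickable transcript, the following query flattens a `segments` array into one row per word. To stay self-contained, it parses a two-word JSON literal shaped like the word-level response in the examples in this topic instead of calling AI_TRANSCRIBE. The CTE name and output column names are assumptions for illustration.

```sql
-- Illustrative only: turn word-level segments into one row per word.
-- The JSON literal stands in for an AI_TRANSCRIBE word-level response.
WITH transcript AS (
  SELECT PARSE_JSON('{
    "audio_duration": 150.66,
    "segments": [
      {"end": 1.513, "start": 0.031, "text": "«Buenos"},
      {"end": 2.034, "start": 1.553, "text": "días,"}
    ]
  }') AS result
)
SELECT
  seg.index                 AS word_position,
  seg.value['start']::FLOAT AS start_seconds,
  seg.value['end']::FLOAT   AS end_seconds,
  seg.value['text']::STRING AS word
FROM transcript t,
  LATERAL FLATTEN(input => t.result['segments']) seg
ORDER BY word_position;
```

Bracket notation is used for `start` and `end` so that the field names can't be confused with SQL keywords. The same pattern could be applied to speaker-level output by also selecting the `speaker_label` field.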
| Timestamp granularity mode | Result                                                                |
| -------------------------- | --------------------------------------------------------------------- |
| Default                    | Transcription of entire audio file in one piece                       |
| Word                       | Transcription with timestamps for each word                           |
| Speaker                    | Indicates who is speaking, and a timestamp, at each change of speaker |

### Supported languages

AI_TRANSCRIBE supports the following languages, which are automatically detected. Files can contain multiple supported languages. Language detection requires audio to begin within the first five seconds of the file. For best results, trim excess silence before uploading.
- Arabic
- Bulgarian
- Cantonese
- Catalan
- Chinese
- Czech
- Dutch
- English
- French
- German
- Greek
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Italian
- Japanese
- Korean
- Latvian
- Malay
- Norwegian
- Polish
- Portuguese
- Romanian
- Russian
- Serbian
- Slovenian
- Spanish
- Swedish
- Thai
- Turkish
- Ukrainian
### Supported media formats AI_TRANSCRIBE supports the following audio and video file formats: | Audio | FLAC, MP3, MP4, OGG, WAV, WEBM | | ----- | ------------------------------ | | Video | MKV, MP4, OGV, WEBM | Video files must contain at least one audio track in FLAC, MP3, OPUS, VORBIS, or WAV format. ## Examples ### Text transcription The following example transcribes [an audio file](/static/samples/cortex/audio/consultation.wav) stored in the `financial_consultation` stage, returning a text transcript of the entire file. The [TO_FILE function](/sql-reference/functions/to_file) converts the staged file to a file reference. ```sql SELECT AI_TRANSCRIBE(TO_FILE( '@financial_consultation', 'consultation.wav')); ``` Response: ```text {"audio_duration":321.78,"text":"Good afternoon, Robert. Thanks for calling in today. I understand you had some concerns about your portfolio you wanted to discuss. Yes, I'm really worried. I've been watching the news and the market's been all over the place lately. I'm thinking maybe I should just sell everything, all my stocks and mutual funds and put it all in bonds or CDs. At least then I could sleep at night. I can definitely understand that concern, Robert. Market volatility can be unsettling, especially when you're seeing those daily swings in the headlines. Before we talk about any major moves, can you help me understand what specifically is driving this anxiety? Is it the recent tech sector pullback or something more general? It's everything. I'm 52 years old and I keep thinking about what happened in 2008. I lost so much then and I'm worried we're heading for another crash with this new administration. I can't afford to lose my retirement savings. Those are absolutely valid concerns, and I appreciate you sharing that context. That was a really challenging time for everyone. Let me ask you this. When we last reviewed your portfolio in March, we had you allocated at about 70% equities and 30% bonds, correct? 
And your target retirement age is still 62%. That's right. But honestly, 70% in stocks feels way too risky right now. I'm thinking more like 20% stocks, 80% bonds, maybe even less in stocks. I understand that instinct, Robert. Let's walk through this together. First, I want to remind you of something important. Your current portfolio is already designed with volatility in mind. You're not in individual stocks. You're in diversified index funds and some actively managed funds across different sectors and even international markets. but they're still going down. My quarterly statement showed I was down 8% this quarter alone. You're absolutely right, and that's painful to see, but let's put this in perspective. Over the past 12 months, even with this recent volatility, your portfolio is still up about 3%. The market has given back some gains, but we're not in crisis territory. Remember, we built your allocation specifically because you have 10 years until retirement. That time horizon is actually your biggest asset here. So you're saying I should just do nothing? Not exactly nothing, but I am suggesting we don't make dramatic changes based on short-term market movements. However, I do hear your concern about risk tolerance. What if we made a smaller adjustment? Instead of going to 20% stocks, what if we moved to 60% stocks and 40% bonds? That would reduce your equity exposure by 10%, which might help you sleep better, but wouldn't take you completely out of the growth potential you need for retirement. That actually sounds more reasonable, but I'm still worried about losing more money. I understand completely. Let me ask you this. What's your bigger worry, the volatility of the next year, two or two, or having enough money to retire comfortably at 62? Because if we get too conservative now, inflation alone could erode your purchasing power over the next decade. I didn't really thought about inflation that way. 
I guess I've been so focused on not losing money that I forgot about the money I might not make. Exactly. And remember, Robert, you're not alone in this. I've had this conversation with many clients over the past few weeks. The ones who stayed disciplined during previous market downturns are generally glad they did. What if we also set up a plan where we review your portfolio monthly for the next few months? That way you'll have regular check-ins and won't feel like you're just riding this out blindly. Monthly reviews would definitely help. And maybe the 60-40 split is a good compromise. I just, I don't want to be stupid about this. Overt, wanting to protect your retirement isn't stupid. It's exactly what you should be thinking about. The key is making sure we're protecting it in the right way. Staying invested in a diversified portfolio, even with some volatility, has historically been the best way to preserve and grow wealth over time. okay, I think I can live with moving to 60% stocks, but if things get really bad... If things get really bad, we'll talk again. That's what I'm here for. And remember, we'll be reviewing this monthly anyway. You're not locked into anything forever. But I do want to emphasize that market timing is incredibly difficult, even for professionals. The goal isn't to avoid all volatility. It's to stay invested long enough to benefit from the market's long-term upward trend. All right, Sarah, let's do the rebalancing to 60-40 and I'll try to stop checking my account balance every day. It sounds like a solid plan, Robert. And yes, definitely limit the daily balance checking. That's a recipe for anxiety. I'll send you some research on historical market recoveries after our call and we'll schedule our first monthly review for next month. How does that sound? That sounds good. Thanks for talking me through this, Sarah. I feel a lot better than when I call. I'm so glad to hear that, Robert. 
Remember, staying invested requires patience, but your future self will thank you for it. I'll have the rebalancing done by tomorrow morning, and you should see the changes reflected in your account by Thursday. Perfect. Thanks again, Sarah. I thank you deeply for your patience and understanding. I'll talk to you next month."} ``` ### Word-level segmentation with timestamps Set the timestamp granularity to "word" to extract precise timestamps for every word spoken, enabling searchable, navigable transcripts. Note that [this audio file](/static/samples/cortex/audio/consultation_3_sp.wav) is in Spanish. ```sql SELECT AI_TRANSCRIBE(TO_FILE('@financial_consultation', 'consultation_3_sp.wav'), {'timestamp_granularity': 'word'}); ``` Response: The output is truncated for brevity. The full output contains a segment for each word spoken in the audio file. ```text { "audio_duration": 150.66, "segments": [ { "end": 1.513, "start": 0.031, "text": "«Buenos" }, { "end": 2.034, "start": 1.553, "text": "días," }, { "end": 2.334, "start": 2.054, "text": "doña" }, { "end": 4.457, "start": 2.374, "text": "Esperanza." }, { "end": 4.597, "start": 4.477, "text": "¿En" }, { "end": 4.857, "start": 4.697, "text": "qué" }, { "end": 5.118, "start": 4.917, "text": "puedo" }, { "end": 5.518, "start": 5.178, "text": "ayudarla" }, { "end": 6.5, "start": 5.578, "text": "hoy?»" }, ... { "end": 146.671, "start": 146.551, "text": "Ya" }, { "end": 147.234, "start": 146.732, "text": "veremos," }, { "end": 147.837, "start": 147.355, "text": "Roberto." }, { "end": 148.581, "start": 148.078, "text": "Gracias" }, { "end": 148.822, "start": 148.661, "text": "por" }, { "end": 149.646, "start": 148.902, "text": "tu" }, { "end": 150.711, "start": 150.249, "text": "ayuda." } ], "text": "«Buenos días, doña Esperanza. ¿En qué puedo ayudarla hoy?» «Roberto, quiero hacer un cambio grande en mi portafolio. Quiero vender todo y compra solo acciones de Tesla». «¿Tesla? Doña Esperanza, usted tiene 72 años. 
¿Por qué quiere poner todo su dinero en una sola compañía?» «¿Por qué Tesla va a ser el futuro?» Un minuto me explico que van a dominar los carros eléctricos. Dice que puedo triplicar mi dinero en dos años. Entiendo que Tesla es una impresión innovador, pero poner todos sus ajuros en una sola acción es muy arriesgado. ¿Qué pasa si Tesla baja? No va a bajar. Elon Musk es un genio. Además, mi vecina compró Teslas. Teslas es tres años. Y Aorus tiene el doble de dinero. Doña Esperanza, su vecina tuvo suerte, pero las yantes individuales pueden ser muy volátiles. Usted necesita dinero estable para sus gastos de retiro. Roberto, tengo $400,000 en mi cuenta. Si te la sube como dismi, voy a tener más de un año. Podré dejarle más dinero a mi familia. Pero también podría perder la mitad de su dinero o más. Te sabía Jairo 60% antes. No puedo recomendarle que haga esto. Entonces no me dejas escuchando. Yo sé lo que quiero hacer con mi dinero. Es mi decisión. Tienes razón, es su dinero. Pero como su asesor tengo que decir que esto es extremamanda peligroso para alguien de su edad. Eva, no importa. Quiero tomar este riesgo. Vas a Edom o no. Doña Esperanza, ¿qué tal si compramos algo de Tesla perronoto? ¿Podríamos poner 10% en Tesla y el resto en versiones más seguras? No, Roberto, quiero el 100% en Tesla. Si no me ayudas, voy a alcanzar otro asesor. Que sí lo haga. Está bien, Doña Presanza. Voy a procesar la orden, pero voy a documentar que fue contra mi recomendación profesional. Perfecto. Hazlo hoy mismo. Quiero compra antes que suba más. Será ahora. Él considera lo que le estoy diciendo. Esto puede ser ver muy mal a la vida. Ya veremos, Roberto. Gracias por tu ayuda." } ``` ### Speaker recognition Set timestamp granularity to "speaker" to detect, separate, and identify unique speakers in conversations or meetings. 
This example uses [an audio file](/static/samples/cortex/audio/consultation_5_mix_es_en.wav) with two speakers, one speaking English and the other Spanish. ```sql SELECT AI_TRANSCRIBE(TO_FILE('@financial_consultation', 'consultation_5_mix_es_en.wav'), {'timestamp_granularity': 'speaker'}); ``` Response: The output is truncated for brevity. The full output contains a segment for each conversational "turn" in the audio file. ```text { "audio_duration": 208.66, "segments": [ { "end": 3.076, "speaker_label": "SPEAKER_00", "start": 0.031, "text": "Good afternoon, this is Aaliyah Johnson from Secure Financial Services." }, { "end": 4.297, "speaker_label": "SPEAKER_02", "start": 3.196, "text": "How can I help you today?" }, { "end": 7.182, "speaker_label": "SPEAKER_02", "start": 5.139, "text": "Hola, necesito ayuda con mis inversiones." }, { "end": 11.528, "speaker_label": "SPEAKER_02", "start": 7.482, "text": "Estoy muy preocupada porque he perdido mucho dinero y no sé qué hacer." }, { "end": 14.132, "speaker_label": "SPEAKER_02", "start": 12.289, "text": "I'm sorry, I'm not understanding." }, { "end": 15.795, "speaker_label": "SPEAKER_02", "start": 14.553, "text": "Do you speak English?" }, ... { "end": 189.169, "speaker_label": "SPEAKER_02", "start": 185.841, "text": "Es muy difícil entender estas cosas en inglés." }, { "end": 192.326, "speaker_label": "SPEAKER_01", "start": 190.178, "text": "Por supuesto, señora Ramírez." }, { "end": 197.145, "speaker_label": "SPEAKER_01", "start": 192.788, "text": "Es muy importante que entienda completamente sus opciones." }, { "end": 203.229, "speaker_label": "SPEAKER_01", "start": 197.165, "text": "Voy a hacer los cambios hoy mismo y la llamaré la próxima semana para ver cómo se siente." }, { "end": 205.759, "speaker_label": "SPEAKER_02", "start": 203.891, "text": "Muchísimas gracias, María." }, { "end": 208.71, "speaker_label": "SPEAKER_02", "start": 206.18, "text": "Me siento mucho más tranquila ahora."
} ], "text": "Good afternoon, this is Aaliyah Johnson from Secure Financial Services. How can I help you today? Hola, necesito ayuda con mis inversiones. Estoy muy preocupada porque he perdido mucho dinero y no sé qué hacer. I'm sorry, I'm not understanding. Do you speak English? Un poquito, pero es muy difícil para mí. Aquí hay alguien que habla español, ¿ok? Es muy importante. He perdido miles de dólares. I'm really sorry, but I don't speak Spanish. Let me see. I think we might have someone who speaks Spanish, but they're not available right now. ¿Cuándo pueden ayudarme? Necesito hablar con a lguien hoy. Mi esposo está muy enojado y quiere que vendamos todo. I understand you need someone who speaks Spanish. Let me check if Maria is available. She's our Spanish-speaking advisor. Can you hold for just a moment? No entiendo. Mañana. Pero necesito ayuda ahora. ¿No hay nadie más? I am going to transfer you to Maria right now. She'll be able to help you with your investment concerns. Hola, soy María González. Entiendo que necesita ayuda con sus inversiones. ¿Cómo está usted? ¡Ay, qué alivio! Sí, estoy muy preocupada. He perdido casi 20.000 dólares en las últimas semanas y mi esposo quiere que vendamos todo. Comprendo perfectamente su preocupación, señora Ramírez. Perder dinero es muy estresante. Cuénteme un poco más sobre su situación. ¿Qué tipo de inversiones tiene? Tengo fondos mutuos y algunas acciones. Todo está bajando mucho. Mi esposo dice que es mejor tener el dinero en el banco, pero yo no estoy segura. Es natural sentirse nerviosa cuando el mercado está volátil. Pero antes de tomar decisiones importantes, vamos a revisar su situación completa. ¿Cuántos años tiene usted y cuándo planea retirarse? Tengo 55 años y quiero retirarme a los 65, pero con estas pérdidas no sé si voy a poder. Señora Ramírez, usted todavía tiene 10 años hasta el retiro. Eso es tiempo suficiente para que sus inversiones se recuperen. 
El mercado siempre tiene altibajos, pero históricamente se ha recuperado. ¿Pero qué pasa si no se recupera esta vez? No puedo perder más dinero. Entiendo su miedo. ¿Qué le parece si hacemos algunos ajustes para que se sienta más cómoda? Podemos mover parte de su dinero a inversiones más conservadoras, como bonos. Eso suena mejor. No quiero arriesgar todo, pero tampoco quiero perder la oportunidad de crecer mi dinero. Perfecto. Vamos a encontrar un equilibrio. ¿Qué tal si movemos el 40% de sus acciones a bonos? Así tendrá menos riesgo, pero todavía podrá crecer su dinero para el retiro. Sí, eso me hace sentir mucho mejor. Gracias por explicarme todo en español. Es muy difícil entender estas cosas en inglés. Por supuesto, señora Ramírez. Es muy importante que entienda completamente sus opciones. Voy a hacer los cambios hoy mismo y la llamaré la próxima semana para ver cómo se siente. Muchísimas gracias, María. Me siento mucho más tranquila ahora." } ``` ## Use with other AI Functions ### Call transcript analysis You can pass the output of AI_TRANSCRIBE to other AI Functions for further processing. For example, you can use AI_SUMMARIZE to summarize the transcription, or AI_CLASSIFY to classify the content of the transcription. This example uses AI_SENTIMENT and AI_COMPLETE to analyze the text transcribed from [customer call audio](/static/samples/cortex/audio/consultation_1.wav) and provide sentiment on four dimensions and an assessment of the agent. AI_SENTIMENT analyzes only text and does not consider speech characteristics like tone of voice. 
```sql WITH transcriptions AS ( SELECT TO_VARCHAR (AI_TRANSCRIBE(TO_FILE('@financial_consultation', 'consultation_1.wav'))) AS transcribed_call ) SELECT AI_SENTIMENT(transcribed_call, ['Professionalism', 'Resolution', 'Wait Time', 'Market Conditions']) AS call_sentiment, AI_COMPLETE ('claude-4-opus', CONCAT ('Summarize how the agent can improve in 50 words: ', transcribed_call)) AS agent_assessment FROM transcriptions ``` AI_SENTIMENT response: ```text { "categories": [ { "name": "overall", "sentiment": "negative" }, { "name": "Market Conditions", "sentiment": "negative" }, { "name": "Professionalism", "sentiment": "negative" }, { "name": "Resolution", "sentiment": "negative" }, { "name": "Wait Time", "sentiment": "unknown" } ] } ``` AI_COMPLETE response: ```text "The agent needs significant improvement in empathy, active listening, and client-centered communication. Instead of dismissing concerns and using condescending language, they should validate emotions, explain market conditions professionally, present multiple options, and guide clients through informed decision-making while respecting their risk tolerance and personal circumstances." ``` ### Video transcript analysis The following example transcribes a [video file](https://www.youtube.com/watch?v=QEQZs8SLhQE) stored in the `podcast_videos_S3` stage: ```sql SELECT AI_TRANSCRIBE(TO_FILE( '@podcast_videos_S3', 'podcast-interview.mp4')); ``` Response: ```text { "audio_duration": 5423.744, "text": "Welcome to the New York Times Popcast, your deepest duende of music news and criticism. I'm John Caramonica, and I'm the critic. I'm Joe Cascarelli, and I'm the reporter. I'm Rosalía and I'm here today with you guys. Yes. Thank you so much for being here. Like literally on some days, Jo. Some days. On some days, I think, is this person the only good pop star? ... Thank you for being here. Loved. Every episode of Popcast is at nytimes.com slash popcast. We're on YouTube at Popcast. Subscribe. 
We're on Instagram and TikTok at Popcast. Tap that like. Tap that follow. Tap in. Don't tap out. Credits and links and bio. We'll be back next week. Yes. Invite me anytime to eat more snacks, please. I lost my hands in Jerez" } ``` Once you have the transcript, you can use AI_COMPLETE to perform additional analysis. This example identifies retail brands mentioned in the conversation for use in advertising or sponsorship analytics. ```sql SELECT AI_COMPLETE('claude-sonnet-4-5', PROMPT('Return a list of any Retail Brands mentioned in this podcast {0}', TO_VARCHAR(transcription_results))) as brands_identified FROM podcast_video_transcription; ``` Response ```text Retail Brands Mentioned in Podcast Based on the transcript analysis, the following brands were identified: Calvin Klein — Mentioned in relation to Rosalía’s commercial appearance Kinder Bueno — Cited as one of Rosalía’s favorite snacks. Nutella — Referenced as a preferred treat. Nestlé — Mentioned as the manufacturer of Milky Bar ice cream bites. Nongshim — Korean snack brand discussed during the tasting segment. Cap'n Crunch — Referenced for its scent similarity to Korean snacks. Doritos — Mentioned by one of the hosts while discussing snack collections. ``` ## Cost considerations Billing for all AI Functions is based on token consumption. For transcription, each second of audio processed is 50 tokens, regardless of language or segmentation method. A full hour of audio is therefore 180,000 tokens. Assuming that processing a million tokens costs 1.3 credits, and that Snowflake credits cost US $3 each, each hour of audio processed costs about US $0.702. This estimate is subject to change. For current pricing information, see the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). AI_TRANSCRIBE has a minimum billing duration of 1 minute. Files shorter than 1 minute are still processed, but are billed at 1 minute. 
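The arithmetic above can be checked with a few lines of Python. This is only a sketch: the token rate and the one-minute minimum come from the text, while the credit rate (1.3 per million tokens) and credit price (US $3) are the same illustrative assumptions used above, and the function name `estimate_transcription_cost` is hypothetical. It also assumes that durations are not rounded beyond the one-minute minimum, which this page does not state.

```python
TOKENS_PER_SECOND = 50       # from the text: 50 tokens per second of audio
MIN_BILLED_SECONDS = 60      # from the text: files under 1 minute are billed as 1 minute


def estimate_transcription_cost(duration_seconds,
                                credits_per_million_tokens=1.3,  # assumed rate
                                usd_per_credit=3.0):             # assumed price
    """Return (tokens, credits, usd) for one audio file of the given length."""
    billed_seconds = max(duration_seconds, MIN_BILLED_SECONDS)
    tokens = billed_seconds * TOKENS_PER_SECOND
    credits = tokens / 1_000_000 * credits_per_million_tokens
    return tokens, credits, credits * usd_per_credit


# One hour of audio: 180,000 tokens, about US $0.702 under the assumed rates.
hour_tokens, hour_credits, hour_usd = estimate_transcription_cost(3600)

# 100 ten-second clips billed separately versus batched into one 1,000-second file.
separate_tokens = 100 * estimate_transcription_cost(10)[0]
batched_tokens = estimate_transcription_cost(100 * 10)[0]
```

Under these assumptions, the 100 separate clips are billed as 300,000 tokens and the single batched file as 50,000 tokens, which is why the minimum billing duration matters for short files.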
To efficiently process large numbers of short audio files, consider batching them into a single file and using timestamps to identify the start and end of each original file in the resulting transcription. --- title: Cortex AI Functions: Documents source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-documents.md section: Snowflake Cortex (AI & ML) --- # Cortex AI Functions: Documents Available to accounts in [select regions](#label-cortex-llm-availability). This feature is not available in the People's Republic of China. Snowflake provides advanced AI-powered document intelligence capabilities as Cortex AI Functions. These functions help you to process, parse, classify, and extract information from a wide variety of document types to power analytics, automation, and intelligent applications, all using simple SQL. Document functions help you with the following tasks: - **Parse documents** to convert unstructured text and layouts into structured, searchable, analyzable content. - **Extract structured information** (entities, tables, or fields) from documents. - **Classify document types** to drive downstream workflows and analytics. Cortex document processing functions can be combined to build retrieval augmented generation (RAG) pipelines, intelligent search and chatbot systems, and large-scale document analytics. The following illustration shows how Cortex document processing functions form a composable framework in which components can be mixed and matched to build tailored solutions. ![Composable framework for Cortex document processing functions](/static/images/cortex-document/composable-framework.png) ## Document functions The core Cortex AI Functions for document processing are: - [AI_PARSE_DOCUMENT](/user-guide/snowflake-cortex/parse-document): Converts digital-native or scanned documents into rich text while preserving layout and context. Optionally extracts images from documents. 
Ideal for semantic search, RAG pipelines, and summarization workflows. Works well with document analysis that requires understanding the entire document content. - [AI_EXTRACT](/user-guide/snowflake-cortex/document-extraction): Provides high-quality structured extraction of information from documents. Understands text, tables, checkboxes, handwriting, and other visual elements. Specializes in extracting structured data based on a schema. - [AI_CLASSIFY](/sql-reference/functions/ai_classify): Classifies a document into one of a list of categories you define. Useful for routing mixed inbound document streams (for example, invoices, contracts, and statements) to different downstream extraction workflows. - [AI_COMPLETE](/sql-reference/functions/ai_complete): The most general-purpose AI Function, AI_COMPLETE generates text completions based on a prompt you provide, and so can be used for a wide variety of tasks involving extracting or transforming text from documents. An advantage of AI_COMPLETE is the ability to choose a model. The following text-processing AI Functions can be used to further analyze or transform text extracted from documents. - [AI_SENTIMENT](/sql-reference/functions/ai_sentiment): Analyzes the sentiment of text content. - [AI_TRANSLATE](/sql-reference/functions/ai_translate): Translates text content between languages. - [SUMMARIZE](/sql-reference/functions/summarize-snowflake-cortex): Generates concise summaries of text content. ## Use cases Cortex AI Functions for document processing are designed to be used together or individually to address a variety of use cases, and are well-suited for these two use cases: ### Building RAG pipelines for chatbots and enterprise search services Documents processed by AI_PARSE_DOCUMENT can be indexed by Cortex Search Services, which can act as retrieval augmented generation (RAG) engines to improve language model responses to user queries. 
In this scenario, you use the Cortex Search Service to find documents related to the query, then pass these documents to AI_COMPLETE as part of the prompt to generate more contextually relevant responses. ### Building document processing pipelines for streamlining workflows and analytics Cortex document processing AI Functions help you build intelligent, flexible, and scalable document processing pipelines using modular components. Such a pipeline ingests documents in various formats and transforms them into actionable data, allowing you to build workflows like these: - Schema based extraction: Apply a natural language schema to extract entities – ranging from single entities to complex tabular data – from a set of documents - Q&A against document: Ask questions about a document in natural language. - Text and layout extraction: Capture document text (with or without layout) to extract entities, generate summaries, and perform analysis using other AI Functions. - Classification: Use AI_CLASSIFY to determine the document type (for example, "invoice," "contract," "report") when ingesting data to route each type to an appropriate processing workflow. - Build a model registry to share custom extraction and classification models: A model registry stores document extraction models fine-tuned for custom use cases specific to your organization. Reusing these models across teams saves time and effort. --- title: Cortex AI Functions: Image extraction with AI_PARSE_DOCUMENT source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/image-extraction.md section: Snowflake Cortex (AI & ML) --- # Cortex AI Functions: Image extraction with AI_PARSE_DOCUMENT Available to all accounts. This feature is not available in the People's Republic of China. AI_PARSE_DOCUMENT is a Cortex AI function that extracts text, data, layout elements, and images, from PDFs, Word documents, and images. 
Use this high-fidelity image extraction capability to power advanced, multimodal document processing workflows, such as: - *Enrich data*: Extract images from documents to add visual context for deeper insights. - *Multimodal RAG*: Combine images and text for retrieval-augmented generation (RAG) to improve model responses. - *Image classification*: Use extracted images with AI_EXTRACT or AI_COMPLETE for automatic tagging and analysis. - *Knowledge bases*: Build richer repositories by including both text and images for better search and reasoning. - *Compliance*: Extract and analyze images (e.g., charts, signatures) for regulatory and audit workflows. For an introduction to AI_PARSE_DOCUMENT, see [Parsing documents with AI_PARSE_DOCUMENT](/user-guide/snowflake-cortex/parse-document). ## Using AI_PARSE_DOCUMENT to extract images To extract images from a document using AI_PARSE_DOCUMENT: - Set the `'mode'` option to `'LAYOUT'`. Image extraction requires LAYOUT mode. - Set the `'extract_images'` option to TRUE. AI_PARSE_DOCUMENT image extraction returns an array, `images`, in the JSON output. Each element of `images` contains a field, `image_base64`, with the extracted image data encoded as a base64 string. Each element also contains a unique ID and the image's bounding box. ```sql SELECT AI_PARSE_DOCUMENT( TO_FILE('@my_stage', 'my_document.pdf'), {'mode': 'LAYOUT', 'extract_images': true}) AS layout_with_images; ``` You can decode the images using BASE64_DECODE_BINARY, then pass them directly to AI_EXTRACT to process or describe the image contents. Alternatively, you can store them in a stage for processing using multimodal AI_COMPLETE. (AI_COMPLETE does not currently support direct image input.) ## Examples ### Extract and describe images After extracting image data, you can use AI_EXTRACT to process or describe the image content. The following example generates a description for the first extracted image after converting it to binary from base64. 
(AI_EXTRACT requires binary input.) The query uses a regular expression to strip the metadata (the data URI scheme and image format) from the base64 string. ```sql SELECT AI_EXTRACT( file_data => BASE64_DECODE_BINARY( REGEXP_REPLACE( ( SELECT ( AI_PARSE_DOCUMENT( TO_FILE('@image_docs', 'my_document.pdf'), {'mode': 'LAYOUT', 'extract_images': true} ):images[0]['image_base64'] )::STRING ), '^data:image/[^;]+;base64,', '') ), responseFormat => {'Image Name': 'Describe the image'} ); ``` ### Store extracted images in a stage You can store extracted images from documents in a Snowflake stage for reuse, auditing, or additional processing with other Cortex AI functions. This example creates and uses a Python stored procedure to decode base64 image data from AI_PARSE_DOCUMENT and upload the resulting image files to a specified stage.

```sql
CREATE OR REPLACE PROCEDURE SAVE_EXTRACTED_IMAGES(r VARIANT)
RETURNS ARRAY
LANGUAGE PYTHON
RUNTIME_VERSION = '3.9'
PACKAGES = ('pillow', 'snowflake-snowpark-python')
HANDLER = 'run'
AS
$$
import base64
import io
import os
import tempfile
from typing import Iterator, Tuple

from PIL import Image


def process_parse_document_result(data: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (image ID, file extension, base64 payload) for each extracted image."""
    for image in data["images"]:
        image_id = image["id"]
        # image_base64 has the form "data:image/<format>;base64,<payload>"
        media_type, encoded = image["image_base64"].split(";", 1)
        extension = media_type.split("/")[1]
        payload = encoded.split(",", 1)[1]
        yield image_id, extension, payload


def decode_base64(encoded_image: str) -> bytes:
    return base64.b64decode(encoded_image)


def run(session, r):
    destination_path = r["DESTINATION_PATH"]
    parse_document_result = r["PARSE_DOCUMENT_RESULT"]
    if not destination_path:
        return ["Error: destination_path parameter is required"]
    if not destination_path.startswith("@"):
        return ["Error: destination_path must start with @ (e.g. @output_stage/path)"]
    if destination_path == "@":
        return ["Error: destination_path must include a stage name after @"]

    # Clean the result directory
    session.sql(f"RM {destination_path}").collect()

    uploaded_files = []
    with tempfile.TemporaryDirectory() as temp_dir:
        for image_id, extension, encoded_image in process_parse_document_result(parse_document_result):
            image_bytes = decode_base64(encoded_image)
            image = Image.open(io.BytesIO(image_bytes))
            image_path = os.path.join(temp_dir, image_id)
            image.save(image_path)
            # Use session.file.put with source file path and auto_compress=False
            session.file.put(image_path, destination_path, auto_compress=False, overwrite=True)
            uploaded_files.append(f"{destination_path}/{image_id}")
            # Cleanup
            os.remove(image_path)
    return uploaded_files
$$;
```

After creating the SAVE_EXTRACTED_IMAGES procedure, you can call it to extract images from a document and store them in a stage, as shown in the following code snippet: ```sql CALL SAVE_EXTRACTED_IMAGES( ( SELECT OBJECT_CONSTRUCT(*) FROM ( SELECT '@image_docs/output' as destination_path, AI_PARSE_DOCUMENT( TO_FILE('@image_docs/my_document.pdf'), {'mode': 'LAYOUT', 'extract_images': true} ) as parse_document_result ) LIMIT 1 )); ``` The output of this query is a list of file paths for the images stored in the specified stage, such as: ```text image_docs/output/img-0.jpeg image_docs/output/img-1.jpeg image_docs/output/img-10.jpeg image_docs/output/img-11.jpeg image_docs/output/img-12.jpeg image_docs/output/img-13.jpeg ``` Now you can process the stored images using other Cortex AI functions, such as AI_COMPLETE for multimodal analysis or generation. ```sql SELECT AI_COMPLETE( 'pixtral-large', 'Describe the image in 10 words.', TO_FILE('@image_docs/output/img-0.jpeg') ); ``` Response: ```text The image shows central bank policy rates for various countries from 2000 to 2025. ``` ## Cost considerations AI_PARSE_DOCUMENT uses billing based on the number of pages processed. 
A single image file is considered to be a page for billing purposes. Extracting images does not incur additional costs. ## Current limitations - No more than fifty images can be extracted from a single document. Additional images are ignored. - Images smaller than 4x4 pixels are not extracted. - If the size of a response exceeds the account parameter EXTERNAL_FUNCTION_MAX_RESPONSE_SIZE, the function returns an error. Increase the value of this parameter if necessary. --- title: Cortex AI Functions: Images source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-images.md section: Snowflake Cortex (AI & ML) --- # Cortex AI Functions: Images This feature is not available in the People's Republic of China. - [Snowflake Cortex AI Functions (including LLM functions)](/user-guide/snowflake-cortex/aisql) With Cortex AI Images, you can accomplish the following: - Compare images - Caption images - Classify images - Extract entities from images - Generate embedding vectors for use in retrieval systems - Answer questions using data in graphs and charts You can do those tasks with the following functions: - [AI_COMPLETE](/sql-reference/functions/ai_complete) - [AI_EMBED](/sql-reference/functions/ai_embed) - [AI_FILTER](/sql-reference/functions/ai_filter) - [AI_CLASSIFY](/sql-reference/functions/ai_classify) - [AI_SIMILARITY](/sql-reference/functions/ai_similarity) ## Input requirements COMPLETE Multimodal can process images with the following characteristics:
Processing files from stages is currently incompatible with custom network policies. ## Analyze images The COMPLETE function processes a single image or multiple images (for example, extracting differences in entities across various images) stored in a stage. See [](#label-cortex-llm-media-files) for information on creating a suitable stage. The function call specifies the following: - The multimodal model to be used - A prompt - The stage path of the image file(s) via a [FILE](#label-data-types-file) object ### Vision Q&A example The following example uses Anthropic's Claude Sonnet 4.6 model to summarize a pie chart `science-employment-slide.jpeg` stored in the `@myimages` stage. ![Pie chart showing the distribution of occupations where mathematics is considered ](/static/images/cortex-llm/science-employment-slide.jpeg) The distribution of occupations where mathematics is considered "extremely important" in 2023 ```sql SELECT AI_COMPLETE('claude-4-6-sonnet', 'Summarize the insights from this pie chart in 100 words', TO_FILE('@myimages', 'science-employment-slide.jpeg')); ``` Response: ```text This pie chart shows the distribution of occupations where mathematics is considered "extremely important" in 2023. Data scientists dominate with nearly half (48.7%) of all such positions, followed by operations research analysts at 29.6%. The remaining positions are distributed among statisticians (7.8%), actuaries (7.2%), physicists (5.1%), mathematicians (0.6%), and other mathematical science occupations (1.1%). This distribution highlights the growing importance of data science in mathematics-intensive careers, while traditional mathematics roles represent a smaller share of the workforce. ``` ### Compare images example Currently, only Anthropic (`claude`) and Meta (`llama`) models can reference multiple images in a single prompt. Multiple image support for other models may be available in a future release. 
Use the [PROMPT helper function](/sql-reference/functions/prompt) to process multiple images in a single COMPLETE call. The following example uses Anthropic's Claude Sonnet 4.6 model to compare two different ad creatives from the `@myimages` stage. ![Images of two ads for electric cars](/static/images/cortex-llm/two-ad-creatives.png) Image of two ads for electric cars ```sql SELECT AI_COMPLETE('claude-4-6-sonnet', PROMPT('Compare this image {0} to this image {1} and describe the ideal audience for each in two concise bullets no longer than 10 words', TO_FILE('@myimages', 'adcreative_1.png'), TO_FILE('@myimages', 'adcreative_2.png') )); ``` Response: ```text First image ("Discover a New Energy"): • Conservative luxury SUV buyers seeking a subtle transition to electrification Second image ("Electrify Your Drive"): • Young, tech-savvy urbanites attracted to bold, progressive automotive design ``` ### Classify images example The following example uses AI_CLASSIFY to classify an image for a real estate application. ![Image of a staged living room for real estate](/static/images/cortex-image/classify_example.png) The following SQL uses the AI_CLASSIFY function to classify the image as a picture of a living area, kitchen, bath, garden, or master bedroom. ```sql SELECT AI_CLASSIFY(TO_FILE('@my_images', 'REAL_ESTATE_STAGING.PNG'), ['Living Area', 'Kitchen', 'Bath', 'Garden', 'Master Bedroom']) AS room_classification; ``` Response: ```text { "labels": [ "Living Area" ] } ``` The SQL below categorizes the objects found in the above image as a couch, window, table, television, or artwork. ```sql SELECT AI_CLASSIFY (TO_FILE ('@my_images', 'REAL_ESTATE_STAGING.PNG'), ['Couch', 'Window', 'Table', 'Television', 'Art'], {'output_mode': 'multi'} ) AS living_room_objects; ``` Response: ```text { "labels": [ "Art", "Couch", "Table", "Window" ] } ``` ## Search images You can use AI_EMBED to find images that are similar to a target image. 
First, use the AI_EMBED function to generate an embedding vector for the target image, mapping its visual features into an abstract vector space, a numerical representation of the image's features. You can then use vector similarity functions to compare this embedding vector to the embedding vectors of other images, producing a similarity score based on their common or similar visual features. This score can be used to classify, rank, or filter images based on their similarity to the target image. | ![Crowd image of people in a city](/static/images/cortex-image/compare_example_1.png) | ![Crowd image of people in a city](/static/images/cortex-image/compare_example_2.png) | | ---------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | For example, given the images above, the following SQL generates an embedding vector for each image, then compares the vectors using cosine similarity. The result, about 0.5, indicates that the images are somewhat similar. Both photos are taken in an urban setting and contain background crowds, but the main subjects are different. ```sql WITH ai_image_embeddings as ( SELECT AI_EMBED('voyage-multimodal-3', TO_FILE ('@my_images', 'CITY_WALKING1.PNG')) as image1_embeddings, AI_EMBED('voyage-multimodal-3', TO_FILE ('@my_images', 'CITY_WALKING2.PNG')) as image2_embeddings ) SELECT VECTOR_COSINE_SIMILARITY(image1_embeddings,image2_embeddings) as similarity FROM ai_image_embeddings; ``` ```text 0.5359029029 ``` To find images that are similar to a target image, you can use AI_SIMILARITY. The example below computes a similarity score for possibly thousands of images, and returns the advertising creatives that are most similar to the motorcycle advertisement below.
![Image of a motorcycle advertisement for image search](/static/images/cortex-image/image_search_example.png)
```sql SELECT TO_FILE('@ad_images', relative_path) as ALL_ADS FROM DIRECTORY(@ad_images) WHERE AI_SIMILARITY(TO_FILE('@ad_images', 'image_226.jpg'), ALL_ADS) >= 0.5; ``` The query returns images from a multimodal table where the similarity score is greater than 0.50. One of the images identified (`image_226.jpg`) is the one we used as a reference. ```text +-----------------------------------------------------------+ | {} ALL_ADS | +-----------------------------------------------------------+ | { "CONTENT_TYPE": "image/jpeg", | | "ETAG": "686897696a7c876b7e", | | "LAST_MODIFIED": "Wed, 26 Mar 2025 18:11:45 GMT", | | "RELATIVE_PATH": "image_226.jpg", | | "SIZE": 39086, | | "STAGE": "@ad_images" } | +-----------------------------------------------------------+ | { "CONTENT_TYPE": "image/jpeg", | | "ETAG": "e7b678c7a696798686", | | "LAST_MODIFIED": "Wed, 26 Mar 2025 18:11:57 GMT", | | "RELATIVE_PATH": "image_441.jpg", | | "SIZE": 12650, | | "STAGE": "@ad_images" }, | +-----------------------------------------------------------+ ``` ## Model limitations All models available to Snowflake Cortex have limitations on the total number of input and output tokens, known as the model's *context window*. The context window size is measured in tokens. Inputs exceeding the context window limit result in an error. Output which would exceed the context window limit is truncated. For text models, tokens generally represent approximately four characters of text, so the word count corresponding to a limit is less than the token count. For image models, the token count per image depends on the vision model's architecture. Tokens within a prompt (for example, “what animal is this?”) also contribute to the model's context window. 
| Model | Context window (tokens) | File types | File size | Images per prompt |
| --------------------- | ----------------------- | ------------------------------------ | ------------ | ----------------- |
| `openai-gpt-4.1` | 1,047,576 | .jpg, .jpeg, .png, .webp, .gif | 10 MB | 5 |
| `claude-4-opus` | 200,000 | .jpg, .jpeg, .png, .webp, .gif | 3.75 MB [L1] | 20 |
| `claude-4-sonnet` | 200,000 | .jpg, .jpeg, .png, .webp, .gif | 3.75 MB [L1] | 20 |
| `claude-3-7-sonnet` | 200,000 | .jpg, .jpeg, .png, .webp, .gif | 3.75 MB [L1] | 20 |
| `claude-4-6-sonnet` | 200,000 | .jpg, .jpeg, .png, .webp, .gif | 3.75 MB [L1] | 20 |
| `llama4-maverick` | 128,000 | .jpg, .jpeg, .png, .webp, .gif, .bmp | 10 MB | 10 |
| `llama4-scout` | 128,000 | .jpg, .jpeg, .png, .webp, .gif, .bmp | 10 MB | 10 |
| `pixtral-large` | 128,000 | .jpg, .jpeg, .png, .webp, .gif, .bmp | 10 MB | 1 |
| `voyage-multimodal-3` | 32,768 | .jpg, .png, .pg, .gif, .bmp | 10 MB | 1 |

## Cost considerations Billing scales with the number of tokens processed. The number of tokens per image depends on the architecture of the vision model. - For Anthropic (`claude`) models, the formula is roughly: tokens = (Width in pixels x Height in pixels) / 750. - Mistral (`pixtral`) models divide each image into batches of 16x16 pixels and convert each batch to a token. The total number of tokens is roughly (Width in pixels / 16) \* (Height in pixels / 16). - Meta (`llama`) models try to tile the image with square tiles. Depending on the image's aspect ratio and size, the number of tiles can be up to 16, each represented by around 153 tokens. - OpenAI models rescale the image and tile it with square patches. 
  For `openai-gpt-4.1`, depending on the image aspect ratio and size, the number of tokens can be 211 (images up to 512x512px), 352 (non-square images with longer side length 1024px), or from 630 tokens (square images at least 1024x1024px) to 913 tokens (non-square images with shorter side length 1024px).
- `voyage-multimodal-3` operates on an array of image patches that are roughly 14x14px in size. The image is rescaled so that it is covered by a grid, which has a minimum of 64 patches and a maximum of 2500 patches. Two extra image tokens are added, so the input ranges from 66 to 2502 tokens, depending on the image size and aspect ratio.

The COUNT_TOKENS function does not currently support image inputs.

## Choosing a vision model

The COMPLETE function supports multiple models of varying capability, latency, and cost. To achieve optimal performance per credit, choose a model that aligns with the content size and task complexity.

The benchmarks are:

- MMMU: Evaluates multimodal models on multidisciplinary tasks that require college-level reasoning.
- MathVista: Mathematical reasoning benchmark within a visual context.
- ChartQA: Evaluates complex reasoning questions about charts.
- DocVQA and VQAv2: Benchmarks for visual question-answering on documents.

For multimodal embeddings, only the `voyage-multimodal-3` model is currently available. `voyage-multimodal-3` is a state-of-the-art multimodal embedding model capable of embedding text and images. It can extract key visual features from sources such as screenshots of PDFs, slides, tables, and figures, reducing the need for complex document parsing workflows. According to Voyage AI internal benchmarks, the `voyage-multimodal-3` model outperforms competing models such as OpenAI CLIP Large, Amazon Titan Multimodal, and Cohere Multimodal v3.

## Regional availability

Support for this feature is available natively to accounts in the following Snowflake regions:

| Model | AWS US West 2 (Oregon) | AWS US East 1 (N. Virginia) | AWS Europe Central 1 (Frankfurt) |
| --- | --- | --- | --- |
| `claude-3-7-sonnet` [A1] | | | |
| `claude-4-sonnet` [A1] | | | |
| `claude-4-opus` [A1] | | | |
| `pixtral-large` | ✔ | ✔ | ✔ |
| `llama4-maverick` | ✔ | | |
| `llama4-scout` | ✔ | | |
| `voyage-multimodal-3` [A1] | | | |

AI_COMPLETE is available in additional regions through [cross-region inference](/user-guide/snowflake-cortex/cross-region-inference).

## Error Conditions

| Message | Explanation |
| --- | --- |
| Request failed for external function SYSTEM$COMPLETE_WITH_IMAGE_INTERNAL with remote service error: 400 '"invalid image path" | Either the file extension or the file itself is not accepted by the model. The message might also mean that the file path is incorrect; that is, the file does not exist at the specified location. Filenames are case-sensitive. |
| Error in secure object | May indicate that the stage does not exist. Check the stage name and ensure that the stage exists and is accessible. Be sure to use the at (@) sign at the beginning of the stage path, such as `@myimages`. |
| Request failed for external function \_COMPLETE_WITH_PROMPT with remote service error: 400 '"invalid request parameters: unsupported image format: image/\*\* | An unsupported image format (anything other than .jpeg, .png, .webp, or .gif) was given to `claude-4-6-sonnet`. |
| Request failed for external function \_COMPLETE_WITH_PROMPT with remote service error: 400 '"invalid request parameters: Image data exceeds the limit of 5.00 MB" | The image given to `claude-4-6-sonnet` exceeds 5 MB. |

## Legal

The data classification of inputs and outputs is as set forth in the following table.
For additional information, refer to [Snowflake AI and ML](/guides-overview-ai-features).

--- title: Cortex AI Functions: Multimodal source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-multimodal.md section: Snowflake Cortex (AI & ML) ---

# Cortex AI Functions: Multimodal

This feature is not available in the People's Republic of China.

- [](/user-guide/snowflake-cortex/aisql)

Audio and video processing using AI_COMPLETE is in public preview. Video semantic search with AI_MULTI_EMBED is generally available to a limited number of customers. All other multimodal capabilities described on this page are generally available.

Cortex AI Functions support multimodal analysis across **documents, images, audio, and video**, enabling end-to-end media understanding and processing pipelines directly inside Snowflake. These functions process files stored on internal or external stages, extracting insights from textual, visual, and audio signals. They can be combined to build advanced workflows for summarization, classification, transcription, structured extraction, and analysis.

Cortex AI Functions give you instant access to industry-leading multimodal models to understand content across modalities, allowing you to integrate unstructured media with structured data for downstream analytics and applications.

Cortex AI Functions support a wide range of use cases, including:

- **Content understanding:** Summarize, classify, and describe documents, images, audio, and video.
- **Data extraction:** Extract structured information such as entities, objects, sentiment, and metadata.
- **Document intelligence:** Analyze charts, tables, and layouts within complex documents.
- **Transcription and conversation analysis:** Convert speech to text with timestamps and speaker identification.
- **Multimodal analytics:** Combine visual, audio, and textual signals for deeper insights.
- **Knowledge base creation:** Enrich datasets with media-derived context for search and discovery.
- **Compliance and moderation:** Detect harmful, unsafe, or policy-violating content.

Multimodal capabilities are available through existing Cortex AI Functions, including `AI_COMPLETE`, `AI_TRANSCRIBE`, `AI_CLASSIFY`, `AI_EMBED`, and `AI_SIMILARITY`.

## Supported media types and functions

Cortex AI Functions support extraction of specific information in structured format from documents, images, audio, and video files. You can define the exact schema you want the model to return, such as detected objects, colors, labels, or other domain-specific attributes.
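The examples on this page pass the desired schema as an object with a `type` of `json` and a JSON Schema style `schema`. As a minimal sketch of that shape, the following Python builds such an object from a simple field map. The helper `build_response_format` and the field names are hypothetical and not part of any Snowflake library. The sketch only shows how the structure is assembled before it is handed to a query.

```python
import json


def build_response_format(fields: dict) -> dict:
    """Assemble a response-format object that marks every field as required.

    Hypothetical helper for illustration; it mirrors the object shape used
    in the SQL examples on this page.
    """
    return {
        "type": "json",
        "schema": {
            "type": "object",
            "properties": fields,
            "required": list(fields),
        },
    }


# Illustrative attributes for an image-analysis task (names are assumptions).
image_fields = {
    "detected_objects": {"type": "array", "items": {"type": "string"}},
    "dominant_colors": {"type": "array", "items": {"type": "string"}},
    "label": {"type": "string"},
}

response_format = build_response_format(image_fields)
print(json.dumps(response_format, indent=2))
```

Listing every property under `required` follows the pattern in the examples on this page, where each schema names all of its fields as required.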
Multimodal functions can process single or multiple files stored in internal or external stages. For information about creating a suitable stage, see [Create stage for media files](#label-cortex-llm-media-files). In addition, you can dive deeper into [Cortex AI for Document Intelligence](/user-guide/snowflake-cortex/ai-documents) in our dedicated documentation.

## Examples

### Video metadata extraction

The following example shows how to extract structured metadata from a library of social media videos using AI_COMPLETE. The query processes video files stored in a stage and returns a JSON object for each video, including sentiment, summary, detected brands and products, content safety classification, visual attributes, and music metadata.

In this example, a table is first created from staged video files using the [FILE](#label-data-types-file) data type. The query then calls AI_COMPLETE with a multimodal model to analyze each video and return structured results. A filter is applied to show output for a single video file.

```sql
-- Create a table of FILE values from the staged videos using TO_FILE
CREATE OR REPLACE TABLE video_ads_table AS
SELECT
  TO_FILE('@video_ads', RELATIVE_PATH) AS video_file,
  RELATIVE_PATH
FROM DIRECTORY(@video_ads);

-- Extract metadata from videos in a single query
SELECT AI_COMPLETE(
  'gemini-3.1-pro',
  'Analyze the attached video and extract the required data points. Respond in JSON.',
  video_file,
  {},
  {
    'type': 'json',
    'schema': {
      'type': 'object',
      'properties': {
        'sentiment': {'type': 'string'},
        'summary': {'type': 'string'},
        'brands': {'type': 'array', 'items': {'type': 'string'}},
        'products': {'type': 'array', 'items': {'type': 'string'}},
        'harmful_content_detected': {'type': 'boolean'},
        'lighting': {'type': 'string'},
        'visible_items': {'type': 'array', 'items': {'type': 'string'}},
        'music_metadata': {
          'type': 'object',
          'properties': {
            'genre': {'type': 'string'},
            'tempo': {'type': 'string'},
            'mood': {'type': 'string'}
          },
          'required': ['genre', 'tempo', 'mood']
        }
      },
      'required': ['sentiment', 'summary', 'brands', 'products', 'harmful_content_detected', 'lighting', 'visible_items', 'music_metadata']
    }
  }
)
FROM video_ads_table
-- Extract metadata from a single file as an example
WHERE FL_GET_RELATIVE_PATH(video_file) = 'dog_food_creative_052.mp4';
```

Response:

```text
{
  "brands": [ "Rude Dog Food" ],
  "harmful_content_detected": false,
  "lighting": "natural",
  "music_metadata": {
    "genre": "Acoustic",
    "mood": "Upbeat",
    "tempo": "Medium"
  },
  "products": [ "Hypoallergenic dog food", "Meat grinder" ],
  "sentiment": "positive",
  "summary": "A man demonstrates how to make hypoallergenic dog food at home using a meat grinder, highlighting the ingredients and process, while promoting his brand, Rude Dog Food, for those who don't have the time to make it themselves.",
  "visible_items": [ "Meat grinder", "Bowl", "Meat", "Vegetables", "Cutting board", "Knife", "Spatula", "String lights" ]
}
```

### Video transcript analysis

The following example transcribes a [video file](https://www.youtube.com/watch?v=QEQZs8SLhQE) stored in the `podcast_videos_S3` stage.

```sql
SELECT AI_TRANSCRIBE(TO_FILE('@podcast_videos_S3', 'podcast-interview.mp4'));
```

Response:

```text
{
  "audio_duration": 5423.744,
  "text": "Welcome to the New York Times Popcast, your deepest duende of music news and criticism. I'm John Caramonica, and I'm the critic. I'm Joe Cascarelli, and I'm the reporter. I'm Rosalía and I'm here today with you guys. Yes. Thank you so much for being here. Like literally on some days, Jo. Some days. On some days, I think, is this person the only good pop star? ... Thank you for being here. Loved. Every episode of Popcast is at nytimes.com slash popcast. We're on YouTube at Popcast. Subscribe. We're on Instagram and TikTok at Popcast. Tap that like. Tap that follow. Tap in. Don't tap out. Credits and links and bio. We'll be back next week. Yes. Invite me anytime to eat more snacks, please. I lost my hands in Jerez"
}
```

Once you have the transcript, you can use AI_COMPLETE to perform additional analysis. This example identifies retail brands mentioned in the conversation for use in advertising or sponsorship analytics. The query assumes that the transcription output has been saved in the `transcription_results` column of a table named `podcast_video_transcription`.

```sql
SELECT AI_COMPLETE(
  'claude-sonnet-4-5',
  PROMPT(
    'Return a list of any Retail Brands mentioned in this podcast {0}',
    TO_VARCHAR(transcription_results)
  )
) AS brands_identified
FROM podcast_video_transcription;
```

Response:

```text
Retail Brands Mentioned in Podcast

Based on the transcript analysis, the following brands were identified:

Calvin Klein — Mentioned in relation to Rosalía's commercial appearance.
Kinder Bueno — Cited as one of Rosalía's favorite snacks.
Nutella — Referenced as a preferred treat.
Nestlé — Mentioned as the manufacturer of Milky Bar ice cream bites.
Nongshim — Korean snack brand discussed during the tasting segment.
Cap'n Crunch — Referenced for its scent similarity to Korean snacks.
Doritos — Mentioned by one of the hosts while discussing snack collections.
```

### Video search

Cortex AI Functions let you perform semantic video searches. To do this, you generate multimodal embeddings for the video content using `AI_MULTI_EMBED` and store them in a table, where they can be searched using a SQL query.
The Twelve Labs Marengo 3 embedding model converts each video into one or more 512-dimensional vectors, capturing visual, audio, and text-based semantics across scenes or segments. These vectors can then be searched to find similar scenes, detect objects or actions, or retrieve relevant moments in large video libraries.

This example uses a small library of short-form videos (such as TikTok clips) stored in a Snowflake stage. The goal is to perform semantic search across this collection, for example, to find videos of a man riding a skateboard.

This example is extensive and contains multiple steps that must be performed in sequence.

#### Create a media table to hold video metadata and embeddings

First, register the video library in a Snowflake table. Each row represents a video file stored in the stage. Use the stage's directory table to create `FILE` objects for each video, then store these in the table.

```sql
CREATE OR REPLACE TABLE video_table AS (
  SELECT TO_FILE('@MY_VIDEOS', RELATIVE_PATH) AS video_file
  FROM DIRECTORY(@MY_VIDEOS)
);
```

Inspect the table to verify that it contains the video files from the stage:

```sql
SELECT * FROM video_table;
```

Response:

```text
+-----------------------------------------------------+
| VIDEO_FILE                                          |
+-----------------------------------------------------+
| {                                                   |
|   "CONTENT_TYPE": "video/mp4",                      |
|   "ETAG": "64fb6fecc9a8f99b48f46f81e77d66be",       |
|   "LAST_MODIFIED": "Mon, 08 Dec 2025 19:24:23 GMT", |
|   "RELATIVE_PATH": "content_video_2.mp4",           |
|   "SIZE": 2974020,                                  |
|   "STAGE": "@MY_VIDEOS"                             |
| }                                                   |
+-----------------------------------------------------+
| ...                                                 |
+-----------------------------------------------------+
| {                                                   |
|   "CONTENT_TYPE": "video/mp4",                      |
|   "ETAG": "357277a66571924e2aefefd7ba582a1c",       |
|   "LAST_MODIFIED": "Mon, 08 Dec 2025 19:24:19 GMT", |
|   "RELATIVE_PATH": "content_video_4.mp4",           |
|   "SIZE": 1270501,                                  |
|   "STAGE": "@MY_VIDEOS"                             |
| }                                                   |
+-----------------------------------------------------+
```

#### Generate video embeddings

Next, use the `AI_MULTI_EMBED` function to generate embeddings for each video in the table. The following example generates embeddings using the Twelve Labs Marengo 3 model.

```sql
SELECT
  video_file,
  AI_MULTI_EMBED('twelvelabs-marengo-embed-3-0', video_file) AS embeddings
FROM video_table;
```

The model returns a multimodal fingerprint of each video, with one or more embeddings for each modality (visual, audio, transcription) and timestamps indicating the segment range (`start_sec`, `end_sec`). For example:

```text
{
  "error": null,
  "value": [
    {
      "embedding": [ -0.022094727, 0.0053710938, 0.024291992, 0.06347656, 0.0013122559, -0.016845703 ],
      "embedding_option": "audio",
      "end_sec": 33.466667,
      "start_sec": 0
    },
    {
      "embedding": [ 0.03515625, 0.10205078, -0.0043945312, ... 0.041748047, 0.016357422, -0.007385254 ],
      "embedding_option": "visual",
      "end_sec": 33.466667,
      "start_sec": 0
    },
    {
      "embedding": [ -0.048095703, -0.10449219, -0.033935547, ... -0.004333496, 0.088378906, 0.029541016 ],
      "embedding_option": "transcription",
      "end_sec": 1,
      "start_sec": 0
    }
  ]
}
```

#### Create table of embeddings

To facilitate searching, create a new table that stores the video file along with its generated embeddings. Flatten the embeddings arrays from the `AI_MULTI_EMBED` output so that each row of the new table contains a single embedding vector for a segment of a video, along with its modality and timestamps. This structure is essentially a catalog of scenes.
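As a language-neutral picture of this flattening step, the following Python sketch turns one embedding result into one row per segment. It assumes the output shape shown in the sample above (a `value` list whose items carry `embedding`, `embedding_option`, `start_sec`, and `end_sec`). The function name and the shortened sample vectors are illustrative only.

```python
def flatten_embeddings(video_name: str, result: dict) -> list:
    """Return one row per embedding segment from an AI_MULTI_EMBED-style result.

    Illustrative helper: the input shape is assumed from the sample output
    on this page, not taken from a client library.
    """
    if result.get("error") is not None:
        raise ValueError(f"embedding failed for {video_name}: {result['error']}")
    return [
        {
            "video_file": video_name,
            "embedding_vec": segment["embedding"],
            "embedding_option": segment["embedding_option"],
            "start_sec": float(segment["start_sec"]),
            "end_sec": float(segment["end_sec"]),
        }
        for segment in result.get("value", [])
    ]


# Shortened two-dimensional vectors stand in for real 512-dimensional ones.
sample = {
    "error": None,
    "value": [
        {"embedding": [-0.02, 0.01], "embedding_option": "audio", "start_sec": 0, "end_sec": 33.466667},
        {"embedding": [0.04, 0.10], "embedding_option": "visual", "start_sec": 0, "end_sec": 33.466667},
    ],
}

rows = flatten_embeddings("content_video_2.mp4", sample)
for row in rows:
    print(row["video_file"], row["embedding_option"], row["start_sec"], row["end_sec"])
```

Each resulting row corresponds to one segment of one video, which is the same granularity the search table needs.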
```sql
CREATE OR REPLACE TABLE tik_tok_embeddings AS
WITH embedding_index AS (
  SELECT
    video_file,
    AI_MULTI_EMBED('twelvelabs-marengo-embed-3-0', video_file) AS embeddings
  FROM video_table
)
SELECT
  video_file,
  f.value['embedding']::VECTOR(FLOAT, 512) AS embedding_vec,
  f.value['embedding_option']::STRING AS embedding_option,
  f.value['start_sec']::FLOAT AS start_sec,
  f.value['end_sec']::FLOAT AS end_sec
FROM embedding_index,
  LATERAL FLATTEN(input => embedding_index.embeddings['value']) f;
```

The result contains one row per embedding segment per video, with the following columns:

- `video_file`: The FILE object representing the video.
- `embedding_vec`: The embedding vector for a segment in the video file.
- `embedding_option`: The modality of the embedding (visual, audio, or transcription).
- `start_sec`: The starting timestamp of the segment in seconds.
- `end_sec`: The ending timestamp of the segment in seconds.

#### Create text embedding from query and search

To search for relevant video segments, generate an embedding vector for the text query using the same embedding model used to generate the video clip embeddings. Then, compare the query embedding to the video embeddings stored in the table using a vector similarity function. Order the result by similarity to find the most relevant video segments. The result includes the video file, the timestamps, and the similarity score for each similar segment.
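To make the ranking step concrete, here is a small Python sketch of cosine similarity and a top-N ordering over segment rows. It illustrates the general technique only. The two-dimensional vectors, file names, and function names are made up for this example, and real embeddings from the model have 512 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_segments(query_vec, segments, limit=10):
    """Score every segment against the query and return the best matches first."""
    scored = [(cosine_similarity(seg["embedding_vec"], query_vec), seg) for seg in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy data: hypothetical segments with tiny vectors.
segments = [
    {"video_file": "a.mp4", "start_sec": 0.0, "end_sec": 5.5, "embedding_vec": [1.0, 0.0]},
    {"video_file": "a.mp4", "start_sec": 5.5, "end_sec": 11.5, "embedding_vec": [0.6, 0.8]},
    {"video_file": "b.mp4", "start_sec": 0.0, "end_sec": 4.0, "embedding_vec": [0.0, 1.0]},
]
query_vec = [0.6, 0.8]

results = top_segments(query_vec, segments, limit=2)
for score, seg in results:
    print(seg["video_file"], seg["start_sec"], seg["end_sec"], round(score, 4))
```

Because cosine similarity depends only on direction, the query and segment vectors must come from the same embedding model for the scores to be meaningful.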
```sql
WITH query AS (
  SELECT (AI_MULTI_EMBED(
    'twelvelabs-marengo-embed-3-0',
    'Find segments where there is a man riding a skateboard'
  )):value[0]['embedding']::VECTOR(FLOAT, 512) AS query_embedding
)
SELECT
  video_file,
  v.start_sec,
  v.end_sec,
  VECTOR_COSINE_SIMILARITY(v.embedding_vec, q.query_embedding) AS similarity
FROM tik_tok_embeddings v
CROSS JOIN query q
ORDER BY similarity DESC
LIMIT 10;
```

The top results show high similarity scores for two specific segments in a single video:

```text
+-----------------------------------------------------+-----------+---------+------------------+
| video_file                                          | start_sec | end_sec | similarity       |
+-----------------------------------------------------+-----------+---------+------------------+
| content_video_1.mp4                                 | 5.5       | 11.5    | 0.6915           |
+-----------------------------------------------------+-----------+---------+------------------+
| content_video_1.mp4                                 | 11.5      | 17.0    | 0.5646           |
+-----------------------------------------------------+-----------+---------+------------------+
```

By checking the first video segment (from 5.5 to 11.5 seconds), you can confirm that it indeed contains a skateboarder.

![Screenshot from a video search result showing a man riding a skateboard](/static/images/skateboarding.png)

### Audio-based sentiment analytics

This example shows how to analyze a call center audio recording using AI_COMPLETE to extract structured sentiment insights based on both spoken content and vocal delivery. The model evaluates agent and customer behavior, including tone, professionalism, anger, and escalation signals, and returns a JSON object summarizing overall sentiment, participant dynamics, escalation events, and interaction outcome.
```sql
-- Create a table of FILE values from the staged audio using TO_FILE
CREATE OR REPLACE TABLE call_center_logs AS (
  SELECT
    TO_FILE('@AUDIO_STAGE', RELATIVE_PATH) AS audio_files,
    RELATIVE_PATH
  FROM DIRECTORY(@AUDIO_STAGE)
);

-- Analyze audio in a single query
SELECT
  FL_GET_RELATIVE_PATH(audio_files) AS file_name,
  AI_COMPLETE(
    'gemini-3.1-pro',
    'Analyze the attached audio call center recording.
You are an acoustic and semantic analyzer. Evaluate both the literal spoken words
and the vocal delivery (pitch, pace, tone, volume, and pauses).
Focus on two participants:
- AGENT: detect sarcasm, passive-aggressiveness, rudeness, or professionalism
- CUSTOMER: detect anger, frustration, distress, or calmness
Return ONLY raw JSON. No markdown, no backticks, no preamble.',
    audio_files,
    {},
    {
      'type': 'json',
      'schema': {
        'type': 'object',
        'properties': {
          'overall_sentiment': {'type': 'string'},
          'agent': {
            'type': 'object',
            'properties': {
              'tone': {'type': 'string'},
              'sarcasm_level': {'type': 'string', 'enum': ['none', 'low', 'medium', 'high']},
              'rudeness_level': {'type': 'string', 'enum': ['none', 'low', 'medium', 'high']},
              'professionalism': {'type': 'string', 'enum': ['poor', 'fair', 'good', 'excellent']},
              'key_signals': {'type': 'array', 'items': {'type': 'string'}}
            },
            'required': ['tone', 'sarcasm_level', 'rudeness_level', 'professionalism', 'key_signals']
          },
          'customer': {
            'type': 'object',
            'properties': {
              'sentiment': {'type': 'string'},
              'anger_level': {'type': 'string', 'enum': ['calm', 'mild', 'moderate', 'high', 'furious']},
              'tone': {'type': 'string'},
              'key_signals': {'type': 'array', 'items': {'type': 'string'}}
            },
            'required': ['sentiment', 'anger_level', 'tone', 'key_signals']
          },
          'escalation_detected': {'type': 'boolean'},
          'escalation_summary': {'type': 'string'},
          'resolution_sentiment': {'type': 'string'},
          'agent_effectiveness': {'type': 'string'}
        },
        'required': ['overall_sentiment', 'agent', 'customer', 'escalation_detected', 'escalation_summary', 'resolution_sentiment', 'agent_effectiveness']
      }
    }
  ) AS analysis
FROM call_center_logs
WHERE FL_GET_RELATIVE_PATH(audio_files) = 'consultation_1.wav';
```

Response:

```text
{
  "agent": {
    "key_signals": [ "Dismissed customer concerns", "Insulted customer risk tolerance", "Told customer she was overreacting" ],
    "professionalism": "poor",
    "rudeness_level": "high",
    "sarcasm_level": "low",
    "tone": "condescending"
  },
  "agent_effectiveness": "poor",
  "customer": {
    "anger_level": "high",
    "key_signals": [ "Stressed about $40k loss", "Demanded to sell all assets", "Expressed regret trusting agent" ],
    "sentiment": "negative",
    "tone": "frustrated"
  },
  "escalation_detected": true,
  "escalation_summary": "Customer escalated to liquidating all assets due to the agent's dismissive and rude behavior regarding her financial losses.",
  "overall_sentiment": "negative",
  "resolution_sentiment": "negative"
}
```

### Vision Q&A example

The following example uses Anthropic's Claude Sonnet 4.6 model to summarize a pie chart `science-employment-slide.jpeg` stored in the `@myimages` stage.

The distribution of occupations where mathematics is considered "extremely important" in 2023

```sql
SELECT AI_COMPLETE(
  'claude-sonnet-4-6',
  'Summarize the insights from this pie chart in 100 words',
  TO_FILE('@myimages', 'science-employment-slide.jpeg')
);
```

Response:

```text
This pie chart shows the distribution of occupations where mathematics is considered "extremely important" in 2023. Data scientists dominate with nearly half (48.7%) of all such positions, followed by operations research analysts at 29.6%. The remaining positions are distributed among statisticians (7.8%), actuaries (7.2%), physicists (5.1%), mathematicians (0.6%), and other mathematical science occupations (1.1%). This distribution highlights the growing importance of data science in mathematics-intensive careers, while traditional mathematics roles represent a smaller share of the workforce.
```

### Compare images example

Use the [PROMPT helper function](/sql-reference/functions/prompt) to process multiple images in a single AI_COMPLETE call. The following example uses Anthropic's Claude Sonnet 4.6 model to compare two different ad creatives from the `@myimages` stage.

![Images of two ads for electric cars](/static/images/cortex-llm/two-ad-creatives.png)

Image of two ads for electric cars

```sql
SELECT AI_COMPLETE(
  'claude-sonnet-4-6',
  PROMPT(
    'Compare this image {0} to this image {1} and describe the ideal audience for each in two concise bullets no longer than 10 words',
    TO_FILE('@myimages', 'adcreative_1.png'),
    TO_FILE('@myimages', 'adcreative_2.png')
  )
);
```

Response:

```text
First image ("Discover a New Energy"):
• Conservative luxury SUV buyers seeking a subtle transition to electrification

Second image ("Electrify Your Drive"):
• Young, tech-savvy urbanites attracted to bold, progressive automotive design
```

## Create stage for media files

Cortex AI Functions that process media files (documents, images, audio, or video) require the files to be stored on an internal or external stage. The stage must use server-side encryption. If you want to be able to query the stage or programmatically process all the files stored there, the stage must have a directory table.

The SQL below creates a suitable internal stage:

```sql
CREATE OR REPLACE STAGE input_stage
  DIRECTORY = ( ENABLE = true )
  ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' );
```

To process files from external object storage (for example, Amazon S3), create a storage integration, then create an external stage that uses the storage integration.
To learn how to configure a Snowflake storage integration, see our detailed guides:

- [Amazon S3 storage integration](/user-guide/data-load-s3-config-storage-integration)
- [Azure container integration](/user-guide/data-load-azure-config)
- [Google Cloud Storage integration](/user-guide/data-load-gcs-config)

Create an external stage that references the integration and points to your cloud storage container. This example points to an Amazon S3 bucket:

```sql
CREATE OR REPLACE STAGE my_aisql_media_files
  STORAGE_INTEGRATION = my_s3_integration
  URL = 's3://my_bucket/prefix/'
  DIRECTORY = ( ENABLE = TRUE )
  ENCRYPTION = ( TYPE = 'AWS_SSE_S3' );
```

With an internal or external stage created, and files stored there, you can use Cortex AI Functions to process media files stored in the stage. For document parsing, see the [document parsing documentation](/user-guide/snowflake-cortex/parse-document).

AI Functions are currently incompatible with custom [network policies](/user-guide/network-policies).

### Cortex AI Functions storage best practices

You may find the following best practices helpful when working with media files in stages with Cortex AI Functions:

- Establish a scheme for organizing media files in stages. For example, create a separate stage for each team or project, and store the different types of media files in subdirectories.
- Enable directory tables on stages to allow querying and programmatic access to their files. To automatically refresh the directory table for an external stage when new or updated files are available, set AUTO_REFRESH = TRUE when creating the stage.
- For external stages, use fine-grained policies on the cloud provider side (for example, AWS IAM policies) to restrict the storage integration's access to only what is necessary.
- Always use encryption, such as AWS_SSE_S3 or SNOWFLAKE_SSE, to protect your data at rest.
## Model limitations

All models available to Snowflake Cortex have limitations on the total number of input and output tokens, known as the model's *context window*. The context window size is measured in tokens. Inputs exceeding the context window limit result in an error. Output that would exceed the context window limit is truncated.

For text models, tokens generally represent approximately four characters of text, so the word count corresponding to a limit is less than the token count.

For multimodal models, the token count per image and video depends on the model's architecture. Tokens within a prompt (for example, "what animal is this?") also contribute to the model's context window.

| Model | Context window (tokens) | File types | File size | Files per prompt |
| --- | --- | --- | --- | --- |
| `openai-gpt-4.1` | 1,047,576 | .jpg, .jpeg, .png, .webp, .gif | 10 MB | 5 |
| `claude-4-opus` | 200,000 | .jpg, .jpeg, .png, .webp, .gif | 3.75 MB [L1] | 20 |
| `claude-4-sonnet` | 200,000 | .jpg, .jpeg, .png, .webp, .gif | 3.75 MB [L1] | 20 |
| `claude-3-7-sonnet` | 200,000 | .jpg, .jpeg, .png, .webp, .gif | 3.75 MB [L1] | 20 |
| `claude-sonnet-4-6` | 1,000,000 | .jpg, .jpeg, .png, .webp, .gif | 3.75 MB [L1] | 20 |
| `llama4-maverick` | 128,000 | .jpg, .jpeg, .png, .webp, .gif, .bmp | 10 MB | 10 |
| `llama4-scout` | 128,000 | .jpg, .jpeg, .png, .webp, .gif, .bmp | 10 MB | 10 |
| `pixtral-large` | 128,000 | .jpg, .jpeg, .png, .webp, .gif, .bmp | 10 MB | 8 |
| `voyage-multimodal-3` | 32,768 | .jpg, .png, .pg, .gif, .bmp | 10 MB | 5 |
| `gemini-3.1-pro` | 1,000,000 | **Audio:** .wav, .mp3, .aiff, .aac, .ogg, .flac, .m4a, .mp4, .pcm, .webm **Video:** .mp4, .mpeg, .mov, .avi, .flv, .mpg, .webm, .wmv, .3gpp | 100 MB combined [P1] | 10 audio + 10 video [P1] |
| `gemini-3.5-flash` | 1,000,000 | **Audio:** .wav, .mp3, .aiff, .aac, .ogg, .flac, .m4a, .mp4, .pcm, .webm **Video:** .mp4, .mpeg, .mov, .avi, .flv, .mpg, .webm, .wmv, .3gpp | 100 MB combined [P1] | 10 audio + 10 video [P1] |
| `twelvelabs-marengo-embed-3-0` [M1] | 4 hours (duration) [M2] | **Video:** .mp4, .mov, .avi, .mkv, .wmv, .webm | 400 MB | N/A (embedding model) |

For per-model regional availability, see [Regional availability](#label-cortex-llm-availability) on the Cortex AI Functions page.

## Error conditions

| Message | Explanation |
| --- | --- |
| Request failed for external function SYSTEM$COMPLETE_WITH_IMAGE_INTERNAL with remote service error: 400 '"invalid image path" | Either the file extension or the file itself is not accepted by the model. The message might also mean that the file path is incorrect; that is, the file does not exist at the specified location. Filenames are case-sensitive. |
| Error in secure object | May indicate that the stage does not exist. Check the stage name and ensure that the stage exists and is accessible. Be sure to use the at (@) sign at the beginning of the stage path, such as `@myimages`. |
| Request failed for external function _COMPLETE_WITH_PROMPT with remote service error: 400 '"invalid request parameters: unsupported image format: image/** | An unsupported image format (anything other than .jpeg, .png, .webp, or .gif) was given to `claude-sonnet-4-6`. |
| Request failed for external function _COMPLETE_WITH_PROMPT with remote service error: 400 '"invalid request parameters: Image data exceeds the limit of 5.00 MB" | The image given to `claude-sonnet-4-6` exceeds 5 MB. |

## Legal

The data classification of inputs and outputs is as set forth in the following table.
For additional information, refer to [Snowflake AI and ML](/guides-overview-ai-features).

--- title: Cortex AI Guardrails source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-ai-guardrails.md section: Snowflake Cortex (AI & ML) ---

# Cortex AI Guardrails

This feature requires Enterprise Edition (or higher). To inquire about upgrading, please contact [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support).

## Overview

Cortex AI Guardrails, part of the [Snowflake Horizon Catalog](/user-guide/snowflake-horizon), provide run-time protection against prompt injection and jailbreak attacks on [Cortex Code](/user-guide/cortex-code/cortex-code), [%sf-intelligence%](/user-guide/snowflake-cortex/snowflake-cowork), and [Cortex Agents](/user-guide/snowflake-cortex/cortex-agents).

As enterprises move AI applications from pilot to production, they face increased risk from adversarial prompts that can threaten data integrity and security. Cortex AI Guardrails extend Snowflake's default protections against known prompt injection techniques by adding guardrails to detect and mitigate adversarial threats.

Integrated centrally into Snowflake Horizon Catalog, Cortex AI Guardrails leverage contextual reasoning to detect and neutralize malicious intent, preventing adversarial threats from circumventing established security boundaries and hardened permissions.

### Key capabilities

Cortex AI Guardrails provide the following protections:

- **Prompt injection detection**: Identifies and blocks attempts to override system instructions through malicious prompts, including indirect prompt injections embedded in tool calls.
- **Jailbreak prevention**: Detects attempts to bypass the model's safety protocols and security boundaries.
- **Zero-day style protection**: Uses advanced techniques to identify sophisticated, previously unknown attack patterns in real time.

## Configure Cortex AI Guardrails

You can configure Cortex AI Guardrails at the account level using the `AI_SETTINGS` parameter. This provides centralized control over guardrail behavior for Cortex Code, %sf-intelligence%, and Cortex Agents in your account.

Users with the ACCOUNTADMIN role can configure Cortex AI Guardrails.

Cortex AI Guardrails are available to Commercial (non-Gov, VPS, Sovereign) accounts that have [Cross-region inference](/user-guide/snowflake-cortex/cross-region-inference) enabled. The account parameter `CORTEX_ENABLED_CROSS_REGION` must be set to `ANY_REGION`, `AWS_US`, or `AWS_GLOBAL`. For details on this parameter, see [CORTEX_ENABLED_CROSS_REGION](#label-cortex-enable-cross-region).

### Enable guardrails

To enable Cortex AI Guardrails for your account, use the ALTER ACCOUNT command with the `AI_SETTINGS` parameter:

```sql
ALTER ACCOUNT SET AI_SETTINGS = $$
guardrails:
  advanced_prompt_injection:
    - enabled: true
$$;
```

### View guardrail settings

To view the current guardrail configuration for your account:

```sql
SHOW PARAMETERS LIKE 'AI_SETTINGS' IN ACCOUNT;
```

### Disable guardrails

To disable Cortex AI Guardrails:

```sql
ALTER ACCOUNT UNSET AI_SETTINGS;
```

## Monitor guardrail activity

When Cortex AI Guardrails detect a potential threat, the event is logged for audit and monitoring purposes.

- **Cortex Code**: Review detected threats in the conversation logs. For where those logs are stored and how to manage them, see [Conversation history](/user-guide/cortex-code/security#label-cortex-code-security-conversation-history).
- **%sf-intelligence%** and **Cortex Agents**: Review conversation and trace data in Cortex Agent monitoring (for example in %sf-web-interface%, **AI & ML** %raa% **Agents**, then the **Monitoring** pane for the agent). For details, see [Monitor Cortex Agent requests](/user-guide/snowflake-cortex/cortex-agents-monitor#label-cortex-agents-access-conversation-logs).

With that information (conversation logs on Cortex Code, or monitoring and trace data for agents), you can:

- Monitor for attempted attacks against your AI workloads
- Identify patterns in blocked or flagged requests
- Audit guardrail effectiveness

## Considerations

- While Cortex AI Guardrails are optimized for high accuracy, some legitimate prompts may occasionally be flagged. Review your guardrail logs periodically to identify any patterns.
- Cortex AI Guardrails for prompt injection are currently available with [Cortex Code](/user-guide/cortex-code/cortex-code), [%sf-intelligence%](/user-guide/snowflake-cortex/snowflake-cowork), and [Cortex Agents](/user-guide/snowflake-cortex/cortex-agents).

## Cost

You are charged credits for the use of Cortex AI Guardrails as listed in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). Usage is measured based on the number of tokens scanned.

## Related topics

- [](/user-guide/snowflake-horizon)
- [](/user-guide/cortex-code/cortex-code)
- [](/user-guide/snowflake-cortex/snowflake-cowork)
- [](/user-guide/snowflake-cortex/cortex-agents)
- [](/guides-overview-ai-features)

--- title: Cortex Analyst source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst.md section: Snowflake Cortex (AI & ML) ---

# %cortex-analyst%

This feature is not available in the People's Republic of China.

- [](/user-guide/views-semantic/sql)
- [](/user-guide/snowflake-cortex/cortex-analyst/rest-api)
- [](/user-guide/snowflake-cortex/cortex-analyst/verified-query-repository)
- [](/user-guide/snowflake-cortex/cortex-analyst/tutorials/tutorial-1)
- [](/user-guide/snowflake-cortex/cortex-analyst-evaluations)

## Overview

%cortex-analyst% is a fully-managed, LLM-powered [Snowflake Cortex](https://www.snowflake.com/en/data-cloud/cortex/) feature that helps you create applications capable of reliably answering business questions based on your structured data in Snowflake.
With %cortex-analyst%, business users can ask questions in natural language and receive direct answers without writing SQL. Available as a convenient REST API, %cortex-analyst% can be seamlessly integrated into any application. Building a production-grade conversational self-service analytics solution requires a service that generates accurate text-to-SQL responses. For most teams, developing such a service that successfully balances accuracy, latency, and costs is a daunting task. %cortex-analyst% simplifies this process by providing a fully managed, sophisticated agentic AI system that handles all of these complexities, generating highly accurate text-to-SQL responses. It helps you accelerate the delivery of high-precision, self-serve conversational analytics to business teams, while avoiding time sinks such as complex RAG solution patterns, model experimentation, and GPU capacity planning. The generated SQL queries are executed against the scalable Snowflake engine, ensuring industry-leading price performance and lower total cost of ownership (TCO). Want to get started with %cortex-analyst% quickly? Try the [](/user-guide/snowflake-cortex/cortex-analyst/tutorials/tutorial-1) tutorial. ### Key features - *Self-serve analytics via natural language queries.* Delight your business teams and non-technical users with instant answers and insights from their structured data in Snowflake. Using %cortex-analyst%, you can build downstream chat applications that allow your users to ask questions using natural language and receive accurate answers on the fly. - *Convenient REST API for integration into existing business workflows.* %cortex-analyst% takes an API-first approach, giving you full control over the end user experience. Easily integrate %cortex-analyst% into existing business tools and platforms, bringing the power of data insights to where business users already operate, such as Streamlit apps, Slack, Teams, custom chat interfaces, and more. 
- *Powered by state-of-the-art large language models:* By default, %cortex-analyst% is powered by industry-leading models which run securely inside [Snowflake Cortex](#label-cortex-llm-availability), Snowflake's intelligent, fully managed AI service. At runtime, %cortex-analyst% selects the best combination of models to ensure the highest accuracy and performance for each query. As LLMs evolve, Snowflake may add more models to the mix to further improve performance and accuracy. - *Semantic models for high precision and accuracy:* Generic AI solutions often struggle with text-to-SQL conversions when given only a database schema, as schemas lack critical knowledge like business process definitions and metrics handling. %cortex-analyst% overcomes this limitation by using a [semantic model](/user-guide/views-semantic/sql) to bridge the gap between business users and databases. Captured in a lightweight YAML file, the overall structure and concepts of the semantic model are similar to those of database schemas, but allow for a richer description of the semantic information around the data. If you set up %cortex-analyst% to answer questions from a large number of data sources, %cortex-analyst% can automatically figure out which one to use. You don't have to worry about specifying the right one with each query. - *Security and governance.* Snowflake's privacy-first foundation and enterprise-grade security ensure that you can explore AI-driven use cases with confidence, knowing your data is protected by the highest standards of privacy and governance. - %cortex-analyst% does not train on Customer Data. We do not use your Customer Data to train or fine-tune any Model to be made available for use across our customer base. Additionally, for inference, %cortex-analyst% uses the metadata provided in the semantic model YAML file (e.g., table names, column names, value type, descriptions, etc.) only for SQL-query generation. 
This SQL query is then executed in your Snowflake virtual warehouse to generate the final output. - Data stays within Snowflake's governance boundary. By default, %cortex-analyst% is powered by Snowflake-hosted LLMs from Mistral and Meta, ensuring that no data, including metadata or prompts, leaves Snowflake's governance boundary. - Seamless integration with Snowflake's Privacy and Governance features. %cortex-analyst% fully integrates with Snowflake's role-based access control (RBAC) policies, ensuring that SQL queries generated and executed adhere to all established access controls. This guarantees robust security and governance for your data. ## Understanding Semantic Views %cortex-analyst% uses [Semantic Views](/user-guide/views-semantic/overview) to understand your data and generate accurate SQL queries. Semantic Views are schema-level objects that define business concepts, metrics, and relationships in a way that bridges the gap between how business users think about data and how it's stored in database tables. ### What are Semantic Views? Semantic Views provide a business-friendly layer over your data by defining: - **Logical tables** that represent business entities (such as customers, orders, or products) - **Dimensions** that provide categorical context (such as customer name, product category, or order date) - **Facts** that capture row-level quantitative data (such as sale amounts or quantities) - **Metrics** that aggregate data into business KPIs (such as total revenue or average order value) - **Relationships** that define how tables join together ### Why use Semantic Views with Cortex Analyst? 
Semantic Views significantly improve the accuracy and reliability of %cortex-analyst% by: - **Providing rich metadata**: Descriptions, synonyms, and data types help the LLM understand your data - **Defining business logic**: Metrics capture the correct aggregation formulas and calculation rules - **Establishing relationships**: Join paths are predefined, ensuring correct multi-table queries - **Offering verified examples**: Sample questions and their SQL answers guide query generation ### Benefits of Semantic Views Semantic Views are the recommended approach for working with %cortex-analyst% because they offer: - **Native Snowflake integration**: Full RBAC, privilege management, and governance features - **Sharing capabilities**: Easily share semantic views through Snowflake's sharing mechanisms - **Advanced features**: Support for derived metrics that combine data from multiple tables - **Access modifiers**: Mark facts and metrics as public or private to control visibility - **Custom instructions**: Provide guidance to %cortex-analyst% for SQL generation and question categorization ### Legacy semantic model support Legacy semantic model YAML files (stored on stages) are still supported for backward compatibility, but Semantic Views are the recommended approach for new implementations. For more information about Semantic Views and their YAML specification, see: - [](/user-guide/views-semantic/overview) - [](/user-guide/views-semantic/semantic-view-yaml-spec) - [](/user-guide/views-semantic/sql) ## Access control requirements To make requests to %cortex-analyst%, use a role with either the SNOWFLAKE.CORTEX_USER or SNOWFLAKE.CORTEX_ANALYST_USER database role. CORTEX_USER provides access to all Covered AI features, while CORTEX_ANALYST_USER provides access only to %cortex-analyst%. For information about Covered AI features, see [](#label-analyst-legal-notices). 
To use %cortex-analyst% with a semantic model, you also need the following privileges: | Privilege | Object | | ------------- | ----------------------------------------------------------------------------------------------- | | READ or WRITE | Stage that contains the semantic model YAML file, if the semantic model is uploaded to a stage. | | USAGE | The Cortex Search services mentioned in the semantic model. | | SELECT | The tables mentioned in the semantic model. | Requests to the %cortex-analyst% API must include an authorization token. For details on how to authenticate to the API, see [](/developer-guide/snowflake-rest-api/authentication). Note that the example in this topic uses a session token to authenticate to a Snowflake account. ### Limiting access to specific roles By default, the CORTEX_USER role is granted to the PUBLIC role. The PUBLIC role is automatically granted to all users and roles. If you don't want all users to have this privilege, you can revoke access to the PUBLIC role and grant access to specific roles. For more information, see [](#label-cortex-llm--privileges). To control access to specific semantic models, you can store the YAML file in a stage and control access to that stage. ### Limiting access using the Cortex Analyst user role To provide selective access to Cortex Analyst for specific users, use the SNOWFLAKE.CORTEX_ANALYST_USER database role. This role includes the privileges needed to call the Cortex Analyst API. For more information about Covered AI features, see [](#label-analyst-legal-notices). If your user roles have the CORTEX_USER role, you must revoke access to the CORTEX_USER role. To revoke the CORTEX_USER database role from your user roles, run the following command using the ACCOUNTADMIN role: ```sql REVOKE DATABASE ROLE SNOWFLAKE.CORTEX_USER FROM ROLE analyst; ``` To provide access to %cortex-analyst%, use the ACCOUNTADMIN role to do the following: 1. 
Grant the SNOWFLAKE.CORTEX_ANALYST_USER database role to a custom role. 2. Assign this custom role to users. You can't grant database roles directly to users. For more information, see [](/sql-reference/sql/grant-database-role). The following example: 1. Creates the custom role, `cortex_analyst_user_role`. 2. Grants it the CORTEX_ANALYST_USER database role. 3. Assigns this role to `example_user`. ```sql USE ROLE ACCOUNTADMIN; CREATE ROLE cortex_analyst_user_role; GRANT DATABASE ROLE SNOWFLAKE.CORTEX_ANALYST_USER TO ROLE cortex_analyst_user_role; GRANT ROLE cortex_analyst_user_role TO USER example_user; ``` You can also grant access to %cortex-analyst% through existing roles. For example, if you have an `analyst` role used by analysts in your organization, you can grant access with a single GRANT statement: ```sql GRANT DATABASE ROLE SNOWFLAKE.CORTEX_ANALYST_USER TO ROLE analyst; ``` ## Region availability %cortex-analyst% is natively available in the following regions. - AWS ap-northeast-1 (Tokyo) - AWS ap-southeast-2 (Sydney) - AWS us-east-1 (Virginia) - AWS US East (Commercial Gov - N. Virginia) - AWS us-west-2 (Oregon) - AWS eu-central-1 (Frankfurt) - AWS eu-west-1 (Ireland) - Azure East US 2 (Virginia) - Azure West Europe (Netherlands) If your Snowflake account is in a different cloud region, you can still use %cortex-analyst% by leveraging [](/user-guide/snowflake-cortex/cross-region-inference). Once cross-region inference is enabled, %cortex-analyst% processes requests in other regions for models that are not available in your default region. For optimal performance, configure cross-region with AWS US regions. ## Multi-turn conversation in %cortex-analyst% %cortex-analyst% supports multi-turn conversations for data-related questions. This feature enables asking follow-up questions that build on previous queries, creating a more dynamic and interactive data exploration experience.
For example, the user asks, "What is the month-over-month revenue growth for 2021 in Asia?", then follows up with, "What about North America?" %cortex-analyst% recognizes the follow-up, retrieves the context from the initial query, and rephrases the second question as: "What is the month-over-month revenue growth for 2021 in North America?" %cortex-analyst% then generates a SQL query to answer this question. To use this feature, pass the conversation history in the `messages` field: ```json { "messages": [ { "role": "user", "content": [ { "type": "text", "text": "What is the month over month revenue growth for 2021 in Asia?" } ] }, { "role": "analyst", "content": [ { "type": "text", "text": "We interpreted your question as ..." }, { "type": "sql", "statement": "SELECT * FROM table" } ] }, { "role": "user", "content": [ { "type": "text", "text": "What about North America?" } ] } ], "semantic_model_file": "@my_stage/my_semantic_model.yaml" } ``` The conversation history is an array of messages in chronological order, where each message has a role and content. The role can be `"user"` (for previous questions) or `"analyst"` (for previous responses). Analyst responses have both text and SQL responses, as shown in the example above, while user messages have only text. Large language models like the ones used by %cortex-analyst% do not store state between requests. The full history is processed for each new query in a conversation, so the compute cost increases with each round. ### Known limitations in multi-turn conversations Some of the following limitations might be addressed in future versions of %cortex-analyst%.
- **Access to the results of previous SQL queries**: %cortex-analyst% doesn't have access to results from previous SQL queries. For example, if you first ask, "What are my products?" and then ask, "What is the revenue of the second product?", %cortex-analyst% cannot refer to the list of products from the first query to get the second product.
- **General business insights**: %cortex-analyst% is limited to answering questions that can be resolved with SQL. It does not generate insights for broader business-related queries, such as "What trends do you observe?"
- **Long conversations**: If a conversation includes too many turns or the user shifts intent frequently, %cortex-analyst% might struggle to interpret the follow-up questions. In such cases, reset the conversation and start again.
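As a rough illustration of the multi-turn request shape described above, the following Python sketch assembles the `messages` array from earlier turns and appends a new question. The helper names and the `max_turns` cap are hypothetical client-side conveniences, not part of the API. The cap simply starts a fresh conversation when the history grows long, in line with the advice to reset long conversations.

```python
import json


def user_message(text):
    """Build a user turn; user messages carry only text."""
    return {"role": "user", "content": [{"type": "text", "text": text}]}


def analyst_message(text, sql):
    """Build an analyst turn with its text interpretation and its SQL statement."""
    return {
        "role": "analyst",
        "content": [
            {"type": "text", "text": text},
            {"type": "sql", "statement": sql},
        ],
    }


def build_request(history, question, semantic_model_file, max_turns=None):
    """Return a request body: prior turns in chronological order, then the new question.

    max_turns is a hypothetical client-side cap, not an API parameter. If the
    history holds more messages than the cap, it is dropped so that the
    conversation starts again from the new question.
    """
    if max_turns is not None and len(history) > max_turns:
        history = []
    messages = list(history) + [user_message(question)]
    return {"messages": messages, "semantic_model_file": semantic_model_file}


history = [
    user_message("What is the month over month revenue growth for 2021 in Asia?"),
    analyst_message("We interpreted your question as ...", "SELECT * FROM table"),
]
body = build_request(history, "What about North America?", "@my_stage/my_semantic_model.yaml")
print(json.dumps(body, indent=2))
```

Because the full history is sent with every request, a cap like this also bounds how much each additional round costs to process.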
## Evaluate and improve your semantic view You can evaluate the quality of your semantic view by running your verified queries against %cortex-analyst% and measuring how accurately it generates SQL. Use evaluation results to identify areas for improvement, track regressions over time, and iteratively refine your semantic view. For more information, see [](/user-guide/snowflake-cortex/cortex-analyst-evaluations). ## Getting started Developers can use the following resources to get started with %cortex-analyst%: 1. Basic code example: The [](#label-analyst-access-example) in the following section provides a simple, easy-to-read script that helps you create an interactive app using %cortex-analyst%. Choose this option if you want a basic fundamental example to start with, and are comfortable with using Streamlit and making your own modifications. You can run this example either in %sis% (SiS) or locally. 2. Snowflake Samples repository: If you're instead looking for a more comprehensive implementation, the %cortex-analyst% advanced SiS demo in the Snowflake Samples repository has all the features and options already set up. This repository is configured with various pre-built features that make deploying %cortex-analyst% seamless and robust. Choose this option if you are trying to test out the feature for the first time, or have fewer custom modifications to make. This is shown only as an example. Snowflake does not provide support for the below content, nor does Snowflake warrant that the below content is accurate. To learn more, see the [Cortex Analyst advanced SiS demo](https://github.com/Snowflake-Labs/sf-samples/tree/main/samples/cortex-analyst/Advanced%20SiS%20Demo) in the Snowflake Samples GitHub repository. ## %cortex-analyst% example Follow these steps to create an interactive Streamlit in Snowflake (SiS) or standalone Streamlit app that uses %cortex-analyst%. 1. [Create a semantic model](#label-copilot-create-semantic-model) 2. 
[Upload the semantic model to stage](#label-copilot-upload-semantic-model) 3. Create and run a [Streamlit in Snowflake app](#label-copilot-create-streamlit-app) 4. [Interact with the Streamlit in Snowflake app](#label-copilot-interact-with-copilot) ### Create a semantic model A [semantic model](/user-guide/views-semantic/sql) is a lightweight mechanism that addresses issues related to the language difference between business users and database definitions by allowing for the specification of additional semantic details about a dataset. These additional semantic details, like more descriptive names or synonyms, enable %cortex-analyst% to answer data questions much more reliably. 1. Start with a list of questions you would like %cortex-analyst% to answer. Based on that, decide on the dataset for your semantic model. 2. Create your semantic view using the [Semantic View Autopilot](/user-guide/views-semantic/autopilot) or review the [YAML specification](#label-semantic-views-create-from-yaml) to create one manually. ### Upload semantic model You can upload a semantic model YAML file to a [stage](/sql-reference/sql/create-stage) or pass the semantic model YAML as a string in the request body. If you upload a semantic model YAML to a stage, access to that semantic model is controlled by access to the stage it's uploaded to. This means that any role with access to the stage can access the semantic models on that stage even if the role doesn't have access to the tables that the models are based on. Ensure that roles granted access to a stage have SELECT access on all tables referenced in all semantic models on that stage. Below is an example of how to set up the stages containing the semantic models. One stage (`public`) is accessible to all members of the organization, whereas another stage (`sales`) is only accessible to the `sales_analyst` role. Create the database and schema for the stage. 
The following example creates a database named `semantic_model` with a schema named `definitions`, but you can use any valid identifier string for these names. ```sql CREATE DATABASE semantic_model; CREATE SCHEMA semantic_model.definitions; GRANT USAGE ON DATABASE semantic_model TO ROLE PUBLIC; GRANT USAGE ON SCHEMA semantic_model.definitions TO ROLE PUBLIC; USE SCHEMA semantic_model.definitions; ``` Then create the stages for storing your semantic models: ```sql CREATE STAGE public DIRECTORY = (ENABLE = TRUE); GRANT READ ON STAGE public TO ROLE PUBLIC; CREATE STAGE sales DIRECTORY = (ENABLE = TRUE); GRANT READ ON STAGE sales TO ROLE sales_analyst; ``` In Snowsight, you can refresh the page and find the newly created stages in the [database object explorer](/user-guide/ui-snowsight-data). You can open the stage page in a new tab and upload your YAML files in Snowsight. Alternatively, you can use the [Snowflake CLI client](/developer-guide/snowflake-cli/command-reference/stage-commands/copy) to upload from your local file system. ```snowcli snow stage copy file:///path/to/local/file.yaml @sales ``` ### Creating a Streamlit in Snowflake App This example shows you how to create a Streamlit in Snowflake app that takes a natural language question as input and calls %cortex-analyst% to generate an answer based on the semantic model you provide. This is shown only as an example. Snowflake does not provide support for the below content, nor does Snowflake warrant that the below content is accurate. For more information on creating and running Streamlit apps in Snowflake, see [](/developer-guide/streamlit/about-streamlit). 1. Follow the directions in [](#label-streamlit-create-app) to create a new Streamlit app in Snowsight. 2. Copy the [Streamlit code](https://github.com/Snowflake-Labs/sfguide-getting-started-with-cortex-analyst/blob/main/cortex_analyst_sis_demo_app.py) from our GitHub repo into the code editor. 3. Replace the placeholder values with your account details. 4.
To preview the app, select **Run** to update the content in the Streamlit preview pane. ### Interact with the Streamlit App 1. Navigate to the Streamlit app in your browser or the Streamlit in Snowflake preview pane. 2. Start asking questions about your data in natural language (e.g. "What questions can I ask?"). ### Create a standalone Streamlit app You can also use the example code to build a standalone app. This is shown only as an example. Snowflake does not provide support for the below content, nor does Snowflake warrant that the below content is accurate. 1. Install [Streamlit](https://pypi.org/project/streamlit/). 2. Create a Python file locally called `analyst_api.py`. 3. Copy the [Streamlit code](https://github.com/Snowflake-Labs/sfguide-getting-started-with-cortex-analyst/blob/main/cortex_analyst_streaming_demo.py) from our GitHub repo into the file. 4. Replace the placeholder values with your account details. 5. Run the Streamlit app using `streamlit run analyst_api.py`. The database and schema specified in the code identify the stage location of the semantic model YAML file. The role used in the Snowflake connector should have access to the underlying data defined in the semantic model. For a more comprehensive implementation, see the [Cortex Analyst advanced SiS demo](https://github.com/Snowflake-Labs/sf-samples/tree/main/samples/cortex-analyst/Advanced%20SiS%20Demo) in the Snowflake Samples GitHub repository. This repository is configured with various pre-built features that make deploying %cortex-analyst% seamless and robust.
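Whichever app you start from, it comes down to one authenticated HTTP POST that carries the question and a reference to the semantic model. The following Python sketch prepares such a request without sending it. The endpoint path `/api/v2/cortex/analyst/message` and the `Snowflake Token="..."` authorization format are assumptions here, and the account URL and token are placeholders. Verify all of them against the REST API and authentication references before relying on this.

```python
import json
import urllib.request


def build_analyst_request(account_url, token, question, semantic_model_file):
    """Prepare (but do not send) a POST request carrying a single user question.

    Assumptions to verify against the REST API reference: the endpoint path
    and the 'Snowflake Token="..."' authorization header format.
    """
    body = {
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ],
        "semantic_model_file": semantic_model_file,
    }
    return urllib.request.Request(
        url=f"{account_url.rstrip('/')}/api/v2/cortex/analyst/message",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f'Snowflake Token="{token}"',
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder account URL and token; replace both with real values.
request = build_analyst_request(
    "https://example-account.snowflakecomputing.com",
    "example-session-token",
    "What questions can I ask?",
    "@my_stage/my_semantic_model.yaml",
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. The sketch stops short of that so it can run without network access or credentials.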
## Disable %cortex-analyst% functionality If you do not want %cortex-analyst% to be available in your account, disable the feature by changing the ENABLE_CORTEX_ANALYST parameter using the ACCOUNTADMIN role: ```sql USE ROLE ACCOUNTADMIN; ALTER ACCOUNT SET ENABLE_CORTEX_ANALYST = FALSE; ``` ## Control models used by Cortex Analyst You can use model-level RBAC (role-based access control) to control access to the models used by %cortex-analyst%. Each model is protected by a designated application role, and administrators can grant or revoke access to specific LLMs via these model-specific roles. For more information, see [](#label-cortex-llm-rbac). Model-level RBAC is an advanced feature intended for customers with specific regulatory or compliance requirements that dictate which models can be used and where they can be hosted. If you do not have such requirements, Snowflake recommends that you do not use this feature. You cannot choose a model directly. Instead, %cortex-analyst% assigns each request to a model, or to a combination of models, taking into account the following factors: - The models [available in your Snowflake region](#label-cortex-llm-availability). - The account's [cross-region inference configuration](/user-guide/snowflake-cortex/cross-region-inference). - Any model-level [RBAC restrictions](#label-cortex-llm-rbac) you have established. Different models produce different results. For consistent results, use the same Snowflake region, cross-region inference configuration, and model-level RBAC restrictions for all requests. %cortex-analyst% selects models in the following order of preference, using the highest-ranked model to which your role has access. If your role has access to none of these models, the request fails. 1. Anthropic Claude Sonnet 4.6 2. Anthropic Claude Sonnet 4.5 3. OpenAI GPT 4.1 4. Arctic Text2SQL R1.5 (with thinking enabled) 5.
Combination of Mistral Large 2 and Llama 3.1 70b %cortex-analyst%'s model selection behavior may change from time to time to take advantage of advances in model functionality. ### Risks and limitations %cortex-analyst% relies upon the availability of at least one supported model configuration. Disabling specific models reduces fallback options and increases the risk of query failures. Model-level restrictions apply to all Cortex features that can use the model; it is not possible to restrict access to a model only in %cortex-analyst% or in any other single Cortex feature. ## Cost considerations The credit rate usage for %cortex-analyst% is based on the number of messages processed as outlined in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). Only successful responses (HTTP 200) are counted. The number of tokens in each message only affects cost when Cortex Analyst is invoked using Cortex Agents. Otherwise, the number of tokens in each message does not affect cost. The above charges cover AI costs for text-to-SQL. Additional warehouse costs apply when you execute the SQL generated by %cortex-analyst%. ### Monitoring the cost of %cortex-analyst% To view credit consumption for %cortex-analyst%, use the [](/sql-reference/account-usage/cortex_analyst_usage_history). For example: ```sql SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_ANALYST_USAGE_HISTORY; ``` Usage of %cortex-analyst% also appears in the [](/sql-reference/account-usage/metering_history) in the ACCOUNT_USAGE schema with a service type of AI_SERVICES. ## Legal notices Where your configuration of %cortex-analyst% uses a model provided on the [Model and Service Flow-down Terms](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/ai-features/open-source-model-flow-down-terms/), your use of that model is further subject to the terms for that model on that page.
The data classification of inputs and outputs are as set forth in the following table.
For additional information, refer to [](/guides-overview-ai-features). --- title: Cortex Analyst administrator monitoring source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/admin-observability.md section: Snowflake Cortex (AI & ML) --- # %cortex-analyst% administrator monitoring To improve the quality of answers provided by %cortex-analyst%, you must continue to refine the semantic model or view. To help you refine the model or view, %cortex-analyst% logs requests to an event table in the Snowflake database. The logs include the following: - The user who asked the question - The question asked - Generated SQL - Errors and/or warnings - Request and response bodies - Other metadata There is a small lag, on the order of 1-2 minutes, between a request being made and it being visible in the view. ## Accessing logs You can view these logs in the **Monitoring** tab of the Semantic View within %sf-web-interface%. To view the logs, users must have the SELECT privilege on referenced tables, in addition to: - MONITOR or OWNERSHIP on the semantic view (when using semantic views) - WRITE privilege on the stage (for semantic models stored in a file on a stage) Alternatively, you can query the logs directly from the Snowflake database using SQL, depending on your privileges. ## Querying logs with SQL Call the SNOWFLAKE.LOCAL.CORTEX_ANALYST_REQUESTS table function to retrieve logs for a specific semantic model or view. This table function performs access control checks to ensure that the caller has the required privileges to access the request data. The following is an example of how to call the function: ```sqlsyntax SELECT * FROM TABLE( SNOWFLAKE.LOCAL.CORTEX_ANALYST_REQUESTS( '<semantic_model_or_view_type>', '<semantic_model_or_view_name>' ) ); ``` When calling this function, pass in the following arguments: - semantic_model_or_view_type: Specify the type of semantic model or view used in the requests: - For a semantic model defined in a file on a stage, specify `'FILE_ON_STAGE'`.
- For a semantic view, specify `'SEMANTIC_VIEW'`. - semantic_model_or_view_name: Specify the location where the semantic model or view is defined: - For a semantic model defined in a file on a stage, specify the fully qualified path to the semantic model specification file (for example, `@my_db.my_schema.my_stage/path/to/file.yaml`). - For a semantic view, specify the fully qualified name of the semantic view. Returns: A table with all API requests for the specified semantic model or view. If a query was made using inline YAML (instead of a semantic view or a file on stage), the request will not be accessible via the table function, but will be visible in the view and event table detailed below. If you are using a role that has been granted the SNOWFLAKE.CORTEX_ANALYST_REQUESTS_ADMIN or SNOWFLAKE.CORTEX_ANALYST_REQUESTS_VIEWER application role, you can query the [SNOWFLAKE.LOCAL.CORTEX_ANALYST_REQUESTS_V](/sql-reference/local/cortex_analyst_requests_v) view. This view includes all requests to %cortex-analyst% across all semantic models and views. You can also query the raw event data in the SNOWFLAKE.LOCAL.CORTEX_ANALYST_REQUESTS_RAW event table. The responses are in the [OpenTelemetry format](https://opentelemetry.io/docs/specs/otel/). The SNOWFLAKE.LOCAL.CORTEX_ANALYST_REQUESTS_V view contains the same data, formatted and processed for human readability. --- title: Cortex Analyst evaluations source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst-evaluations.md section: Snowflake Cortex (AI & ML) --- # %cortex-analyst% evaluations - [](/user-guide/snowflake-cortex/cortex-analyst) - [](/user-guide/views-semantic/overview) - [](/user-guide/snowflake-cortex/cortex-analyst/verified-query-repository) This feature is not available in the People's Republic of China. %cortex-analyst% evaluations let you measure and improve the performance of your semantic views that are used for SQL generation.
Evaluations work by testing your semantic views against their own verified queries as the ground truth. This gives confidence that your semantic view can handle queries that users rely on, which can also translate to higher accuracy for SQL in general. Evaluations measure accuracy by executing the SQL generated by %cortex-analyst% and comparing the results against your verified queries. Regression metrics are aggregated to track verified queries that were previously answered correctly but are now failing. In addition to these correctness metrics, latency is recorded to track performance of queries. These metrics can be used to identify weaknesses and iteratively refine your semantic views to improve SQL accuracy while preventing regressions. ## Access control requirements The ability to run a %cortex-analyst% evaluation requires a role with the following: - The DATABASE ROLE SNOWFLAKE.CORTEX_USER - The EXECUTE TASK ON ACCOUNT global privilege - The CREATE TASK privilege on the schema containing your semantic view - The CREATE DATASET ON SCHEMA privilege on the schema containing your semantic view - The SELECT privilege on the semantic view and the tables referenced in the semantic view - The MONITOR privilege on the semantic view All of the above privileges must be granted under a single primary role. Evaluation runs are executed using Snowflake tasks, which do not consider secondary role privileges. ## Prepare an evaluation set %cortex-analyst% evaluations use verified queries (VQs) as the evaluation set. Each verified query pairs a natural language question with its expected SQL answer. Before running an evaluation, you need at least one verified query associated with your semantic view. If you don't have any verified queries yet, add them through the semantic view editor in %sf-web-interface%. For more information, see [](/user-guide/snowflake-cortex/cortex-analyst/verified-query-repository). 
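To make the accuracy and regression metrics described at the start of this topic concrete, the following Python sketch computes both from per-query correctness flags. The function names and the dictionary layout (question text mapped to a correct/incorrect flag) are assumptions made for illustration. They are not part of any Snowflake API, and the way %cortex-analyst% aggregates these metrics internally may differ.

```python
from typing import Dict, List


def accuracy(results: Dict[str, bool]) -> float:
    """Return the fraction of verified queries judged correct in one run."""
    if not results:
        return 0.0
    return sum(1 for ok in results.values() if ok) / len(results)


def regressions(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return questions that were correct in the previous run but fail now."""
    return sorted(
        question
        for question, ok in current.items()
        if not ok and previous.get(question, False)
    )


# Hypothetical per-query results from two evaluation runs.
baseline_run = {
    "What is the total revenue by region for Q1 2025?": True,
    "What are our top 10 customers from the last 30 days?": True,
}
latest_run = {
    "What is the total revenue by region for Q1 2025?": True,
    "What are our top 10 customers from the last 30 days?": False,
}

print(accuracy(latest_run))
print(regressions(baseline_run, latest_run))
```

A query that fails in the latest run only counts as a regression if it was correct before, so a query that has never passed lowers accuracy without appearing in the regression list.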
### How verified queries are used during evaluation When you select verified queries for an evaluation run, %cortex-analyst% creates a temporary copy of your semantic view with those selected queries removed. %cortex-analyst% then generates SQL using this temporary copy, which does not contain the evaluation queries. This prevents the evaluation queries from influencing SQL generation, ensuring that the evaluation measures how well %cortex-analyst% can answer questions without relying on exact matches from verified queries. Verified queries that you do *not* select for evaluation remain in the temporary semantic view and continue to guide %cortex-analyst% during the evaluation run, just as they would during normal usage. A verified query can either guide %cortex-analyst% at runtime or be used as evaluation ground truth, but not both at the same time. Selecting a verified query for evaluation temporarily removes it from the semantic view so the evaluation result reflects genuine SQL generation ability. ## Start a %cortex-analyst% evaluation ### %sf-web-interface% Begin your evaluation of a semantic view by doing the following: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **AI & ML** » **Cortex Analyst**. 3. From the list, select the semantic view you want to run the evaluation on. 4. Select the **Evaluations** tab. 5. Select **Create evaluation run**. 6. In the **Name** field, provide a name for your evaluation. This name should be unique for the semantic view being evaluated. 7. Select **Next**. This advances to the **Select verified queries** modal. 8. Select which verified queries to include in the evaluation. You can either select all verified queries or select a specific set by checking the corresponding boxes. 9. Select **Run evaluation**. ### SQL %cortex-analyst% evaluation runs can also be started with SQL using the [](/sql-reference/functions/execute_ai_evaluation) function. 
This function accepts the following `evaluation_job` values: - `'START'`: Start an evaluation run. - `'STATUS'`: Query the progress of an evaluation run. - `'CANCEL'`: Cancel a running evaluation. - `'DELETE'`: Delete a completed evaluation run and its results. Each call requires the following additional arguments: - `run_parameters`: A SQL [OBJECT](#label-data-type-object) containing the key `run_name`, with a value of the name of your run. - `config_file_path`: A stage file path pointing to your run configuration YAML file. For the YAML specification, see [](#label-analyst-evaluation-yaml-spec). The following example starts an evaluation run called `Evaluation run 1`: ```sql CALL EXECUTE_AI_EVALUATION( 'START', OBJECT_CONSTRUCT('run_name', 'Evaluation run 1'), '@EVAL_DB.EVAL_SCHEMA.METRICS/analyst_evaluation_config.yaml' ); ``` After a run starts, you can query its progress: ```sql CALL EXECUTE_AI_EVALUATION( 'STATUS', OBJECT_CONSTRUCT('run_name', 'Evaluation run 1'), '@EVAL_DB.EVAL_SCHEMA.METRICS/analyst_evaluation_config.yaml' ); ``` To cancel or delete a run, replace `'STATUS'` with `'CANCEL'` or `'DELETE'`. ## Inspect evaluation results ### %sf-web-interface% The **Evaluations** tab for a semantic view in %sf-web-interface% gives an overview of every evaluation run and a summary of each, including the number of query regressions. To view evaluation results: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **AI & ML** » **Cortex Analyst**. 3. From the list, select the semantic view you want to view evaluations for. 4. Select the **Evaluations** tab. 5. Select an individual run to see detailed results. The run detail page shows: - **Accuracy** – The percentage of verified queries where the generated SQL was judged correct, with an option to **Improve** the semantic view. - **Regressions** – The number of verified queries that were previously correct but are now failing. 
- **Latency** – Average and per-query response times for %cortex-analyst%. - **Per-query results** – For each verified query: the natural language question, the expected SQL, the generated SQL, and whether the result was correct or incorrect. Select a query to see the detailed comparison. ### SQL To retrieve the results of an evaluation run, use the `GET_ANALYST_AI_EVALUATION_DATA` function. This function has the following required arguments: - `database`: The database containing the semantic view. - `schema`: The schema containing the semantic view. - `object_name`: The name of the semantic view. - `object_type`: The string constant `'SEMANTIC VIEW'`. - `run_name`: The name of the evaluation run to retrieve. The following example displays the full evaluation details for a run called `Evaluation run 1`, where the semantic view is named `SEMANTIC_VIEW_EVAL` stored on the schema `EVAL_DB.EVAL_SCHEMA`: ```sql SELECT * FROM TABLE(SNOWFLAKE.LOCAL.GET_ANALYST_AI_EVALUATION_DATA( 'EVAL_DB', 'EVAL_SCHEMA', 'SEMANTIC_VIEW_EVAL', 'SEMANTIC VIEW', 'Evaluation run 1') ); ``` #### Evaluation results table format The `GET_ANALYST_AI_EVALUATION_DATA` function returns a table with the following columns:
## Analyst evaluation YAML specification To trigger evaluation runs programmatically, you need a YAML configuration file uploaded to a Snowflake stage. This section describes the YAML format and how to upload it. ### YAML format ```yaml evaluation: analyst_params: analyst_name: "SEMANTIC_VIEW_EVAL" analyst_type: "SEMANTIC VIEW" source_metadata: type: "verified_queries" # Optional: list specific verified queries by their question text. # If omitted, all verified queries are used. verified_queries: - "What are our top 10 customers from the last 30 days?" - "What is the total revenue by region for Q1 2025?" metrics: - "sql_correctness" ``` **analyst_params** - `analyst_name`: The name of the semantic view to run the evaluation against. - `analyst_type`: The string constant `SEMANTIC VIEW`. **source_metadata** - `type`: The type of source used as the evaluation data. For %cortex-analyst%, the only supported source type is `verified_queries`. - `verified_queries` (optional): A list of questions matching the `question` field of each verified query that should be used as ground truth for the evaluation. If not provided, all verified queries are used. **metrics** The metrics to compute for the evaluation. `sql_correctness` is the only supported metric for %cortex-analyst% evaluations. ### Upload configuration to a stage Upload your YAML configuration to a Snowflake stage. 
The following example creates a file format, creates a stage, and uploads a local configuration file: ```sql CREATE OR REPLACE FILE FORMAT evals_db.evals_schema.yaml_file_format TYPE = 'CSV' FIELD_DELIMITER = NONE RECORD_DELIMITER = '\n' SKIP_HEADER = 0 FIELD_OPTIONALLY_ENCLOSED_BY = NONE ESCAPE_UNENCLOSED_FIELD = NONE; CREATE OR REPLACE STAGE evals_db.evals_schema.metrics FILE_FORMAT = evals_db.evals_schema.yaml_file_format; PUT file:///Users/dev/analyst_evaluation_config.yaml @evals_db.evals_schema.metrics AUTO_COMPRESS='false' OVERWRITE=TRUE; ``` Snowflake recommends keeping your YAML file uncompressed. ## Improve your semantic view Use evaluation results to iteratively improve your semantic view. The recommended workflow is: 1. **Run an evaluation** to establish a baseline accuracy score. 2. **Inspect a completed run** by selecting it from the **Evaluations** tab to review the expected vs generated SQL for each query. 3. **Optimize your semantic view** by selecting **Improve** in the **Accuracy** summary box. This starts semantic view optimization, which analyzes the evaluation failures and automatically suggests changes to your semantic view. For more information, see [](/user-guide/snowflake-cortex/cortex-analyst/analyst-optimization). 4. **Re-run the evaluation** to measure the impact of the changes. Repeat this cycle to incrementally improve your semantic view's accuracy. Tracking accuracy across runs lets you detect regressions if a change inadvertently breaks previously correct queries. ## Known limitations %cortex-analyst% evaluations are subject to the following limitations: - **Single semantic view per run**: Each evaluation run evaluates one semantic view. Evaluating across multiple semantic views in a single run is not supported. - **No multi-turn evaluation**: Evaluation queries are processed independently. Follow-up or multi-turn conversation evaluation is not supported. 
- **No auto-generated evaluation datasets**: Evaluation sets must be manually curated from verified queries. Automatic generation from query history, dashboards, or synthetic generation is not available. - **Ground truth staleness**: If your verified queries reference time-relative concepts (for example, `last quarter` rather than `Q1 2025`), evaluation results may drift over time. Scope queries to specific, absolute dates and time ranges for consistent results. ## Cost considerations %cortex-analyst% evaluations run queries against your semantic view and use evaluation judges to score correctness. You are charged for: - **Warehouse charges** for running the evaluation queries against %cortex-analyst% using the warehouse selected for the evaluation run. - **Evaluation credits** for the [](/sql-reference/functions/ai_complete) function calls used to compute the `sql_correctness` metric. - **Storage charges** for datasets and evaluation results stored in your account. For more information on estimating costs, see [](/user-guide/cost-understanding-overall). --- title: Cortex Analyst REST API source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/rest-api.md section: Snowflake Cortex (AI & ML) --- # %cortex-analyst% REST API - [](/user-guide/snowflake-cortex/cortex-analyst) - [](/user-guide/views-semantic/sql) - [](/user-guide/snowflake-cortex/cortex-analyst/verified-query-repository) Use this API to answer questions about your data with natural language queries. ## Send message `POST /api/v2/cortex/analyst/message` Generates a SQL query for the given question using a semantic model or [semantic view](/user-guide/views-semantic/overview) provided in the request. One or more models can be specified; when multiple models are specified, %cortex-analyst% chooses the most appropriate one. You can have multi-turn conversations where you can ask follow-up questions that build upon previous queries. 
For more information, see [](#label-analyst-multi-turn-conversation). The request includes a user question; the response includes the user question and the analyst response. Each message in a response can have multiple content blocks of different types. Three values that are currently supported for the `type` field of the content object are: `text`, `suggestions`, and `sql`. Responses can be sent all at once after processing is complete, or incrementally as they are generated. ### Request headers
### Request body In the request body: - Set the last `messages[].role` field to the role of the speaker, which must be `user`. - Include the user's question in the `content` object. In this object: - Set `type` to `text`. - Set `text` to the user's question. - Include one of the following: - The [YAML specification](/user-guide/views-semantic/semantic-view-yaml-spec) for a semantic view. - The path to the YAML file that contains the semantic view specification. This file must be on a stage. - The name of the semantic view. The following table describes the fields that you can set in the body of the request:
You must specify one of the following fields in the body of the request: - `semantic_model_file` - `semantic_model` - `semantic_models` - `semantic_view` #### Example of specifying a semantic model in a file on a stage ```json { "messages": [ { "role": "user", "content": [ { "type": "text", "text": "which company had the most revenue?" } ] } ], "semantic_model_file": "@my_db.my_schema.my_stage/my_semantic_model.yaml" } ``` #### Example of specifying a semantic view ```json { "messages": [ { "role": "user", "content": [ { "type": "text", "text": "which company had the most revenue?" } ] } ], "semantic_view": "MY_DB.MY_SCH.MY_SEMANTIC_VIEW" } ``` ### Non-streaming response This operation can return the response codes listed below. The response always has the following structure. Currently, three content types are supported for the response: `text`, `suggestions`, and `sql`. The content types `suggestions` and `sql` are mutually exclusive: if the response contains a `sql` content type, it won't contain a `suggestions` content type, and vice versa. The `suggestions` content type is only included in a response if the user question was ambiguous and %cortex-analyst% could not return a SQL statement for that query. When the request contains a `semantic_models` field, the response includes a `semantic_model_selection` field that indicates which semantic model was chosen for the request. To ensure forward compatibility, make sure your implementation takes the content type into account and handles types it does not recognize.
By default, the response is returned all at once after %cortex-analyst% has fully processed the user's question. See [](#label-cortex-analyst-rest-api-streaming) for the format of streaming mode responses.
```json { "request_id": "75d343ee-699c-483f-83a1-e314609fb563", "message": { "role": "analyst", "content": [ { "type": "text", "text": "We interpreted your question as ..." }, { "type": "sql", "statement": "SELECT * FROM table", "confidence": { "verified_query_used": { "name": "My verified query", "question": "What was the total revenue?", "sql": "SELECT * FROM table2", "verified_at": 1714497970, "verified_by": "Jane Doe" } } } ] }, "warnings": [ { "message": "Table table1 has (30) columns, which exceeds the recommended maximum of 10" }, { "message": "Table table2 has (40) columns, which exceeds the recommended maximum of 10" } ], "response_metadata": { "model_names": [ "claude-sonnet-4-5" ], "cortex_search_retrieval": [ { "service": "my_db.my_schema.my_search_service", "response_body": { "results": [ { "CUST_NAME": "customer1" } ], "request_id": "request1" }, "query": "'customer1'" } ], "question_category": "CLEAR_SQL" } } ```
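The following Python sketch shows one way a client might walk the `content` array of a non-streaming response like the one above while tolerating content types it does not know. The function name `summarize_response` and the shape of the returned dictionary are hypothetical. Only the field names (`message`, `content`, `type`, `text`, `statement`, `suggestions`, `warnings`) are taken from the example responses in this topic.

```python
from typing import Any, Dict


def summarize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Collect text, SQL, suggestions, and warnings from a parsed response body."""
    summary: Dict[str, Any] = {
        "text": [],
        "sql": None,
        "suggestions": [],
        "unknown_types": [],
        "warnings": [],
    }
    for block in response.get("message", {}).get("content", []):
        kind = block.get("type")
        if kind == "text":
            summary["text"].append(block.get("text", ""))
        elif kind == "sql":
            summary["sql"] = block.get("statement")
        elif kind == "suggestions":
            summary["suggestions"].extend(block.get("suggestions", []))
        else:
            # Record unrecognized types instead of failing, so the client
            # keeps working if new content types are introduced later.
            summary["unknown_types"].append(kind)
    summary["warnings"] = [w.get("message", "") for w in response.get("warnings", [])]
    return summary


# Hypothetical parsed response body.
example = {
    "message": {
        "role": "analyst",
        "content": [
            {"type": "text", "text": "We interpreted your question as ..."},
            {"type": "sql", "statement": "SELECT * FROM table"},
        ],
    },
    "warnings": [{"message": "Table table1 has (30) columns"}],
}
print(summarize_response(example))
```

Because `sql` and `suggestions` blocks do not appear in the same response, a caller can branch on whether `summary["sql"]` is set before deciding to run a query or to show alternative questions.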
### Streaming response Streaming mode lets your client receive responses as they are generated by %cortex-analyst%, rather than waiting for the entire response to be generated. This improves the perceived responsiveness of your application, especially for long-running queries, because users begin seeing output much sooner. Streaming responses also provide status information that can help you understand where %cortex-analyst% is in the process of generating a response, and warnings that can help you understand what went wrong when %cortex-analyst% doesn't work as you expected. To receive a streaming response, set the `stream` field in the request body to `true`. Streaming responses use [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events). %cortex-analyst% sends six distinct types of events in a streaming response: - `status`: Conveys status updates about the SQL generation process. - `message.content.delta`: Contains a piece of the response. This event is sent multiple times. - `error`: Indicates that %cortex-analyst% has encountered an error and cannot continue processing the request. No further `message.content.delta` events will be sent. - `warnings`: Contains any warnings encountered during processing. Warnings do not stop processing. - `response_metadata`: Sent at the end of a response to display data about request processing. - `done`: Sent to indicate that processing is complete and no further `message.content.delta` events will be sent. Of these, the `message.content.delta` events are the most crucial to understand, because they contain the actual response content. Each `delta` contains tokens from some field in the complete response. A `delta` event can contain anywhere from a single character to the full response, and different events may have different lengths. You receive these tokens as they are generated; it is up to you to assemble them into the final response.
Events from different responses (even extremely similar ones) can vary. There is no guarantee that events will be sent in the same order or with the same content. #### Simple example The following is a sample non-streaming response for a simple query: ```json { "message": { "role": "analyst", "content": [ { "type": "text", "text": "This is how we interpreted your question and this is how the sql is generated" }, { "type": "sql", "statement": "SELECT * FROM table" } ] } } ``` And this is one possible series of streaming events for that response (a different series of events is also possible): ```text event: status data: { status: "interpreting_question" } event: message.content.delta data: { index: 0, type: "text", text_delta: "This is how we interpreted your question" } event: status data: { status: "generating_sql" } event: status data: { status: "validating_sql" } event: message.content.delta data: { index: 0, type: "text", text_delta: " and this is how the sql is generated" } event: message.content.delta data: { index: 1, type: "sql", statement_delta: "SELECT * FROM table" } event: status data: { status: "done" } ``` Use the `index` field in the `message.content.delta` events to determine which field in the full response the event is part of. For example, here the first two `delta` events use index 0, which means they are part of the first field (element 0) in the `content` array of the non-streaming response. Similarly, the `delta` event that contains the SQL response uses index 1. #### Example with suggestions This example contains suggested questions for an ambiguous question. The following is the non-streaming response: ```json { "message": { "role": "analyst", "content": [ { "type": "text", "text": "Your question is ambigous, here are some alternatives:" }, { "type": "suggestions", "suggestions": [ "which company had the most revenue?", "which company placed the most orders?"
] } ] } } ``` And here is a possible series of streaming events that constitute that response: ```text event: status data: { status: "interpreting_question" } event: message.content.delta data: { index: 0, type: "text", text_delta: "Your question is ambigous," } event: status data: { status: "generating_suggestions" } event: message.content.delta data: { index: 0, type: "text", text_delta: " here are some alternatives:" } event: message.content.delta data: { index: 1, type: "suggestions", suggestions_delta: { index: 0, suggestion_delta: "which company had", } } event: message.content.delta data: { index: 1, type: "suggestions", suggestions_delta: { index: 0, suggestion_delta: " the most revenue?", } } event: message.content.delta data: { index: 1, type: "suggestions", suggestions_delta: { index: 1, suggestion_delta: "which company placed", } } event: message.content.delta data: { index: 1, type: "suggestions", suggestions_delta: { index: 1, suggestion_delta: " the most orders?", } } event: status data: { status: "done" } ``` In this example, the `content` field of the non-streaming response is an array, and one of its elements contains the `suggestions` array. The `index` fields in these delta events therefore refer to positions in two different arrays: the top-level `index` locates an element of `content`, and the `index` inside `suggestions_delta` locates an element of the `suggestions` array. You need to keep track of these indexes separately when assembling the full response. Currently, the generated SQL statement is always sent in a single event. This may not be the case in the future. Your client must be prepared to receive the SQL statement in multiple events. #### Other examples You can find a Streamlit streaming client for %cortex-analyst% in the %cortex-analyst% [GitHub repo](https://github.com/Snowflake-Labs/sfguide-getting-started-with-cortex-analyst/blob/main/cortex_analyst_streaming_demo.py). This demo must be run locally; Streamlit in Snowflake (SiS) does not currently support streaming.
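As a further illustration of the index bookkeeping described above, the following Python sketch assembles `message.content.delta` events into a `content` array. It is a minimal sketch under stated assumptions, not a reference client: the function name `assemble_content` is hypothetical, and the events are assumed to have already been parsed into `(event_name, data)` pairs with `data` as a dictionary. Parsing the server-sent event stream itself is not shown. The field names (`index`, `type`, `text_delta`, `statement_delta`, `suggestions_delta`, `suggestion_delta`) come from the examples above.

```python
from typing import Any, Dict, Iterable, List, Tuple

Event = Tuple[str, Dict[str, Any]]


def assemble_content(events: Iterable[Event]) -> List[Dict[str, Any]]:
    """Rebuild the non-streaming `content` array from parsed streaming events."""
    blocks: Dict[int, Dict[str, Any]] = {}
    for name, data in events:
        if name == "error":
            # An error event means no further deltas will arrive.
            raise RuntimeError(data.get("message", "unknown error"))
        if name != "message.content.delta":
            continue  # status, warnings, and metadata carry no content
        block = blocks.setdefault(data["index"], {"type": data["type"]})
        if data["type"] == "text":
            block["text"] = block.get("text", "") + data["text_delta"]
        elif data["type"] == "sql":
            # Appending keeps this correct if SQL ever arrives in pieces.
            block["statement"] = block.get("statement", "") + data["statement_delta"]
        elif data["type"] == "suggestions":
            delta = data["suggestions_delta"]
            # Inner index: position within the suggestions array.
            parts = block.setdefault("_parts", {})
            parts[delta["index"]] = parts.get(delta["index"], "") + delta["suggestion_delta"]
    content = []
    for index in sorted(blocks):  # outer index: position within content
        block = blocks[index]
        if "_parts" in block:
            parts = block.pop("_parts")
            block["suggestions"] = [parts[i] for i in sorted(parts)]
        content.append(block)
    return content


# Hypothetical events modeled on the suggestions example.
sample_events = [
    ("status", {"status": "interpreting_question"}),
    ("message.content.delta", {"index": 0, "type": "text", "text_delta": "Here are"}),
    ("message.content.delta", {"index": 0, "type": "text", "text_delta": " some alternatives:"}),
    ("message.content.delta", {"index": 1, "type": "suggestions",
                               "suggestions_delta": {"index": 0, "suggestion_delta": "which company had"}}),
    ("message.content.delta", {"index": 1, "type": "suggestions",
                               "suggestions_delta": {"index": 0, "suggestion_delta": " the most revenue?"}}),
    ("status", {"status": "done"}),
]
print(assemble_content(sample_events))
```

Keying the outer dictionary by the top-level `index` and the inner one by the `suggestions_delta` index keeps the two arrays separate, and it does not depend on events for one element arriving contiguously.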
See the Cortex Analyst playground in the AI/ML Studio (in Snowsight) for an interactive demonstration of streaming responses. ### Streaming event schemas The following are the OpenAPI/Swagger schemas of the events sent by %cortex-analyst% in a streaming response.
status
message.content.delta
error
``` StreamingError: type: object properties: message: type: string description: A description of the error code: type: string description: The Snowflake error code categorizing the error request_id: type: string description: Unique request ID ```
warnings
``` Warnings: type: object description: Warnings found while processing the request properties: warnings: type: array items: $ref: "#/components/schemas/Warning" Warning: type: object title: The warning object description: Represents a warning within a chat. properties: message: type: string description: A human-readable message describing the warning ```
response_metadata
``` ResponseMetadata: type: object description: Details about request processing ```
## Send feedback `POST /api/v2/cortex/analyst/feedback` Provides qualitative end-user feedback. Within %sf-web-interface%, the feedback is shown in **Snowsight → AI & ML → Cortex Analyst → Select Semantic View → Monitoring tab**. ### Request headers
### Request body
### Response Empty response body with status code 200. ## Access control requirements For information on the required privileges, see [](#label-analyst-access-control). For details about authenticating to the API, see [](/developer-guide/snowflake-rest-api/authentication). --- title: Cortex Analyst Verified Query Repository source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/verified-query-repository.md section: Snowflake Cortex (AI & ML) --- # %cortex-analyst% Verified Query Repository This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/cortex-analyst) - [](/user-guide/snowflake-cortex/cortex-analyst/verified-query-suggestions) - [](/user-guide/views-semantic/sql) The %cortex-analyst% Verified Query Repository (VQR) can help improve the accuracy and trustworthiness of results by providing a collection of questions and corresponding SQL queries to answer them. %cortex-analyst% then leverages relevant SQL queries from the repository when answering similar questions. You can specify verified queries in your semantic model YAML file. Verified SQL queries must use the names of the logical tables and columns defined in the semantic model, not those in the underlying dataset. See the [example query and its discussion](#label-cortex-analyst-example-verified-query) for more information. Verified queries are specified in the `verified_queries` section of the semantic model, as shown here. ```yaml verified_queries: # Verified Query 1 - name: # A descriptive name of the query. question: # The natural language question that this query answers. verified_at: # Optional: Time (in seconds since the UNIX epoch, January 1, 1970) when the query was verified. verified_by: # Optional: Name of the person who verified the query. use_as_onboarding_question: # Optional: Marks this question as an onboarding question for the end user. sql: # The SQL query for answering the question. 
# Verified Query 2 - name: question: verified_at: verified_by: use_as_onboarding_question: sql: ``` Below is a sample semantic model that includes a verified query. ```yaml name: Sales Data tables: - name: sales_data base_table: database: sales schema: public table: sd_data dimensions: - name: state description: The state where the sale took place. expr: d_state data_type: TEXT unique: false sample_values: - "CA" - "IL" # Time dimension columns in the logical table. time_dimensions: - name: sale_timestamp synonyms: - "time_of_sale" - "transaction_time" description: The time when the sale occurred. In UTC. expr: dt data_type: TIMESTAMP unique: false # Measure columns in the logical table. measures: - name: profit synonyms: - "earnings" - "net income" description: The profit generated from a sale. expr: amt - cst data_type: NUMBER default_aggregation: sum verified_queries: - name: "California profit" question: "What was the profit from California last month?" verified_at: 1714497970 verified_by: Jane Doe use_as_onboarding_question: true sql: " SELECT sum(profit) FROM __sales_data WHERE state = 'CA' AND sale_timestamp >= DATE_TRUNC('month', DATEADD('month', -1, CURRENT_DATE)) AND sale_timestamp < DATE_TRUNC('month', CURRENT_DATE) " ``` In the example above, `__sales_data` corresponds to the `sales_data` table defined in the model. To avoid name conflicts, the name of the logical table is prefixed with two underscores. The columns used in the query (`state`, `sale_timestamp`, and `profit`) are the logical columns defined in the model's `sales_data` table. The names of the underlying columns (`d_state`, `dt`, `amt`, and `cst`) are not used directly in the query. The question doesn't need to be a complete sentence, or even phrased as a question, but it should reflect something a user might ask. Ensure that the SQL queries are syntactically correct and actually answer the posed questions; this is the essence of a "verified query."
Invalid or inaccurate queries can negatively impact %cortex-analyst%'s performance and accuracy. Use the open-source semantic model generator app, described in the next section, to help add verified queries to your semantic model, without needing to concern yourself with SQL or YAML syntax. ## Adding verified queries using the semantic model generator Snowflake provides an open-source Streamlit app to help add verified queries to your model. To install and use this app, follow these instructions. 1. **Clone the repository.** Start by cloning the [semantic-model-generator](https://github.com/Snowflake-Labs/semantic-model-generator) repository. 2. **Configure credentials and install the app.** Follow the setup instructions in the repo's [README](https://github.com/Snowflake-Labs/semantic-model-generator/blob/main/README.md) to provide your Snowflake credentials and run the app either on Snowflake or locally. 3. **Configure the app.** Once the app is running, enter the database, schema, and stage location of your semantic model YAML file into the provided fields. The YAML file will appear in an interactive editor on the left side of the window. 4. **Generate a Query.** On the right side of the window, use the chat interface to ask a question that will generate a SQL query. 5. **Verify and Save the Query.**
- Inspect the generated query and the results it produces. If it works as expected, select the **Save as verified query** button below the assistant's answer to add the query to your semantic model. - If the generated query is incorrect, select the **Edit** button to modify the query. Run the modified query to check if it produces the intended results. Continue editing and testing until the query works as desired. Then select **Save as verified query** to add it to your semantic model.
6. **Update the Semantic Model.** Select the **Save** button in the bottom left of the window to update the semantic model. Repeat the process to add more queries. 7. **Upload the new YAML file.** Once you're satisfied with the queries you've added, select the **Upload** button, enter a file name for your new YAML file, and select **Submit Upload**. When you return to your stage in Snowsight, you'll see the new semantic model YAML file with your verified queries. ## Adding suggested %cortex-analyst% Verified Query entries %cortex-analyst% also provides the Verified Query Suggestion interface in %sf-web-interface%, which offers potential new verified queries based on user behavior. For information about adding verified query suggestions, see [](/user-guide/snowflake-cortex/cortex-analyst/verified-query-suggestions). ## Viewing verified queries used in the %cortex-analyst% response When the user's question is similar to a query in the Verified Query Repository (VQR), %cortex-analyst% uses that query to generate the SQL query in its response. To see which verified query was used, see the [confidence field](#label-cortex-analyst-rest-api-response) in the API response. --- title: Cortex Knowledge Extensions source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-knowledge-extensions/cke-overview.md section: Snowflake Cortex (AI & ML) --- # Cortex Knowledge Extensions This feature is not available in the People's Republic of China.
- [](/user-guide/snowflake-cortex/cortex-knowledge-extensions/cke-access-history) - [](/user-guide/snowflake-cortex/cortex-knowledge-extensions/overview-tutorials) - [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) ## Overview Cortex Knowledge Extensions (CKEs) are [Cortex Search Services](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) that can be shared on the [Snowflake Marketplace](https://app.snowflake.com/_deeplink/marketplace) or via [private listings](#label-listings-create) or [organizational listings](/user-guide/collaboration/listings/organizational/org-listing-about). They can be used in a retrieval-augmented generation (RAG) architecture to integrate licensed and proprietary content into Cortex AI applications. For example, CKEs can be used to integrate knowledge from unstructured content, such as articles, market research, books, or forum posts, into Cortex AI applications, such as chatbots and agentic systems. ## How CKE works Here's how it works: 1. A Provider uploads their text data into a table in their account and creates a [Cortex Search Service](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) on the table. This Cortex Search Service is then shared on the [Snowflake Marketplace](https://app.snowflake.com/_deeplink/marketplace). A Cortex Search Service that is shared on the Snowflake Marketplace is known as a Cortex Knowledge Extension (CKE). 2. A Consumer builds an application leveraging Cortex AI, such as a chatbot, using [Cortex AI Functions](/user-guide/snowflake-cortex/aisql) or the [Cortex Agent API](/user-guide/snowflake-cortex/cortex-agents) with the CKE. 3. When a prompt is given to the Cortex AI application that is integrated with a CKE, the prompt is passed on to the CKE to get relevant knowledge by performing a semantic search.
The relevant knowledge is given back to the Cortex AI applications's LLM and reasoned over before returning an answer back to the user with citations and attribution. ![A flowchart showing the CKE workflow, from a provider](/static/images/cortex-knowledge-extensions/cke-workflow.png) ## CKE features Some of the key features of Cortex Knowledge Extensions include: - [](#label-cke-content-protection) - [](#label-cke-management) - [](#label-cke-trial-support) - [](#label-cke-monetization) Each of these features is described in more detail below. ### Content protection Providers can limit the percentage of indexed content in their CKE that can be returned to their consumers within a rolling 24-hour period. This is done by setting a threshold using the commands below. The threshold is not applied at the individual document level, but rather across the entire corpus of indexed content. Consumers will only be able to access the threshold percentage of the indexed content in the CKE. Refer to the [Listing manifest reference](/progaccess/listing-manifest-reference) for more information about the `cke_content_protection` field. ```sql -- Use CREATE to create a new CKE listing with content protection. -- Use ALTER to update an existing listing with content protection. -- This example creates a CKE listing targeting to two accounts. 
CREATE EXTERNAL LISTING cke_listing SHARE cke_share AS $$ title: "CKE Listing Title" description: "Cortex Knowledge Extension Listing Description" listing_terms: type: "STANDARD" auto_fulfillment: refresh_type: "SUB_DATABASE" refresh_schedule: "1440 MINUTE" targets: accounts: - "ORG1.ACCOUNT1" - "ORG2.ACCOUNT2" cke_content_protection: enable: true, threshold: 0.2 $$ -- DESCRIBE LISTING cke_listing -- See the manifest_yaml column for the cke_content_protection setting ``` When the threshold has been hit by a consumer, queries to the CKE are blocked from executing, and the consumer receives the following error: ```text You have reached the content protection threshold. Please try again later. ``` The consumer can re-query the data when the threshold refreshes. ### Management To see the number of queries that the CKE executed, sign in to %sf-web-interface-link%. In the navigation menu, select **Marketplace** %raa% **Provider Studio** %raa% **Home**. The **Analytics** section shows the number of queries executed. ### Trial support As a provider, you can offer customers a [limited trial](#label-trial-listing) of your CKE so that they can try your product before they commit to paying for it. ### Monetization Cortex Knowledge Extensions can be monetized using the on-platform [Snowflake Marketplace Monetization](#label-monetization-provider-onboarding) capability via [subscriptions](#label-listings-subscription-pricing-model) or through [off-platform](#label-listings-free-private-create) monetization. ## Region availability Cortex Knowledge Extensions are available in any region where [Cortex Search](#label-cortex-search-overview-regional-availability) is available. ## Key considerations When customers use your Cortex Knowledge Extension, be careful when disabling serving of the [Cortex Search Service](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview), as that will break customers' applications. 
For advanced tuning of a Cortex Knowledge Extension, refer to the [Cortex Search](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) documentation.

## Costs for CKE

Providers:

- Providers pay to host the Cortex Search Service in their account, including indexing, servicing, and replication to other regions. For more information about costs associated with Cortex Search Services, providers can refer to [](/user-guide/snowflake-cortex/cortex-search/cortex-search-costs).

Consumers:

- If the CKE isn't free, consumers pay the provider to access the CKE.
- If the CKE leverages a Cortex Agent, consumers pay for the Cortex Agent. For more information, see [](#label-cortex-agent-cost-considerations) for Cortex Agents.

## Citations

To ensure that the CKE provides citations, include a *SOURCE_URL* column that points to the source of each document among the indexed columns when you configure the [Cortex Search Service](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview). LLMs or %sf-intelligence% can use this column to provide clear attribution and hyperlinks back to the source material.

## Publishing the CKE to the Snowflake Marketplace

After you create a Cortex Search Service that you want to publish to the Marketplace, [create a listing](/collaboration/provider-listings-creating-publishing). Make sure that the listing includes the Cortex Search Service object that you created as an object to publish.

## Talking with the CKE

You can use the following methods to ask the CKE questions.

- Use the Cortex Search Playground:
  1. In %sf-web-interface%, in the navigation menu, select **AI & ML** %raa% **Cortex Search**.
  2. Select the CKE from the **Database/Schema** dropdown menu.
  3. Select **Playground** in the upper-right corner.
  4. Type in a search query and see the results.
- Use %sf-intelligence%:
  - Follow the steps outlined in [](/user-guide/snowflake-cortex/cortex-knowledge-extensions/tutorials/add-cke-to-snowflake-cowork-tutorial).
- Use Cortex Agent API:
  - Use the Cortex Agent API, and specify the shared CKE in the [CREATE CORTEX SEARCH](/sql-reference/sql/create-cortex-search) parameter. Refer to the [Cortex Agent API](/user-guide/snowflake-cortex/cortex-agents) documentation for more information.

## Updating your CKE

Keeping a CKE up-to-date is a common use case for providers that regularly introduce new or updated content. To ensure your Cortex Knowledge Extension is up-to-date, do the following:

1. Ensure that the underlying content table has been updated by a separate process that inserts new or updated documents into your Snowflake account.
2. Review the Cortex Search Service target lag. The Cortex Search Service is configured to refresh and to keep the data fresh up to a certain `target_lag`. Refer to the Cortex Search [Use SQL](#label-cortex-search-overview-example-sql) topic for more information about `target_lag`.
3. Run the following commands to ensure that the Cortex Search Service is indexing.

   ```sql
   -- Get the status of the search service
   DESCRIBE CORTEX SEARCH SERVICE cke_simple_cortex_search_service;

   -- If the indexing status is suspended, you can resume it with the following command
   ALTER CORTEX SEARCH SERVICE cke_simple_cortex_search_service RESUME INDEXING;
   ```

## CKE and auto-fulfillment

Consumers can only access a Cortex Knowledge Extension made available in their region. Providers can automatically replicate their Cortex Search Service to remote consumer regions by [enabling auto-fulfillment](/collaboration/provider-listings-auto-fulfillment) on their Cortex Knowledge Extension listing in Provider Studio.

## Limitations

- [Usage-based](#label-listings-usage-pricing-model) billing with CKEs isn't supported.
- CKEs are not supported in listings that have [Egress Cost Optimizer (ECO)](/collaboration/provider-listings-auto-fulfillment-eco) enabled. Providers should be aware of the cost implications for replication with listings that have a CKE. Adding a CKE to a listing that has ECO enabled will automatically turn off ECO. With ECO turned off, costs associated with the listing can increase. An email notification will also be sent to the provider indicating that ECO was turned off. Similarly, if a CKE is added to a listing that's part of a replication group, then ECO will be turned off for all listings within that replication group. An email notification will be sent to the provider indicating that the ECO was turned off. --- title: Cortex Playground source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-playground.md section: Snowflake Cortex (AI & ML) --- # Cortex Playground Available to all accounts. This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/aisql) - [](/sql-reference/functions/ai_complete) The Cortex Playground lets you compare text completions across the multiple large language models available in Cortex AI. You can test language model responses across prompts and model settings, and perform side-by-side comparisons of model outputs. With a few clicks, you can also connect the model to a Snowflake table to experiment directly on your data. The Cortex Playground is purpose-built to help you easily test how different language models perform for your use case before you decide which model to deploy into production. The Cortex Playground supports all of the models available for the COMPLETE function that are available in your account's region. For the complete list of models, see [Model availability](#label-cortex-llm-availability). ## Required privileges The Cortex Playground requires the CORTEX_USER database role that includes the privileges to call Snowflake Cortex LLM functions. 
For more information, see [](#label-cortex-llm--privileges).

## Get started with the Cortex Playground

The Cortex Playground is accessible from the Snowflake AI & ML Studio. You can access the studio from %sf-web-interface% as follows:

1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **AI & ML** %raa% **AI Studio**. The Cortex Playground appears among the other Studio functions.

   ![Cortex playground box.](/static/images/cortex-llm/cortex-playground-studio.png)
3. To open the playground, select **Try**.

## Test your prompt with a language model

Use the Cortex Playground to test prompts across different language models.

1. Select a warehouse. This warehouse is used to run the SQL command that calls the COMPLETE function.
2. Select a model from the dropdown menu at the top. The dropdown menu includes only the models that are available in the region of the account being used.

   ![Cortex playground model list.](/static/images/cortex-llm/cortex-playground-models.png)
3. Enter your prompt in the prompt box and press `Enter`.
4. The model output appears above the prompt box. You can select **View Code** to see and copy the SQL command used to process your prompt.

To try a different prompt or model, choose the desired model, enter a new prompt in the prompt box, and press `Enter`.

## Compare model outputs

To compare the output of your prompts between two different models or two different settings of the same model, use the **Compare** feature.

### Compare two models

1. Select **Compare** in the top right corner.
2. Select different models for the two panels using the dropdown menu on each side.
3. Open the settings panel by selecting **Change settings** %sf-settings-menu-button% next to **Compare**.
4. Select the **Sync** toggle to use the same settings for the two models.
5. Enter your prompt and press `Enter`. The output from the models you selected appears on each side.

### Compare settings for one model

1. Select **Compare** in the top right corner.
2. Select the same model for the two panels.
3. Open the settings panel by selecting **Change settings** %sf-settings-menu-button% next to **Compare**.
4. Choose different settings for **temperature**, **top_p**, or **max_tokens** for each tab to compare how the language model response changes with different model settings. For more details on these parameters, see [](/sql-reference/functions/ai_complete).
5. You can also check **Enable Cortex Guard** to implement safeguards that filter out potentially inappropriate or unsafe large language model (LLM) responses. For more details on Cortex Guard, see [](#label-cortex-llm-complete-cortex-guard).
6. Enter your prompt and press `Enter`. The output from the model for each set of settings appears on each side.

## Connect to Snowflake tables

You can connect the model to a Snowflake table with textual data that you want to test with text completion. You can select only one column. The Cortex Playground returns at most 100 rows.

1. Select the **+ Connect your data** button in the prompt box.
2. Select your Snowflake data source from the dropdown menu.
3. Select the column with the textual data you want to test.
4. Select a column to use as a filter. You can use this column to select a record from your data source.
5. Select **Done**.
6. Select a record from your data source using the **Select <filter column>** field in the prompt box. You can select a record by scrolling or by searching for a term in the text data. To search, enter a term in the search box. The following example shows a filter column named **ID**. In this example, you could search for a particular ID number or enter a string to match the text data.

   ![Cortex playground connect data.](/static/images/cortex-llm/cortex-playground-connect-data.png)
7. Enter a **System Prompt** and press `Enter` to see the model response. A system prompt provides instructions to the model on how to process the input text.
For example, you might want the model to summarize the selected text or pull out keywords from it. ## Controlling settings You can adjust model settings to compare how the language model response changes when provided with different **temperature**, **top_p**, and **max_tokens** settings. To implement safeguards that filter out potentially inappropriate or unsafe responses, select **Enable Cortex Guard** in the settings panel. You can read more about how these settings potentially impact language model responses in the [](#label-cortex-complete-temperature-tokens) page. 1. Select **Change settings** %sf-settings-menu-button% to open the settings menu on the top right corner. 2. Check the box for the setting to adjust its value. ![Cortex playground model settings.](/static/images/cortex-llm/cortex-playground-settings.png) 3. Try out prompts with different settings. ## Exporting a SQL query To get a SQL query that includes the settings, such as temperature, that you've defined in the Cortex Playground, select **View Code** after any model response. The displayed code can be executed from a [worksheet](/user-guide/ui-snowsight-worksheets-gs) or [notebook](/user-guide/ui-snowsight/notebooks), or automated for continuous execution using [streams and tasks](/user-guide/data-pipelines-intro). You can also use this code with a [dynamic table](/user-guide/dynamic-tables/overview). Dynamic tables do not support incremental refresh with COMPLETE. The following images show examples of the **View SQL** dialog. 
![Exporting SQL with data connected](/static/images/cortex-llm/cortex-playground-export-code-connected.png) Example 1: Exporting code when you connect the model to a Snowflake table ![Exporting code without data connected](/static/images/cortex-llm/cortex-playground-export-code.png) Example 2: Exporting code when you do not connect the model to a Snowflake table --- title: Cortex REST API source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-rest-api.md section: Snowflake Cortex (AI & ML) --- # Cortex REST API This feature is not available in the People's Republic of China. The Cortex REST API gives you access to leading frontier models from Anthropic, OpenAI, Meta, Mistral, and more through your preferred endpoint or SDK. All inference runs within the Snowflake perimeter, so your data remains secure and within your governance boundary. See below on how to get started. ## Choose your API Cortex REST API supports two industry-standard API specifications. Pick the one that best fits your stack:

- **Chat Completions API**: follows the OpenAI specification and works with the OpenAI SDK. Requests go to `/api/v2/cortex/v1/chat/completions`.
- **Messages API**: follows the Anthropic specification and supports Claude models only. Requests go to `/api/v2/cortex/v1/messages`.

Both APIs share the same authentication, model catalog, and rate limits. They differ only in the request/response format and in which models each endpoint supports. For pricing, see the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).

## Quickstart

### Prerequisites

Before you begin, you need:

1. Your **Snowflake account URL** (e.g., `https://<account_identifier>.snowflakecomputing.com`).
2. A **Snowflake Programmatic Access Token (PAT)** for authentication. See [](#label-pat-generate).
3. A **model name** to use in requests. See [](#label-cortex-complete-llm-model-availability) for available models.

### Chat Completions quickstart

The Chat Completions API follows the OpenAI specification. You can use the OpenAI SDK directly.

**Python:**

```python
from openai import OpenAI

client = OpenAI(
    api_key="<PAT>",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
)

print(response.choices[0].message.content)
```

**JavaScript/TypeScript:**

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<PAT>",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

console.log(response.choices[0].message.content);
```

**curl:**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'
```

In the preceding examples, replace the following:

- `<account_identifier>`: Your Snowflake account identifier.
- `<PAT>`: Your Snowflake Programmatic Access Token (PAT).
- `model`: The model name. See [](#label-cortex-complete-llm-model-availability) for supported models.

### Messages API quickstart

The Messages API follows the Anthropic specification and supports Claude models only.

**Python:**

The Anthropic SDK sends credentials via `x-api-key` by default, but Snowflake expects a `Bearer` token. Use an `httpx` client to set the correct authorization header.

```python
import anthropic
import httpx

PAT = "<PAT>"

# Send the PAT as a Bearer token instead of the SDK's default x-api-key header.
http_client = httpx.Client(
    headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
    api_key="not-used",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex",
    http_client=http_client,
    default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ],
)

print(response.content[0].text)
```

**JavaScript/TypeScript:**

As in Python, override the default auth header with a `Bearer` token via `defaultHeaders`.

```javascript
import Anthropic from "@anthropic-ai/sdk";

const PAT = "<PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

console.log(response.content[0].text);
```

**curl:**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'
```

In the preceding examples, replace the following:

- `<account_identifier>`: Your Snowflake account identifier.
- `<PAT>`: Your Snowflake Programmatic Access Token (PAT).
- `model`: The Claude model name. See [](#label-cortex-complete-llm-model-availability) for supported models.

## Setting up authentication

To authenticate to the Cortex REST API, you can use the methods described in [](/developer-guide/snowflake-rest-api/authentication). Set the `Authorization` header to include your token (for example, a JSON web token (JWT), OAuth token, or [programmatic access token](/user-guide/programmatic-access-tokens)).

Consider creating a dedicated user for Cortex REST API requests.

## Setting up authorization

To send a REST API request, your default role must be granted *either* the SNOWFLAKE.CORTEX_USER database role *or* the SNOWFLAKE.CORTEX_REST_API_USER database role. SNOWFLAKE.CORTEX_USER provides access to all Covered AI features including the Cortex REST API, whereas SNOWFLAKE.CORTEX_REST_API_USER provides access only to the Cortex REST API.

In most cases, users already have access because SNOWFLAKE.CORTEX_USER is granted to the PUBLIC role automatically, and all roles inherit PUBLIC. If your Snowflake administrator has revoked this grant, they must re-grant it:

```sql
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE my_role;
GRANT ROLE my_role TO USER my_user;
```

REST API requests use the user's default role, so that role must have the necessary privileges. You can change a user's default role with [ALTER USER ... SET DEFAULT_ROLE](/sql-reference/sql/alter-user).
```sql ALTER USER my_user SET DEFAULT_ROLE=my_role ``` ### Limiting access using the Cortex REST API user role To provide selective access to the Cortex REST API for specific users, use the SNOWFLAKE.CORTEX_REST_API_USER database role. This role grants access to the Cortex REST API without granting access to other Cortex features such as Cortex AI functions, Cortex Agent, Cortex Analyst, Cortex Fine-tuning, or Cortex Search. CORTEX_REST_API_USER is not granted to the PUBLIC role by default. An account administrator must explicitly grant this role to roles that require access to the Cortex REST API. The SNOWFLAKE.CORTEX_REST_API_USER database role can't be granted directly to a user. For more information, see [](#label-using-snowflake-db-roles). If your user roles already have the CORTEX_USER role, you must revoke access to the CORTEX_USER role before the CORTEX_REST_API_USER role can take effect as a fine-grained control. ```sql REVOKE DATABASE ROLE SNOWFLAKE.CORTEX_USER FROM ROLE PUBLIC; ``` To provide access to the Cortex REST API, use the ACCOUNTADMIN role to do the following: 1. Grant the SNOWFLAKE.CORTEX_REST_API_USER database role to a custom role. 2. Assign this custom role to users. The following example creates the custom role `cortex_rest_api_role`, grants it the CORTEX_REST_API_USER database role, and assigns the role to `example_user`: ```sql USE ROLE ACCOUNTADMIN; CREATE ROLE cortex_rest_api_role; GRANT DATABASE ROLE SNOWFLAKE.CORTEX_REST_API_USER TO ROLE cortex_rest_api_role; GRANT ROLE cortex_rest_api_role TO USER example_user; ``` You can also grant access to the Cortex REST API through existing roles. For example, if you have an `api_consumer` role used by a group of users, you can grant access with a single GRANT statement: ```sql GRANT DATABASE ROLE SNOWFLAKE.CORTEX_REST_API_USER TO ROLE api_consumer; ``` ## Model availability The following tables show the models available in the Cortex REST API for each region: **Cross-region and Cross-cloud:**
**North America:**
**Europe:**
**Asia-Pacific:**
**\*** Indicates a preview function or model. Preview features are not suitable for production workloads.

You can also use any [fine-tuned](/user-guide/snowflake-cortex/cortex-finetuning) model in any supported region.

## Features

### Streaming

Both APIs support streaming responses using [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events).

#### Chat Completions streaming

**Python:**

```python
from openai import OpenAI

client = OpenAI(
    api_key="<PAT>",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ],
    stream=True
)

for chunk in response:
    # Skip chunks that carry no text (for example, role-only or usage chunks).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

**JavaScript/TypeScript:**

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<PAT>",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
  stream: true,
});

for await (const event of stream) {
  process.stdout.write(event.choices[0]?.delta?.content || "");
}
```

**curl:**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'
```

#### Messages API streaming

**Python:**

```python
import anthropic
import httpx

PAT = "<PAT>"

http_client = httpx.Client(
    headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
    api_key="not-used",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex",
    http_client=http_client,
    default_headers={"Authorization": f"Bearer {PAT}"},
)

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

**JavaScript/TypeScript:**

```javascript
import Anthropic from "@anthropic-ai/sdk";

const PAT = "<PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const stream = client.messages.stream({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
```

**curl:**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'
```

### Tool calling

Tool calling lets the model invoke external functions during a conversation. The flow works in steps:

1. You send a request with a list of available tools.
2. The model decides to call one or more tools and returns the tool name and arguments.
3. You execute the tool on your end.
4. You send the tool result back, and the model generates a final response.

Tool calling is supported for OpenAI and Claude models.

#### Chat Completions tool calling

**Python:**

```python
import json

from openai import OpenAI

client = OpenAI(
    api_key="<PAT>",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "What is the weather like in San Francisco?"}
]

# Step 1: Send the request with tools
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=messages,
    tools=tools,
)

# Step 2: The model responds with tool_calls
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]

    # Step 3: Execute the tool (your implementation)
    result = json.dumps({"temperature": "69°F", "condition": "sunny"})

    # Step 4: Send the tool result back
    messages.append(message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": result,
    })

    final_response = client.chat.completions.create(
        model="claude-sonnet-4-5",
        messages=messages,
        tools=tools,
    )
    print(final_response.choices[0].message.content)
```

**JavaScript/TypeScript:**

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<PAT>",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA"
          }
        },
        required: ["location"]
      }
    }
  }
];

const messages = [
  { role: "user", content: "What is the weather like in San Francisco?" }
];

// Step 1: Send the request with tools
const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages,
  tools,
});

// Step 2: The model responds with tool_calls
const message = response.choices[0].message;
if (message.tool_calls) {
  const toolCall = message.tool_calls[0];

  // Step 3: Execute the tool (your implementation)
  const result = JSON.stringify({ temperature: "69°F", condition: "sunny" });

  // Step 4: Send the tool result back
  messages.push(message);
  messages.push({
    role: "tool",
    tool_call_id: toolCall.id,
    content: result,
  });

  const finalResponse = await client.chat.completions.create({
    model: "claude-sonnet-4-5",
    messages,
    tools,
  });
  console.log(finalResponse.choices[0].message.content);
}
```

**curl:**

**Step 1: Send the request with tools**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
```

The model responds with a `tool_calls` array:

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco, CA\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```

**Step 2: Execute the tool and send the result back**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"},
      {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco, CA\"}"
            }
          }
        ]
      },
      {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "{\"temperature\": \"69°F\", \"condition\": \"sunny\"}"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
```

#### Messages API tool calling

**Python:**

```python
import json

import anthropic
import httpx

PAT = "<PAT>"

http_client = httpx.Client(
    headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
    api_key="not-used",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex",
    http_client=http_client,
    default_headers={"Authorization": f"Bearer {PAT}"},
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
]

messages = [
    {"role": "user", "content": "What is the weather like in San Francisco?"}
]

# Step 1: Send the request with tools
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=messages,
    tools=tools,
)

# Step 2: The model responds with a tool_use block
if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")

    # Step 3: Execute the tool (your implementation)
    result = json.dumps({"temperature": "69°F", "condition": "sunny"})

    # Step 4: Send the tool result back
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": result,
            }
        ],
    })

    final_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=messages,
        tools=tools,
    )
    print(final_response.content[0].text)
```

**JavaScript/TypeScript:**

```javascript
import Anthropic from "@anthropic-ai/sdk";

const PAT = "<PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const tools = [
  {
    name: "get_weather",
    description: "Get the current weather for a location",
    input_schema: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "The city and state, e.g. San Francisco, CA"
        }
      },
      required: ["location"]
    }
  }
];

const messages = [
  { role: "user", content: "What is the weather like in San Francisco?" }
];

// Step 1: Send the request with tools
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages,
  tools,
});

// Step 2: The model responds with a tool_use block
if (response.stop_reason === "tool_use") {
  const toolUse = response.content.find(b => b.type === "tool_use");

  // Step 3: Execute the tool (your implementation)
  const result = JSON.stringify({ temperature: "69°F", condition: "sunny" });

  // Step 4: Send the tool result back
  messages.push({ role: "assistant", content: response.content });
  messages.push({
    role: "user",
    content: [
      {
        type: "tool_result",
        tool_use_id: toolUse.id,
        content: result,
      }
    ],
  });

  const finalResponse = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages,
    tools,
  });
  console.log(finalResponse.content[0].text);
}
```

**curl:**

**Step 1: Send the request with tools**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"}
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }'
```

The model responds with a `tool_use` content block:

```json
{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'll check the weather for you."
    },
    {
      "type": "tool_use",
      "id": "toolu_abc123",
      "name": "get_weather",
      "input": {"location": "San Francisco, CA"}
    }
  ],
  "stop_reason": "tool_use"
}
```

**Step 2: Execute the tool and send the result back**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"},
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "I'\''ll check the weather for you."},
          {
            "type": "tool_use",
            "id": "toolu_abc123",
            "name": "get_weather",
            "input": {"location": "San Francisco, CA"}
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "tool_result",
            "tool_use_id": "toolu_abc123",
            "content": "{\"temperature\": \"69°F\", \"condition\": \"sunny\"}"
          }
        ]
      }
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }'
```

### Structured output

You can request structured JSON output that conforms to a specific schema. Both the Chat Completions API and the Messages API support structured output.

#### Chat Completions structured output

Use the `response_format` field with a JSON schema to constrain the model's output.
**Python:** ```python from openai import OpenAI client = OpenAI( api_key="", base_url="https://.snowflakecomputing.com/api/v2/cortex/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-5", messages=[ {"role": "user", "content": "Create a dataset of 3 people with their names and ages."} ], response_format={ "type": "json_schema", "json_schema": { "name": "people_data", "schema": { "type": "object", "properties": { "people": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "age": {"type": "number"} }, "required": ["name", "age"] } } }, "required": ["people"] } } } ) data = json.loads(response.choices[0].message.content) print(data) ``` **JavaScript/TypeScript:** ```javascript const client = new OpenAI({ apiKey: "", baseURL: "https://.snowflakecomputing.com/api/v2/cortex/v1" }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-5", messages: [ { role: "user", content: "Create a dataset of 3 people with their names and ages." 
} ], response_format: { type: "json_schema", json_schema: { name: "people_data", schema: { type: "object", properties: { people: { type: "array", items: { type: "object", properties: { name: { type: "string" }, age: { type: "number" } }, required: ["name", "age"] } } }, required: ["people"] } } } }); const data = JSON.parse(response.choices[0].message.content); console.log(data); ``` **curl:** ```bash curl "https://.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " \ -d '{ "model": "claude-sonnet-4-5", "messages": [ {"role": "user", "content": "Create a dataset of 3 people with their names and ages."} ], "response_format": { "type": "json_schema", "json_schema": { "name": "people_data", "schema": { "type": "object", "properties": { "people": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "age": {"type": "number"} }, "required": ["name", "age"] } } }, "required": ["people"] } } } }' ``` Claude models support only `json_schema` as the response format type. OpenAI models support additional response format types as documented in the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create). #### Messages API structured output Use the `output_config` parameter with a JSON schema to constrain the model's output. The response contains valid JSON in a `text` content block that matches your schema. 
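Because the structured result arrives as a JSON string inside a `text` content block, the client still has to locate that block and decode it. The following is a minimal sketch of that step, working on the decoded JSON body of a response rather than on an SDK object. The helper name `parse_structured_output` and the sample response are illustrative assumptions, not part of the API.

```python
import json


def parse_structured_output(response_body: dict) -> dict:
    """Decode the JSON carried in the first `text` block of a Messages API response.

    `response_body` is assumed to be the already-decoded JSON body of the response.
    """
    for block in response_body.get("content", []):
        if block.get("type") == "text":
            return json.loads(block["text"])
    raise ValueError("response contains no text content block")


# Hypothetical response body, shaped like the examples in this section.
sample = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": '{"people": [{"name": "Ada", "age": 36}]}'}
    ],
}

people = parse_structured_output(sample)["people"]
print(people[0]["name"])
```

Scanning for the first `text` block, instead of indexing `content[0]`, keeps the helper usable if other block types precede the text.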
**Python:** ```python PAT = "" http_client = httpx.Client( headers={"Authorization": f"Bearer {PAT}"}, ) client = anthropic.Anthropic( api_key="not-used", base_url="https://.snowflakecomputing.com/api/v2/cortex", http_client=http_client, default_headers={"Authorization": f"Bearer {PAT}"}, ) response = client.messages.create( model="claude-sonnet-4-5", max_tokens=1024, messages=[ {"role": "user", "content": "Create a dataset of 3 people with their names and ages."} ], output_config={ "format": { "type": "json_schema", "schema": { "type": "object", "properties": { "people": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "age": {"type": "number"} }, "required": ["name", "age"] } } }, "required": ["people"], "additionalProperties": False } } }, ) data = json.loads(response.content[0].text) print(data) ``` **JavaScript/TypeScript:** ```javascript const PAT = ""; const client = new Anthropic({ apiKey: "not-used", baseURL: "https://.snowflakecomputing.com/api/v2/cortex", defaultHeaders: { "Authorization": `Bearer ${PAT}`, }, }); const response = await client.messages.create({ model: "claude-sonnet-4-5", max_tokens: 1024, messages: [ { role: "user", content: "Create a dataset of 3 people with their names and ages." 
} ], output_config: { format: { type: "json_schema", schema: { type: "object", properties: { people: { type: "array", items: { type: "object", properties: { name: { type: "string" }, age: { type: "number" } }, required: ["name", "age"] } } }, required: ["people"], additionalProperties: false } } }, }); const data = JSON.parse(response.content[0].text); console.log(data); ``` **curl:** ```bash curl "https://.snowflakecomputing.com/api/v2/cortex/v1/messages" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " \ -H "anthropic-version: 2023-06-01" \ -d '{ "model": "claude-sonnet-4-5", "max_tokens": 1024, "messages": [ {"role": "user", "content": "Create a dataset of 3 people with their names and ages."} ], "output_config": { "format": { "type": "json_schema", "schema": { "type": "object", "properties": { "people": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "age": {"type": "number"} }, "required": ["name", "age"] } } }, "required": ["people"], "additionalProperties": false } } } }' ``` ### Image input You can include images in your requests for models that support vision. Images must be provided as base64-encoded strings. Images are limited to 20 per conversation with a 20 MiB max request size. Image input is supported for: - Claude models (`claude-sonnet-4-5` and newer) - OpenAI models (`openai-gpt-4.1` and newer) #### Chat Completions image input **Python:** ```python from openai import OpenAI client = OpenAI( api_key="", base_url="https://.snowflakecomputing.com/api/v2/cortex/v1" ) # Read and encode an image file with open("image.png", "rb") as f: image_data = base64.b64encode(f.read()).decode("utf-8") response = client.chat.completions.create( model="claude-sonnet-4-5", messages=[ { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": f"data:image/png;base64,{image_data}" } }, { "type": "text", "text": "What is in this image?" 
} ] } ] ) print(response.choices[0].message.content) ``` **JavaScript/TypeScript:** ```javascript const client = new OpenAI({ apiKey: "", baseURL: "https://.snowflakecomputing.com/api/v2/cortex/v1" }); // Read and encode an image file const imageData = fs.readFileSync("image.png").toString("base64"); const response = await client.chat.completions.create({ model: "claude-sonnet-4-5", messages: [ { role: "user", content: [ { type: "image_url", image_url: { url: `data:image/png;base64,${imageData}` } }, { type: "text", text: "What is in this image?" } ] } ], }); console.log(response.choices[0].message.content); ``` **curl:** ```bash curl "https://.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " \ -d '{ "model": "claude-sonnet-4-5", "messages": [ { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "data:image/png;base64," } }, { "type": "text", "text": "What is in this image?" } ] } ] }' ``` #### Messages API image input The Messages API uses a different image format — a `source` block with `type`, `media_type`, and `data` fields instead of a data URL. **Python:** ```python PAT = "" http_client = httpx.Client( headers={"Authorization": f"Bearer {PAT}"}, ) client = anthropic.Anthropic( api_key="not-used", base_url="https://.snowflakecomputing.com/api/v2/cortex", http_client=http_client, default_headers={"Authorization": f"Bearer {PAT}"}, ) # Read and encode an image file with open("image.png", "rb") as f: image_data = base64.b64encode(f.read()).decode("utf-8") response = client.messages.create( model="claude-sonnet-4-5", max_tokens=1024, messages=[ { "role": "user", "content": [ { "type": "image", "source": { "type": "base64", "media_type": "image/png", "data": image_data } }, { "type": "text", "text": "What is in this image?" 
} ] } ], ) print(response.content[0].text) ``` **JavaScript/TypeScript:** ```javascript const PAT = ""; const client = new Anthropic({ apiKey: "not-used", baseURL: "https://.snowflakecomputing.com/api/v2/cortex", defaultHeaders: { "Authorization": `Bearer ${PAT}`, }, }); // Read and encode an image file const imageData = fs.readFileSync("image.png").toString("base64"); const response = await client.messages.create({ model: "claude-sonnet-4-5", max_tokens: 1024, messages: [ { role: "user", content: [ { type: "image", source: { type: "base64", media_type: "image/png", data: imageData } }, { type: "text", text: "What is in this image?" } ] } ], }); console.log(response.content[0].text); ``` **curl:** ```bash curl "https://.snowflakecomputing.com/api/v2/cortex/v1/messages" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " \ -H "anthropic-version: 2023-06-01" \ -d '{ "model": "claude-sonnet-4-5", "max_tokens": 1024, "messages": [ { "role": "user", "content": [ { "type": "image", "source": { "type": "base64", "media_type": "image/png", "data": "" } }, { "type": "text", "text": "What is in this image?" } ] } ] }' ``` ### Prompt caching Prompt caching lets you reuse previously processed context (such as large system prompts, documents, or conversation history) across requests, reducing latency and cost. - **OpenAI models**: Caching is **implicit**. Prompts with 1,024+ tokens are automatically cached — no request changes needed. - **Claude models**: Caching is **explicit**. Add `cache_control` breakpoints to content blocks you want cached. Only the `ephemeral` cache type is supported, with a **5-minute TTL**. A maximum of 4 cache breakpoints per request. #### Chat Completions prompt caching For Claude models via Chat Completions, add `cache_control` to content blocks. OpenAI models are cached automatically and do not require this field. 
**Python:**

```python
from openai import OpenAI

client = OpenAI(
    api_key="<token>",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "",  # replace with the long context you want cached
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        },
        {"role": "user", "content": "Summarize the key points."}
    ]
)

print(response.choices[0].message.content)
```

**JavaScript/TypeScript:**

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<token>",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "", // replace with the long context you want cached
          cache_control: { type: "ephemeral" }
        }
      ]
    },
    { role: "user", content: "Summarize the key points." }
  ],
});

console.log(response.choices[0].message.content);
```

**curl:**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      },
      {"role": "user", "content": "Summarize the key points."}
    ]
  }'
```

#### Messages API prompt caching

Use `cache_control` on system or user content blocks. Only the `ephemeral` cache type is supported, with a 5-minute TTL. A maximum of 4 cache breakpoints can be set per request.
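Since a request may carry at most 4 cache breakpoints, it can help to count them before sending. The sketch below is one possible client-side check, assuming the request body is a plain dictionary in Messages API shape. The names `count_cache_breakpoints` and `check_cache_breakpoints` are hypothetical, and the check only inspects `system` and message content blocks.

```python
MAX_CACHE_BREAKPOINTS = 4  # limit stated in this section


def count_cache_breakpoints(request_body: dict) -> int:
    """Count content blocks that carry a `cache_control` entry.

    Inspects the top-level `system` blocks and each message's content blocks.
    Plain-string content cannot carry a breakpoint and is skipped.
    """
    blocks = []
    system = request_body.get("system")
    if isinstance(system, list):
        blocks.extend(system)
    for message in request_body.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):
            blocks.extend(content)
    return sum(
        1 for block in blocks
        if isinstance(block, dict) and "cache_control" in block
    )


def check_cache_breakpoints(request_body: dict) -> None:
    """Raise ValueError if the request exceeds the breakpoint limit."""
    count = count_cache_breakpoints(request_body)
    if count > MAX_CACHE_BREAKPOINTS:
        raise ValueError(
            f"{count} cache breakpoints set; at most {MAX_CACHE_BREAKPOINTS} are allowed"
        )


# Hypothetical request body with a single breakpoint on the system prompt.
body = {
    "system": [
        {"type": "text", "text": "long context", "cache_control": {"type": "ephemeral"}}
    ],
    "messages": [{"role": "user", "content": "Summarize the key points."}],
}
check_cache_breakpoints(body)
print(count_cache_breakpoints(body))
```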
**Python:**

```python
import anthropic
import httpx

PAT = "<token>"

http_client = httpx.Client(
    headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
    api_key="not-used",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex",
    http_client=http_client,
    default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "",  # replace with the long context you want cached
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ],
)

print(response.content[0].text)
```

**JavaScript/TypeScript:**

```javascript
import Anthropic from "@anthropic-ai/sdk";

const PAT = "<token>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "", // replace with the long context you want cached
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    { role: "user", content: "Summarize the key points." }
  ],
});

console.log(response.content[0].text);
```

**curl:**

```bash
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "Summarize the key points."}
    ]
  }'
```

Anthropic prompt caching uses a **5-minute** TTL. Cached content not accessed within the TTL window is evicted. OpenAI prompt caching is implicit and managed automatically, so no `cache_control` fields are needed.

### Thinking and reasoning

#### Chat Completions reasoning

For Claude models, use the `reasoning` object. For OpenAI reasoning models, use the `reasoning_effort` field (values: `none`, `minimal`, `low`, `medium`, `high`).
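Because the two model families take the reasoning setting in different fields, a small helper can pick the right one. This is a sketch only: it assumes model names begin with `claude-` or `openai-`, as the names used in this section do, and the function name `reasoning_fields` is hypothetical. It validates only the OpenAI values listed above, because this section does not list the accepted `reasoning.effort` values for Claude.

```python
# Values listed in this section for OpenAI reasoning models.
OPENAI_EFFORTS = {"none", "minimal", "low", "medium", "high"}


def reasoning_fields(model: str, effort: str) -> dict:
    """Return the extra request fields that set reasoning effort for `model`.

    Assumption: the model family can be read from the name prefix.
    Models from other families get no reasoning fields.
    """
    if model.startswith("claude-"):
        return {"reasoning": {"effort": effort}}
    if model.startswith("openai-"):
        if effort not in OPENAI_EFFORTS:
            raise ValueError(f"unsupported reasoning_effort: {effort!r}")
        return {"reasoning_effort": effort}
    return {}


print(reasoning_fields("claude-sonnet-4-5", "high"))
print(reasoning_fields("openai-gpt-5", "high"))
```

With the OpenAI Python SDK, the returned dictionary could be passed as `extra_body`, which is how the Python example in this section sends the `reasoning` object.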
**Python:**

```python
from openai import OpenAI

client = OpenAI(
    api_key="<token>",
    base_url="https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

# Claude models: use the reasoning object
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    temperature=1,
    messages=[
        {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ],
    extra_body={
        "reasoning": {"effort": "high"}
    }
)

print(response.choices[0].message.content)
```

**JavaScript/TypeScript:**

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<token>",
  baseURL: "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

// Claude models: use the reasoning object
const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  temperature: 1,
  messages: [
    { role: "user", content: "Are there an infinite number of prime numbers such that n mod 4 == 3?" }
  ],
  reasoning: { effort: "high" },
});

console.log(response.choices[0].message.content);
```

**curl:**

```bash
# Claude models: use the reasoning object
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "temperature": 1,
    "messages": [
      {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ],
    "reasoning": {
      "effort": "high"
    }
  }'
```

```bash
# OpenAI reasoning models: use reasoning_effort
curl "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "model": "openai-gpt-5",
    "messages": [
      {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ],
    "reasoning_effort": "high"
  }'
```

#### Messages API thinking

Some Claude models support **adaptive thinking**, where the model adjusts how much reasoning it applies based on task complexity.
The following models support adaptive thinking:

- `claude-opus-4-6` and newer
- `claude-sonnet-4-6`

For the Messages API, use the `thinking` parameter with `type: "adaptive"` to enable adaptive thinking. The `output_config.effort` parameter provides some high-level control over the thinking depth, and accepts the following values:
The following examples demonstrate how to make a Messages API call with adaptive thinking enabled: **Python:** ```python PAT = "" http_client = httpx.Client( headers={"Authorization": f"Bearer {PAT}"}, ) client = anthropic.Anthropic( api_key="not-used", base_url="https://.snowflakecomputing.com/api/v2/cortex", http_client=http_client, default_headers={"Authorization": f"Bearer {PAT}"}, ) response = client.messages.create( model="claude-opus-4-6", max_tokens=16384, thinking={ "type": "adaptive" }, messages=[ {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"} ], ) # The response includes thinking blocks followed by text for block in response.content: if block.type == "thinking": print(f"Thinking: {block.thinking[:100]}...") elif block.type == "text": print(f"Answer: {block.text}") ``` **JavaScript/TypeScript:** ```javascript const PAT = ""; const client = new Anthropic({ apiKey: "not-used", baseURL: "https://.snowflakecomputing.com/api/v2/cortex", defaultHeaders: { "Authorization": `Bearer ${PAT}`, }, }); const response = await client.messages.create({ model: "claude-opus-4-6", max_tokens: 16384, thinking: { type: "adaptive" }, messages: [ { role: "user", content: "Are there an infinite number of prime numbers such that n mod 4 == 3?" 
} ], }); // The response includes thinking blocks followed by text for (const block of response.content) { if (block.type === "thinking") { console.log(`Thinking: ${block.thinking.slice(0, 100)}...`); } else if (block.type === "text") { console.log(`Answer: ${block.text}`); } } ``` **curl:** ```bash curl "https://.snowflakecomputing.com/api/v2/cortex/v1/messages" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " \ -H "anthropic-version: 2023-06-01" \ -d '{ "model": "claude-opus-4-6", "max_tokens": 16384, "thinking": { "type": "adaptive" }, "messages": [ {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"} ] }' ``` The response includes thinking blocks with summarized thinking and thinking signatures. Pass these blocks back in multi-turn conversations to maintain reasoning context: ```json { "role": "assistant", "content": [ {"type": "thinking", "thinking": "", "signature": ""}, {"type": "text", "text": "Yes, there are infinitely many primes p where p ≡ 3 (mod 4)..."} ] } ``` For a full description of the Messages API support for Adaptive Thinking, see [Claude API Docs – Adaptive thinking](https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking). ### Beta features (Messages API)
[Preview Feature](/release-notes/preview-features) - Open
Available to all accounts.
The Messages API supports Anthropic beta features via the `anthropic-beta` header. Pass one or more beta header values as a comma-separated string. **Supported beta headers**
The following example demonstrates using tool examples with `claude-sonnet-4-6`: **Python:** ```python PAT = "" http_client = httpx.Client( headers={"Authorization": f"Bearer {PAT}"}, ) client = anthropic.Anthropic( api_key="not-used", base_url="https://.snowflakecomputing.com/api/v2/cortex", http_client=http_client, default_headers={ "Authorization": f"Bearer {PAT}", }, ) response = client.beta.messages.create( model="claude-sonnet-4-6", max_tokens=8192, betas=["tool-examples-2025-10-29"], tools=[ { "name": "get_weather", "description": "Get the current weather for a location", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } }, "required": ["location"] }, "examples": [ { "input": {"location": "San Francisco, CA"}, "output": {"temperature": "65°F", "condition": "sunny"} } ] } ], messages=[ {"role": "user", "content": "What's the weather in New York?"} ], ) print(f"Stop reason: {response.stop_reason}") if response.stop_reason == "tool_use": tool_use = next(b for b in response.content if b.type == "tool_use") print(f"Tool called: {tool_use.name}") print(f"Arguments: {tool_use.input}") ``` **JavaScript/TypeScript:** ```javascript const PAT = ""; const client = new Anthropic({ apiKey: "not-used", baseURL: "https://.snowflakecomputing.com/api/v2/cortex", defaultHeaders: { "Authorization": `Bearer ${PAT}`, }, }); const response = await client.beta.messages.create({ model: "claude-sonnet-4-6", max_tokens: 8192, betas: ["tool-examples-2025-10-29"], tools: [ { name: "get_weather", description: "Get the current weather for a location", input_schema: { type: "object", properties: { location: { type: "string", description: "The city and state, e.g. 
San Francisco, CA" } }, required: ["location"] }, examples: [ { input: { location: "San Francisco, CA" }, output: { temperature: "65°F", condition: "sunny" } } ] } ], messages: [ { role: "user", content: "What's the weather in New York?" } ], }); console.log(`Stop reason: ${response.stop_reason}`); if (response.stop_reason === "tool_use") { const toolUse = response.content.find(b => b.type === "tool_use"); console.log(`Tool called: ${toolUse.name}`); console.log(`Arguments: ${JSON.stringify(toolUse.input)}`); } ``` **curl:** ```bash curl "https://.snowflakecomputing.com/api/v2/cortex/v1/messages" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " \ -H "anthropic-version: 2023-06-01" \ -H "anthropic-beta: tool-examples-2025-10-29" \ -d '{ "model": "claude-sonnet-4-6", "max_tokens": 8192, "tools": [ { "name": "get_weather", "description": "Get the current weather for a location", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } }, "required": ["location"] }, "examples": [ { "input": {"location": "San Francisco, CA"}, "output": {"temperature": "65°F", "condition": "sunny"} } ] } ], "messages": [ {"role": "user", "content": "What'\''s the weather in New York?"} ] }' ``` You can combine multiple beta features by passing a comma-separated string: ```bash -H "anthropic-beta: tool-examples-2025-10-29,tool-search-tool-2025-10-19" ``` ## Chat Completions API reference ### POST /api/v2/cortex/v1/chat/completions Generates a chat completion using the specified model. The request and response format follows the [OpenAI Chat Completions API specification](https://platform.openai.com/docs/api-reference/chat/create). ```text POST https://.snowflakecomputing.com/api/v2/cortex/v1/chat/completions ``` #### Required headers
Authorization: Bearer token
Authorization for the request. `token` is a JSON web token (JWT), OAuth token, or [programmatic access token](/user-guide/programmatic-access-tokens). For details, see the [Snowflake REST API authentication guide](/developer-guide/snowflake-rest-api/authentication).
`Content-Type: application/json`
Specifies that the body of the request is in JSON format.
#### Optional headers
X-Snowflake-Authorization-Token-Type: type
Defines the type of authorization token. If you omit the `X-Snowflake-Authorization-Token-Type` header, Snowflake determines the token type by examining the token. The header is optional, but you can set it explicitly to one of the following values:

- `KEYPAIR_JWT` (for key-pair authentication)
- `OAUTH` (for OAuth)
- `PROGRAMMATIC_ACCESS_TOKEN` (for [programmatic access tokens](/user-guide/programmatic-access-tokens))
`Accept: application/json, text/event-stream`
Specifies that the response will either contain JSON (error case) or server-sent events.
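The required and optional headers above can be assembled in one place. The following sketch builds the header dictionary for a Chat Completions request and validates the token type against the three documented values. The function name `chat_completions_headers` is an illustrative assumption, and the token shown is a placeholder.

```python
from typing import Optional

# Token types documented for X-Snowflake-Authorization-Token-Type.
TOKEN_TYPES = {"KEYPAIR_JWT", "OAUTH", "PROGRAMMATIC_ACCESS_TOKEN"}


def chat_completions_headers(token: str, token_type: Optional[str] = None) -> dict:
    """Build request headers for POST /api/v2/cortex/v1/chat/completions.

    `token_type` is optional; when omitted, Snowflake determines the type
    by examining the token.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if token_type is not None:
        if token_type not in TOKEN_TYPES:
            raise ValueError(f"unknown token type: {token_type!r}")
        headers["X-Snowflake-Authorization-Token-Type"] = token_type
    return headers


headers = chat_completions_headers("<token>", "PROGRAMMATIC_ACCESS_TOKEN")
print(headers["X-Snowflake-Authorization-Token-Type"])
```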
#### Required JSON fields

| Field | Type | Description |
| ---------- | ------ | ----------- |
| `model` | string | The model to use (see [model availability](#label-cortex-complete-llm-model-availability)). You can also use the fully qualified name of any [fine-tuned](/user-guide/snowflake-cortex/cortex-finetuning) model in the format `database.schema.model`. |
| `messages` | array | An array of message objects representing the conversation. Each message must have a `role` (`system`, `user`, `assistant`, or `tool`) and `content` (string or array of content parts). |

#### Commonly used optional JSON fields

| Field | Type | Default | Description |
| ----------------------- | ---------------- | --------------- | ----------- |
| `max_completion_tokens` | integer | 4096 | Maximum tokens in the response. Theoretical maximum is 131,072; each model has its own output limit. |
| `temperature` | number | Varies by model | Controls randomness. Values from 0 to 2. |
| `top_p` | number | 1.0 | Controls diversity via nucleus sampling. |
| `stream` | boolean | false | Whether to stream back partial progress as server-sent events. |
| `tools` | array | null | A list of tools the model may call. Each tool must have `type: "function"` and a `function` object with `name`, `description`, and `parameters`. |
| `tool_choice` | string or object | `"auto"` | Controls how the model selects tools. Options: `"auto"`, `"required"`, `"none"`, or an object specifying a particular function. |
| `response_format` | object | null | Constrains the output format. Use `{"type": "json_schema", "json_schema": {...}}` for structured output. |
| `reasoning_effort` | string | null | For OpenAI reasoning models. Values: `"none"`, `"minimal"`, `"low"`, `"medium"`, `"high"`. |
| `reasoning` | object | null | For Claude models. Set `reasoning.effort` or `reasoning.max_tokens` to enable thinking. |

See the [detailed compatibility chart](#label-cortex-openai-sdk-compatibility) for the full list of supported fields per model family.

#### Status codes
200 `OK`
Request completed successfully.
400 `invalid options object`
The optional arguments have invalid values.
400 unknown model model_name
The specified model does not exist.
400 `schema validation failed`
The response schema structure is incorrect.
400 max tokens of count exceeded
The request exceeded the maximum number of tokens supported by the model.
400 `all requests were throttled by remote service`
The request has been throttled. Try again later.
402 `budget exceeded`
The model consumption budget was exceeded.
403 `Not Authorized`
Account not enabled for REST API, or the default role for the calling user does not have the `snowflake.cortex_user` database role.
429 `too many requests`
The usage quota has been exceeded. Try again later.
503 `inference timed out`
The request took too long.
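Several of the status codes above say to try again later, which suggests a retry policy on the client side. The sketch below shows one way to classify responses and compute a backoff delay. Treating 503 as retryable, matching on the throttling message text, and the backoff parameters are all assumptions of this example, not documented behavior. The function names are hypothetical.

```python
import random

# Assumption: 429 and 503 are worth retrying; 402 and 403 are not.
RETRYABLE_STATUS = {429, 503}
THROTTLE_MESSAGE = "all requests were throttled by remote service"


def is_retryable(status: int, message: str = "") -> bool:
    """Decide whether a failed request is worth retrying.

    A 400 is retried only when its message reports throttling; other 400
    errors (invalid options, unknown model, schema failures) are not.
    """
    if status in RETRYABLE_STATUS:
        return True
    return status == 400 and THROTTLE_MESSAGE in message.lower()


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, in seconds, for a 0-based attempt."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


print(is_retryable(429))
print(is_retryable(402))
```

A caller would loop while `is_retryable(...)` is true, sleeping for `backoff_delay(attempt)` between attempts and giving up after a fixed number of tries.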
#### Limitations

- If unset, `max_completion_tokens` defaults to 4096. Each model has its own output token limit.
- Tool calling is supported for OpenAI and Claude models only.
- Audio is not supported.
- Image understanding is supported for OpenAI and Claude models only. Images are limited to 20 per conversation with a 20 MiB max request size.
- Only Claude models support ephemeral cache control points for prompt caching. OpenAI models support implicit caching.
- Only Claude models support returning reasoning details in the response.
- The `temperature` field is ignored for Claude Opus 4.7 since the model no longer supports temperature.
- `max_tokens` is deprecated. Use `max_completion_tokens` instead.
- Error messages are generated by Snowflake, not by the model provider.

#### Detailed compatibility chart

The following tables summarize which request and response fields are supported when using the Chat Completions API with different Snowflake-hosted model families.

**Request fields**
**Response fields**
**Request headers**
**Response headers**
#### Learn more

For additional usage examples, see the [OpenAI Chat Completions API reference](https://platform.openai.com/docs/guides/completions/) or the [OpenAI Cookbook](https://cookbook.openai.com/).

In addition to providing compatibility with the Chat Completions API, Snowflake supports OpenRouter-compatible features for Claude models. These features are exposed as extra fields on the request:

1. For prompt caching, use the `cache_control` field. See the [OpenRouter prompt caching documentation](https://openrouter.ai/docs/features/prompt-caching).
2. For reasoning tokens, use the `reasoning` field. See the [OpenRouter reasoning documentation](https://openrouter.ai/docs/use-cases/reasoning-tokens).

## Messages API reference

### POST /api/v2/cortex/v1/messages

Generates a response using a Claude model. The request and response format follows the [Anthropic Messages API specification](https://docs.anthropic.com/en/api/messages).

```text
POST https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages
```

The Messages API supports **Claude models only**. For other models, use the Chat Completions API.

#### Required headers
Authorization: Bearer token
Authorization for the request. `token` is a JSON web token (JWT), OAuth token, or [programmatic access token](/user-guide/programmatic-access-tokens). For details, see the [Snowflake REST API authentication guide](/developer-guide/snowflake-rest-api/authentication).
`Content-Type: application/json`
Specifies that the body of the request is in JSON format.
`anthropic-version: 2023-06-01`
Required Anthropic API version header.
#### Optional headers
X-Snowflake-Authorization-Token-Type: type
Defines the type of authorization token. If you omit the `X-Snowflake-Authorization-Token-Type` header, Snowflake determines the token type by examining the token. The header is optional, but you can set it explicitly to one of the following values:

- `KEYPAIR_JWT` (for key-pair authentication)
- `OAUTH` (for OAuth)
- `PROGRAMMATIC_ACCESS_TOKEN` (for [programmatic access tokens](/user-guide/programmatic-access-tokens))
anthropic-beta: feature
Enables beta features. Only Bedrock-compatible beta headers are supported.
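As a minimal sketch, the required and optional headers described above could be assembled in Python as follows. The helper name `build_messages_headers` and the placeholder token are illustrative assumptions; they are not part of any Snowflake SDK.

```python
# Token types listed for X-Snowflake-Authorization-Token-Type above.
ALLOWED_TOKEN_TYPES = {"KEYPAIR_JWT", "OAUTH", "PROGRAMMATIC_ACCESS_TOKEN"}


def build_messages_headers(token, token_type=None, beta_feature=None):
    """Return HTTP headers for a Messages API request (hypothetical helper)."""
    # The three required headers.
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    # Optional: state the token type instead of letting Snowflake infer it.
    if token_type is not None:
        if token_type not in ALLOWED_TOKEN_TYPES:
            raise ValueError(f"unsupported token type: {token_type}")
        headers["X-Snowflake-Authorization-Token-Type"] = token_type
    # Optional: enable a beta feature (only Bedrock-compatible values are supported).
    if beta_feature is not None:
        headers["anthropic-beta"] = beta_feature
    return headers


# "my-token" is a placeholder, not a real credential.
headers = build_messages_headers("my-token", token_type="PROGRAMMATIC_ACCESS_TOKEN")
```

Validating the token type on the client side is a design choice in this sketch; it surfaces a typo before the request is sent rather than relying on the server's error response.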
#### Required JSON fields | Field | Type | Description | | ------------ | ------- | --------------------------------------------------------------------------------------------------------------------------------- | | `model` | string | The Claude model to use (see [](#label-cortex-complete-llm-model-availability)). | | `max_tokens` | integer | The maximum number of tokens to generate. | | `messages` | array | An array of message objects. Each message has a `role` (`user` or `assistant`) and `content` (string or array of content blocks). | #### Supported features The Messages API supports the standard Anthropic Messages API feature set for Claude models, including: - Text generation and multi-turn conversations - Streaming (`"stream": true`) - System messages (via top-level `system` field) - Tool calling (Anthropic format with `name`, `description`, `input_schema`) - Structured output (`output_config` with `json_schema`) - Image input (base64 source blocks) - Prompt caching (`cache_control` on content blocks) - Adaptive thinking (`thinking` parameter with `type: "adaptive"`) and extended thinking (`budget_tokens`) For full request and response schema details, see the [Anthropic Messages API documentation](https://docs.anthropic.com/en/api/messages). #### Limitations - **Claude models only.** OpenAI, Llama, Mistral, and other models are not available through this endpoint. - **No flex processing or priority tier.** The `service_tier` field is not supported. - **Bedrock beta headers only.** Only Bedrock-compatible `anthropic-beta` header values are supported. - Error messages are generated by Snowflake, not by Anthropic. #### Status codes
200 `OK`
Request completed successfully.
400 `invalid_request_error`
The request body is malformed or contains invalid values.
400 unknown model model_name
The specified model does not exist or is not a Claude model.
402 `budget exceeded`
The model consumption budget was exceeded.
403 `Not Authorized`
The account is not enabled for the REST API, or the user's default role does not have the `snowflake.cortex_user` database role.
429 `too many requests`
The usage quota has been exceeded. Try again later.
503 `inference timed out`
The request took too long.
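The required fields and status codes above can be tied together in a short Python sketch. The function names are hypothetical, and treating 503 as retryable alongside 429 is an assumption of this example rather than documented guidance.

```python
import json

# Assumption: 429 (quota exceeded) and 503 (inference timed out) are transient.
RETRYABLE_STATUS = {429, 503}


def build_messages_body(model, user_text, max_tokens=1024):
    """Serialize the three required fields: model, max_tokens, and messages."""
    return json.dumps({
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    })


def classify_status(status):
    """Map an HTTP status code to a coarse client action (hypothetical policy)."""
    if status == 200:
        return "ok"
    if status in RETRYABLE_STATUS:
        return "retry"
    # 400, 402, and 403 need a change to the request, budget, or role first.
    return "fail"


body = build_messages_body("claude-sonnet-4-5", "Hello")
```

A caller would send `body` with the headers described earlier, then use `classify_status` on the response code to decide whether to retry after a delay or report the error.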
## Rate limits To ensure high performance for all Snowflake customers, Cortex REST API requests are subject to rate limits. Requests exceeding the limits may receive an HTTP 429 response. Snowflake may occasionally adjust these limits. The default limits in the following tables are applied per account and independently for each model. Ensure your application handles 429 responses gracefully by retrying with [exponential backoff](https://platform.openai.com/docs/guides/rate-limits#retrying-with-exponential-backoff). If you need to increase the limits, contact Snowflake Support. **Cortex REST API rate limits** | Model | Tokens Processed per Minute (TPM) | Requests per Minute (RPM) | Max output (tokens) | | ------------------- | --------------------------------- | ------------------------- | ------------------- | | `claude-4-sonnet` | 2,000,000 | 1,200 | 16,384 | | `claude-haiku-4-5` | 5,000,000 | 10,000 | 16,384 | | `claude-opus-4-5` | 2,000,000 | 10,000 | 16,384 | | `claude-opus-4-6` | 3,000,000 | 10,000 | 16,384 | | `claude-opus-4-7` | 3,000,000 | 10,000 | 16,384 | | `claude-sonnet-4-5` | 5,000,000 | 10,000 | 16,384 | | `claude-sonnet-4-6` | 6,000,000 | 10,000 | 16,384 | | `deepseek-r1` | 100,000 | 100 | 16,384 | | `llama3.1-8b` | 800,000 | 800 | 16,384 | | `llama3.1-70b` | 400,000 | 400 | 16,384 | | `llama3.1-405b` | 200,000 | 200 | 16,384 | | `mistral-7b` | 400,000 | 400 | 16,384 | | `mistral-large2` | 600,000 | 200 | 16,384 | | `openai-gpt-4.1` | 2,000,000 | 600 | 16,384 | | `openai-gpt-5` | 600,000 | 600 | 16,384 | | `openai-gpt-5-chat` | 1,000,000 | 1,000 | 16,384 | | `openai-gpt-5-mini` | 2,000,000 | 2,000 | 16,384 | | `openai-gpt-5-nano` | 10,000,000 | 10,000 | 16,384 | | `openai-gpt-5.1` | 600,000 | 3,000 | 16,384 | | `openai-gpt-5.2` | 1,200,000 | 3,000 | 16,384 | ### Increase rate limits with cross-region inference If you've opted into Cross Cloud, AWS Global, or Azure Global cross-region inference (see [cross-region 
inference](/user-guide/snowflake-cortex/cross-region-inference)) in your Snowflake account, the rate limits are higher for the following models:

**Cortex REST API rate limits with cross-region inference**

| Model               | Tokens Processed per Minute (TPM) | Requests per Minute (RPM) | Max output (tokens) |
| ------------------- | --------------------------------- | ------------------------- | ------------------- |
| `claude-haiku-4-5`  | 5,000,000                         | 10,000                    | 16,384              |
| `claude-opus-4-5`   | 2,000,000                         | 10,000                    | 16,384              |
| `claude-opus-4-6`   | 3,000,000                         | 10,000                    | 16,384              |
| `claude-opus-4-7`   | 3,000,000                         | 10,000                    | 16,384              |
| `claude-sonnet-4-5` | 5,000,000                         | 10,000                    | 16,384              |
| `claude-sonnet-4-6` | 6,000,000                         | 10,000                    | 16,384              |
| `openai-gpt-4.1`    | 1,000,000                         | 1,000                     | 16,384              |
| `openai-gpt-5`      | 1,000,000                         | 10,000                    | 16,384              |
| `openai-gpt-5.1`    | 1,000,000                         | 10,000                    | 16,384              |
| `openai-gpt-5.2`    | 1,000,000                         | 10,000                    | 16,384              |

### View your rate limits

To see the rate limits that apply to your account, query the [CORTEX_REST_API_RATE_LIMIT_POLICIES](/sql-reference/account-usage/cortex_rest_api_rate_limit_policies) Account Usage view. The view returns the RPM and TPM limits for each model.

```sql
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_RATE_LIMIT_POLICIES;
```

### Troubleshooting rate limit events

Exceeding either the TPM or the RPM limit results in a 429 response code. If your REST API usage is below the requests-per-minute limit but you still receive a 429 response code, check your token usage rate.

Cortex REST API implements rate limits using the [Sliding Window Counter](https://blog.cloudflare.com/counting-things-a-lot-of-different-things/#sliding-windows-to-the-rescue) pattern. The counters are stored in a highly available Redis cluster that is accessible only by Snowflake Cortex within Snowflake's private network. The sliding-window counter assumes that client traffic to the API in the previous time window is uniformly distributed.
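To make the sliding-window counter pattern concrete, here is a small in-memory Python sketch of the general technique. It is not Snowflake's implementation (which, as noted, keeps its counters in Redis); the class name, the 60-second window, and the single-process state are all assumptions for illustration.

```python
import time


class SlidingWindowCounter:
    """Approximate a rolling request rate from two fixed-window counters."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self.current_start = 0.0   # start time of the current fixed window
        self.current_count = 0     # requests counted in the current window
        self.previous_count = 0    # requests counted in the previous window

    def _roll(self, now):
        """Advance to the fixed window that contains `now`."""
        elapsed_windows = int((now - self.current_start) // self.window)
        if elapsed_windows >= 1:
            # After exactly one window, the current count becomes the previous
            # count; after a longer gap, both counts are stale.
            self.previous_count = self.current_count if elapsed_windows == 1 else 0
            self.current_count = 0
            self.current_start += elapsed_windows * self.window

    def estimate(self, now):
        """Weight the previous window by how much of it still overlaps."""
        self._roll(now)
        fraction_elapsed = (now - self.current_start) / self.window
        return self.previous_count * (1.0 - fraction_elapsed) + self.current_count

    def allow(self, now=None):
        """Return True and count the request if it fits under the limit."""
        now = time.monotonic() if now is None else now
        if self.estimate(now) + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

The weighting in `estimate` is where the uniform-distribution assumption enters: the previous window's count is scaled by the fraction of that window still inside the rolling interval, regardless of when in the window the requests actually arrived. A burst at the very start of one window is therefore still counted almost in full just after the next window begins.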
When traffic is spiky, this assumption can overestimate the request rate, but the estimate recovers quickly because the window is short. If this overestimation affects you and you want to increase the limits, contact Snowflake Support.

## Known issues

### Session token expiration

We recommend authenticating with one of the three methods defined in the [REST API authentication guide](/developer-guide/snowflake-rest-api/authentication). However, if you choose to authenticate with a Snowflake session token, you must handle token refresh to ensure uninterrupted API access.

Session tokens expire periodically. If a request is executed with an expired session token, the REST API returns a `200 OK` response that includes error code `390112`. When this occurs, the operation is not performed.

To handle this behavior, your application should:

1. Check each API response for error code `390112`, even when the HTTP status code is `200 OK`.
2. When error code `390112` is detected, refresh the session token and retry the request.

This behavior only affects applications using Snowflake session tokens. If you authenticate using [key pair authentication](#label-sfrest-api-authenticating-key-pair), [OAuth](#label-sfrest-authenticating-oauth), or [programmatic access tokens (PATs)](#label-sfrest-authenticating-pat), you do not need to implement this error handling.

## Cost considerations

Snowflake Cortex REST API requests incur compute costs based on the number of tokens processed. Refer to the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) for each model's cost in dollars per million tokens.

A token is the smallest unit of text processed by Snowflake Cortex LLM functions, approximately equal to four characters of text. The equivalence of raw input or output text to tokens can vary by model. Both input and output tokens incur compute cost.
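For a rough sense of scale, the four-characters-per-token approximation can be turned into a back-of-the-envelope cost estimate. This Python sketch is only an approximation: the price passed in is a made-up figure, real token counts vary by model, and a single rate for input and output tokens is an assumption; use the Service Consumption Table for actual rates.

```python
CHARS_PER_TOKEN = 4  # rough approximation stated above; varies by model


def estimate_tokens(text):
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_cost(input_text, output_text, price_per_million_tokens):
    """Estimate the cost of one request; input and output tokens both count."""
    tokens = estimate_tokens(input_text) + estimate_tokens(output_text)
    return tokens * price_per_million_tokens / 1_000_000


# Hypothetical price of 2.00 per million tokens, for illustration only.
cost = estimate_cost("a" * 4000, "b" * 4000, price_per_million_tokens=2.0)
```

For billing analysis, the token counts recorded in the usage views are the authoritative numbers; an estimate like this is mainly useful for sizing prompts before sending them.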
If you use the API to provide a conversational or chat user experience, all previous prompts and responses are processed to generate each new response, with corresponding costs. ## Monitoring usage Use the following views in the `SNOWFLAKE.ACCOUNT_USAGE` schema to monitor Cortex REST API consumption, token usage, and rate limit utilization: - [CORTEX_REST_API_USAGE_HISTORY](/sql-reference/account-usage/cortex_rest_api_usage_history): Request-level usage including model, tokens, user, region, and timestamps. - [CORTEX_REST_API_RATE_LIMIT_POLICIES](/sql-reference/account-usage/cortex_rest_api_rate_limit_policies): Configured RPM and TPM limits per model. For a basic query, see [View your rate limits](#label-cortex-rest-api-view-rate-limits). ### Required privileges To query ACCOUNT_USAGE views, your role needs imported privileges on the SNOWFLAKE database. If you encounter a permissions issue, run the following: ```sql USE ROLE ACCOUNTADMIN; GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE ; ``` ### Usage examples #### Total requests and tokens (last 7 days) ```sql SELECT COUNT(*) AS total_requests, COUNT(DISTINCT MODEL_NAME) AS models_used, SUM(TOKENS) AS total_tokens, COUNT(DISTINCT USER_ID) AS unique_users FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_USAGE_HISTORY WHERE START_TIME >= DATEADD('day', -7, CURRENT_TIMESTAMP()); ``` #### Daily request and token volume ```sql SELECT START_TIME::DATE AS day, MODEL_NAME, COUNT(*) AS requests, SUM(TOKENS) AS total_tokens, SUM(TOKENS_GRANULAR:"input"::INT) AS input_tokens, SUM(TOKENS_GRANULAR:"output"::INT) AS output_tokens FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_USAGE_HISTORY WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP()) GROUP BY 1, 2 ORDER BY 1, 2; ``` #### Peak RPM and TPM per model ```sql WITH per_minute AS ( SELECT MODEL_NAME, DATE_TRUNC('minute', START_TIME) AS minute, COUNT(*) AS rpm, SUM(TOKENS) AS tpm FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_USAGE_HISTORY WHERE START_TIME >= 
DATEADD('day', -7, CURRENT_TIMESTAMP()) GROUP BY 1, 2 ) SELECT MODEL_NAME, MAX(rpm) AS peak_rpm, MAX(tpm) AS peak_tpm, APPROX_PERCENTILE(rpm, 0.5) AS p50_rpm, APPROX_PERCENTILE(rpm, 0.9) AS p90_rpm, APPROX_PERCENTILE(rpm, 0.99) AS p99_rpm FROM per_minute GROUP BY 1 ORDER BY peak_rpm DESC; ``` #### Rate limit utilization (last 24 hours) ```sql WITH per_minute AS ( SELECT MODEL_NAME, DATE_TRUNC('minute', START_TIME) AS minute, COUNT(*) AS rpm, SUM(TOKENS) AS tpm FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_USAGE_HISTORY WHERE START_TIME >= DATEADD('hour', -24, CURRENT_TIMESTAMP()) GROUP BY 1, 2 ) SELECT q.MODEL_NAME, q.RPM AS limit_rpm, q.TPM AS limit_tpm, COALESCE(MAX(p.rpm), 0) AS peak_rpm_24h, COALESCE(MAX(p.tpm), 0) AS peak_tpm_24h, ROUND(COALESCE(MAX(p.rpm), 0) / NULLIF(q.RPM, 0) * 100, 1) AS pct_rpm_used, ROUND(COALESCE(MAX(p.tpm), 0) / NULLIF(q.TPM, 0) * 100, 1) AS pct_tpm_used FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_RATE_LIMIT_POLICIES q LEFT JOIN per_minute p ON q.MODEL_NAME = p.MODEL_NAME GROUP BY q.MODEL_NAME, q.RPM, q.TPM ORDER BY pct_rpm_used DESC; ``` #### Usage by user ```sql SELECT u.NAME AS user_name, r.MODEL_NAME, COUNT(*) AS requests, SUM(r.TOKENS) AS total_tokens, SUM(r.TOKENS_GRANULAR:"input"::INT) AS input_tokens, SUM(r.TOKENS_GRANULAR:"output"::INT) AS output_tokens FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_USAGE_HISTORY r JOIN SNOWFLAKE.ACCOUNT_USAGE.USERS u ON r.USER_ID = u.USER_ID WHERE r.START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP()) GROUP BY 1, 2 ORDER BY total_tokens DESC; ``` #### Usage by inference region ```sql SELECT INFERENCE_REGION, COUNT(*) AS requests, SUM(TOKENS) AS total_tokens FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_USAGE_HISTORY WHERE START_TIME >= DATEADD('day', -7, CURRENT_TIMESTAMP()) GROUP BY 1 ORDER BY requests DESC; ``` ### Exporting historical data You can export usage data to a stage for long-term retention or external analysis: ```sql COPY INTO @my_stage/cortex_rest_api_export/ FROM ( SELECT 
REQUEST_ID, START_TIME, END_TIME, MODEL_NAME, TOKENS, TOKENS_GRANULAR:"input"::INT AS input_tokens, TOKENS_GRANULAR:"output"::INT AS output_tokens, INFERENCE_REGION, USER_ID FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_USAGE_HISTORY WHERE START_TIME >= DATEADD('day', -90, CURRENT_TIMESTAMP()) ) FILE_FORMAT = (TYPE = 'PARQUET') OVERWRITE = TRUE; ``` - ACCOUNT_USAGE views can have up to 45 minutes of latency. - The `TOKENS_GRANULAR` column is a VARIANT containing `"input"` and `"output"` token counts. - Rate limit policies reflect the current configuration, not historical values. --- title: Cortex Search source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview.md section: Snowflake Cortex (AI & ML) --- # Cortex Search - [](/sql-reference/sql/alter-cortex-search) - [](/sql-reference/sql/create-cortex-search) - [](/sql-reference/sql/desc-cortex-search) - [](/sql-reference/sql/drop-cortex-search) - [](/sql-reference/sql/show-cortex-search) ## Overview Cortex Search enables low-latency, high-quality "fuzzy" search over your Snowflake data. It powers a broad array of search experiences for Snowflake users including [Retrieval Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation) applications leveraging Large Language Models (LLMs). Cortex Search gets you up and running with a hybrid (vector and keyword) search engine on your text data in minutes, without having to worry about embedding, infrastructure maintenance, search quality parameter tuning, or ongoing index refreshes. This means you can spend less time on infrastructure and search quality tuning, and more time developing high-quality chat and search experiences using your data. Check out the [Cortex Search tutorials](/user-guide/snowflake-cortex/cortex-search/overview-tutorials) for step-by-step instructions on using Cortex Search to power AI chat and search applications. 
## When to use Cortex Search

The two primary use cases for Cortex Search are retrieval augmented generation (RAG) and enterprise search.

- **RAG engine for LLM chatbots**: Use Cortex Search as a RAG engine for chat applications with your text data by leveraging semantic search for customized, contextualized responses.
- **Enterprise search**: Use Cortex Search as a backend for a high-quality search bar embedded in your application.

### Cortex Search for RAG

Retrieval augmented generation (RAG) is a technique for retrieving data from a knowledge base to enhance the generated response of a large language model. The following architecture diagram shows how you can combine Cortex Search with [Cortex LLM Functions](/user-guide/snowflake-cortex/aisql) to create enterprise chatbots with RAG using your Snowflake data as a knowledge base.

![Using Cortex Search for RAG in Snowflake](/static/images/cortex-search-rag.png)

Cortex Search is the retrieval engine that provides the Large Language Model with the context it needs to return answers that are grounded in your most up-to-date proprietary data.

## Example: Create and query a Cortex Search service

This example takes you through the steps of creating a Cortex Search Service and querying it using the REST API. Refer to the [Querying a Cortex Search Service](#label-cortex-search-query-syntax) topic for more details about querying the service.

This example uses a sample customer support transcript dataset.

Run the following commands to set up the example database and schema.

```sql
CREATE DATABASE IF NOT EXISTS cortex_search_db;

CREATE OR REPLACE WAREHOUSE cortex_search_wh WITH
  WAREHOUSE_SIZE='X-SMALL';

CREATE OR REPLACE SCHEMA cortex_search_db.services;
```

Run the following SQL commands to create the dataset.
```sql CREATE OR REPLACE TABLE support_transcripts ( transcript_text VARCHAR, region VARCHAR, agent_id VARCHAR ); INSERT INTO support_transcripts VALUES ('My internet has been down since yesterday, can you help?', 'North America', 'AG1001'), ('I was overcharged for my last bill, need an explanation.', 'Europe', 'AG1002'), ('How do I reset my password? The email link is not working.', 'Asia', 'AG1003'), ('I received a faulty router, can I get it replaced?', 'North America', 'AG1004'); ``` ### Create the service You can create a Cortex Search Service with a single SQL query or from the Snowflake AI & ML Studio. When you create a Cortex Search Service, Snowflake performs transformations on your source data to get it ready for low-latency serving. The following sections show how to create a service using both SQL and in the Snowflake AI & ML Studio in %sf-web-interface%. When you create a search service, the search index is built as part of the create process. This means the CREATE CORTEX SEARCH SERVICE statement may take longer to complete for larger datasets. #### Use SQL The following example demonstrates how to create a Cortex Search Service with [](/sql-reference/sql/create-cortex-search) on the sample customer support transcript dataset created in the previous section. ```sql CREATE OR REPLACE CORTEX SEARCH SERVICE transcript_search_service ON transcript_text ATTRIBUTES region WAREHOUSE = cortex_search_wh TARGET_LAG = '1 day' EMBEDDING_MODEL = 'snowflake-arctic-embed-l-v2.0' AS ( SELECT transcript_text, region, agent_id FROM support_transcripts ); ``` This command triggers the building of the search service for your data. In this example:
- Queries to the service will search for matches in the `transcript_text` column. - The `TARGET_LAG` parameter dictates that the Cortex Search Service will check for updates to the base table `support_transcripts` approximately once per day. - The columns `region` and `agent_id` will be indexed so that they can be returned along with results of queries on the `transcript_text` column. - The column `region` will be available as a filter column when querying the `transcript_text` column. - The warehouse `cortex_search_wh` will be used for materializing the results of the specified query initially and each time the base table is changed.
- Depending on the size of the warehouse specified in the query and the number of rows in your table, this CREATE command may take up to several hours to complete.
- Snowflake recommends using a dedicated warehouse of size no larger than MEDIUM for each service.
- Columns in the ATTRIBUTES field must be included in the source query, either by explicit enumeration or with a wildcard (`*`).

#### Use %sf-web-interface%

Follow these steps to create a Cortex Search Service in %sf-web-interface%:

1. Sign in to %sf-web-interface-link%.
2. Choose a role that is granted the SNOWFLAKE.CORTEX_USER database role.
3. In the navigation menu, select **AI & ML** %raa% **Cortex Search**.
4. Select **Create**.
5. Select a role and warehouse. The role must be granted the SNOWFLAKE.CORTEX_USER database role. The warehouse is used for materializing the results of the source query when the service is created and refreshed.
6. Select a database and schema in which the service is defined.
7. Enter a name for your service, then select **Next**.
8. Select data to be indexed.
   - To select a table or view, select **Table or view**. Select the table or view that contains the text data to be indexed for searching, then select **Next**. For example, select the `support_transcripts` table.
   - To select files from a stage, select **Stage**. (Preview) Select the stage that contains the files to be indexed for searching, then select **Next**.

   If you want to specify multiple data sources or perform transformations when defining your service, [use SQL](#label-cortex-search-overview-example-sql).
9. If you selected **Table or view**:
   - Select the columns you want included in the search results, for example, `transcript_text`, `region`, and `agent_id`, then select **Next**.
   - Select the column that will be searched, for example, `transcript_text`, then select **Next**.
   - If you want to be able to filter your search results based on particular columns, select those columns, then select **Next**.
If you don't need any filters, select **Skip this option**. If you selected **Stage** (Preview): - Select the destination for your processed data, then select **Next**. 10. Select the configuration parameters for the service. Set your target lag, which is the amount of time your service content should lag behind updates to the base data, then select **Create**. The final step confirms that your service has been created and displays the service name and its data source. When you create the service from %sf-web-interface%, the name of the service is double-quoted. For details on what that means when referencing the service in SQL, see [](#label-delimited-identifier). ### Grant usage permissions After the service and index are created, you can grant usage on the service, its database, and schema to other roles like customer_support. ```sql GRANT USAGE ON DATABASE cortex_search_db TO ROLE customer_support; GRANT USAGE ON SCHEMA services TO ROLE customer_support; GRANT USAGE ON CORTEX SEARCH SERVICE transcript_search_service TO ROLE customer_support; ``` ### Preview the service To confirm that the service is populated with data properly, you can preview the service via the [SEARCH_PREVIEW function](#label-cortex-search-query-syntax-sql-preview) from a SQL environment: ```sql SELECT PARSE_JSON( SNOWFLAKE.CORTEX.SEARCH_PREVIEW( 'cortex_search_db.services.transcript_search_service', '{ "query": "internet issues", "columns":[ "transcript_text", "region" ], "filter": {"@eq": {"region": "North America"} }, "limit":1 }' ) )['results'] as results; ``` Sample successful query response: ```json-object [ { "transcript_text" : "My internet has been down since yesterday, can you help?", "region" : "North America" } ] ``` This response confirms that the service is populated with data and serving reasonable results for the given query. You can also use the [CORTEX_SEARCH_DATA_SCAN](/sql-reference/functions/cortex_search_data_scan) table function to inspect the contents of the service. 
```sql SELECT * FROM TABLE ( CORTEX_SEARCH_DATA_SCAN ( SERVICE_NAME => 'transcript_search_service' ) ); ``` ```text + ---------------------------------------------------------- + --------------- + -------- + ------------------------------ + | transcript_text | region | agent_id | _GENERATED_EMBEDDINGS_MY_MODEL | | ---------------------------------------------------------- | --------------- | -------- | ------------------------------ | | 'My internet has been down since yesterday, can you help?' | 'North America' | 'AG1001' | [0.1, 0.2, 0.3, 0.4] | | 'I was overcharged for my last bill, need an explanation.' | 'Europe' | 'AG1002' | [0.1, 0.2, 0.3, 0.4] | + ---------------------------------------------------------- + --------------- + -------- + ------------------------------ + ``` ### Query the service from your application Once you've created the search service, granted usage on it to your role, and previewed it, you can now query it from your application using the [Python API](#label-cortex-search-query-syntax-python). 
The following code shows how to use the Python API to retrieve the support ticket most relevant to a query about `internet issues`, filtered to return results in the `North America` region:

```python
from snowflake.core import Root
from snowflake.snowpark import Session

# Placeholder: replace with a dict of your own connection parameters.
CONNECTION_PARAMETERS = {"..."}

session = Session.builder.configs(CONNECTION_PARAMETERS).create()
root = Root(session)

# Look up the service created earlier by database, schema, and name.
transcript_search_service = (root
    .databases["cortex_search_db"]
    .schemas["services"]
    .cortex_search_services["transcript_search_service"]
)

# Return the single best match, restricted to the North America region.
resp = transcript_search_service.search(
    query="internet issues",
    columns=["transcript_text", "region"],
    filter={"@eq": {"region": "North America"}},
    limit=1
)
print(resp.to_json())
```

Sample successful query response:

```json-object
{
  "results": [
    {
      "transcript_text": "My internet has been down since yesterday, can you help?",
      "region": "North America"
    }
  ],
  "request_id": "5d8eaa5a-800c-493c-a561-134c712945ba"
}
```

Cortex Search Services return all columns specified in the `columns` field in your query.

## Required privileges

- To create a Cortex Search Service, your role must have the required privileges to use the Cortex embedding functions, which requires granting the [SNOWFLAKE.CORTEX_USER](#label-snowflake-db-roles-cortex-user) database role or the [SNOWFLAKE.CORTEX_EMBED_USER](#label-snowflake-db-roles-cortex-embed-user) database role to the service creator role. You must also have the following privileges:
  - The CREATE CORTEX SEARCH SERVICE or OWNERSHIP privilege on the schema where you create the service.
  - The SELECT privilege on the underlying table(s) or view(s) that the service queries.
  - The USAGE privilege on the warehouse that refreshes the service.
- Change tracking must be enabled on all underlying objects used by a Cortex Search Service. For more information about change tracking requirements, see [](#label-cortex-search-change-tracking-requirements).
- To query a Cortex Search Service, the role of the querying user must have USAGE privileges on the service itself, as well as on the database and schema in which the service resides. See [Cortex Search Access Control Requirements](#label-cortex-search-query-syntax-access-controls). - To suspend or resume a Cortex Search Service using the ALTER command, the role of the querying user must have the OPERATE privilege on the service. See [](/sql-reference/sql/alter-cortex-search). Cortex Search Services perform searches with [owner's rights](/developer-guide/stored-procedure/stored-procedures-rights) and follow the same security model as other Snowflake objects that run with owner's rights. For more information, see [Cortex Search Access Control Requirements](#label-cortex-search-query-syntax-access-controls) ## Understanding Cortex Search quality Cortex Search leverages an ensemble of retrieval and ranking models to provide you with a high level of search quality with little to no tuning required. Under the hood, Cortex Search takes a "hybrid" approach to retrieving and ranking documents. Each search query utilizes: - **Vector search** for retrieving semantically similar documents. - **Keyword search** for retrieving lexically similar documents. - **Semantic reranking** for reranking the most relevant documents in the result set. This hybrid retrieval approach, coupled with a semantic reranking step, achieves high search quality across a broad range of datasets and queries. You can customize the scoring of search results by applying numeric boosts, time decays, adjusting component weights, or disabling reranking. For more information, see [](/user-guide/snowflake-cortex/cortex-search/cortex-search-customize-scoring). ### Cortex Search Embedding Models Cortex Search allows users to select a hosted embedding model to be leveraged in the vector search stage of retrieval. The following embedding models are available in Cortex Search. Model pricing varies. 
Canonical model pricing is available in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). If a price shown below differs from the price shown for the model in the Snowflake Service Consumption Table, the Snowflake Service Consumption table shall govern. | Model name | Output Dimensions | Context window size (tokens) | Language support | Description | | ----------------------------------------- | ----------------- | ---------------------------- | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `snowflake-arctic-embed-m-v1.5` (default) | 768 | 512 | English-only | Snowflake's most practical, English-only embedding model. This open-source, 110M-parameter model yields the fastest indexing times of the available models in Cortex Search. For more information, see the [Arctic Embed 1.5 blog post](https://www.snowflake.com/en/engineering-blog/arctic-embed-m-v1-5-enterprise-retrieval) and [Arctic Embed 1.5 model card](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5). | | `snowflake-arctic-embed-l-v2.0` | 1024 | 512 | Multilingual | Snowflake's price-performant multilingual embedding model with a context window of 512 tokens. This open-source, 568M-parameter model yields high quality on both English and non-English datasets. For more information, see the [Arctic Embed 2 blog post](https://www.snowflake.com/en/engineering-blog/snowflake-arctic-embed-2-multilingual/) and [Arctic Embed 2 model card](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0). 
| | `snowflake-arctic-embed-l-v2.0-8k` | 1024 | 8192 | Multilingual | Snowflake's price-performant multilingual embedding model, with an increased context window of 8000 tokens. This open-source, 568M-parameter model yields high quality on both English and non-English datasets. | | `voyage-multilingual-2` | 1024 | 32,000 | Multilingual | Voyage's multilingual embedding model. This model yields high quality on both English and non-English datasets. For more information, see the [Voyage Multilingual 2 blog post](https://blog.voyageai.com/2024/06/10/voyage-multilingual-2-multilingual-embedding-model/) | Some embedding models are only available in certain cloud regions for Cortex Search. For an availability list by model by region, see [Cortex Search Regional Availability](#label-cortex-search-overview-regional-availability). | Model name | Output Dimensions | Context window size (tokens) | Language support | Description | | ----------------------------------------- | ----------------- | ---------------------------- | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `snowflake-arctic-embed-m-v1.5` (default) | 768 | 512 | English-only | Snowflake's most practical, English-only embedding model. This open-source, 110M-parameter model yields the fastest indexing times of the available models in Cortex Search. For more information, see the [Arctic Embed 1.5 blog post](https://www.snowflake.com/en/engineering-blog/arctic-embed-m-v1-5-enterprise-retrieval) and [Arctic Embed 1.5 model card](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5). 
| | `snowflake-arctic-embed-l-v2.0` | 1024 | 512 | Multilingual | Snowflake's price-performant multilingual embedding model with a context window of 512 tokens. This open-source, 568M-parameter model yields high quality on both English and non-English datasets. For more information, see the [Arctic Embed 2 blog post](https://www.snowflake.com/en/engineering-blog/snowflake-arctic-embed-2-multilingual/) and [Arctic Embed 2 model card](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0). | | `snowflake-arctic-embed-l-v2.0-8k` | 1024 | 8192 | Multilingual | Snowflake's price-performant multilingual embedding model, with an increased context window of 8000 tokens. This open-source, 568M-parameter model yields high quality on both English and non-English datasets. | Each model has different performance, cost, context window size, and quality characteristics. Carefully review the model specifications to determine the best model for your specific workload. Refer to the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) for most accurate view of each model's cost in credits per million tokens. ### Tokens, model context windows, and text splitting A token is a sequence of characters and is the smallest unit of text that can be processed by a large language model. As an approximation, one token is equivalent to about 3/4 of an English word, or around 4 characters. To calculate the number of tokens in a string, use the [COUNT_TOKENS Cortex Function](/sql-reference/functions/count_tokens-snowflake-cortex). For example, calculating the tokens for a string to be embedded with the `snowflake-arctic-embed-m-v1.5` model: ```sql SELECT SNOWFLAKE.CORTEX.COUNT_TOKENS('snowflake-arctic-embed-m', '') as token_count ``` Each vector embedding model supports a fixed size context window for text inputs, indicated in the preceding embedding model table. 
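As a quick client-side check before indexing, the word-based approximation above can be compared against the context windows in the embedding model table. This Python sketch is only a heuristic under that approximation; the function names are hypothetical, and `COUNT_TOKENS` remains the way to get a model's actual token count.

```python
import math

# Context window sizes (in tokens) from the embedding model table above.
CONTEXT_WINDOW_TOKENS = {
    "snowflake-arctic-embed-m-v1.5": 512,
    "snowflake-arctic-embed-l-v2.0": 512,
    "snowflake-arctic-embed-l-v2.0-8k": 8192,
    "voyage-multilingual-2": 32000,
}


def approx_token_count(text):
    """Estimate tokens assuming one token is about 3/4 of an English word."""
    words = len(text.split())
    return math.ceil(words * 4 / 3)


def fits_context_window(text, model="snowflake-arctic-embed-m-v1.5"):
    """Return True if the estimated token count fits the model's window."""
    return approx_token_count(text) <= CONTEXT_WINDOW_TOKENS[model]


sample = " ".join(["word"] * 1000)
fits_default = fits_context_window(sample)
fits_8k = fits_context_window(sample, "snowflake-arctic-embed-l-v2.0-8k")
```

Because the estimate is word-based, it is best suited to English prose; text with many numbers, code, or non-English content can tokenize quite differently.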
During both indexing and serving, when the number of tokens in a value in the search column exceeds the context window size, Cortex Search truncates the string to the size of the context window before embedding it into vector space for semantic search. However, Cortex Search uses the full body of text for keyword-based retrieval. Snowflake provides built-in functions to assist in splitting of text into smaller chunks. For more information, see [SPLIT_TEXT_RECURSIVE_CHARACTER](/sql-reference/functions/split_text_recursive_character-snowflake-cortex). For best search results with Cortex Search, Snowflake recommends splitting the text in your search column into chunks of no more than 512 tokens (about 385 English words). While there are longer-context embedding models available today, such as `snowflake-arctic-embed-l-v2.0-8k`, [research](https://www.snowflake.com/en/engineering-blog/impact-retrieval-chunking-finance-rag/) shows that *a smaller chunk size typically results in higher retrieval and downstream LLM response quality*. With smaller chunks, retrieval can be more precise for a given query and, in a retrieval-augmented generation (RAG) scenario, the downstream LLM receives text chunks that are more relevant to the query. ## Refreshes The content served in a Cortex Search Service is based on the results of a specific query. When the data underlying a Cortex Search Service changes, the service updates to reflect those changes. These updates are referred to as a refresh. This process is automated, and it involves analyzing the query that underlies the table. Cortex Search Services have the same refresh properties as Dynamic Tables. See [](/user-guide/dynamic-tables/refresh-modes) topic to understand the refresh characteristics of a Cortex Search Service. The source query for a Cortex Search Service must be a candidate for dynamic table incremental refresh. For details on those requirements, see [](#label-dynamic-tables-limits-incremental-refresh). 
This restriction is designed to prevent any unwanted runaway costs associated with vector embedding computation. For more information about the constructs that are not supported for dynamic table incremental refresh, see [](/user-guide/dynamic-tables/supported-queries). ### Primary keys A primary key of a Cortex Search Service is an optional set of columns that uniquely identify each row in the source query (that is, only one row has that exact combination of values in the designated columns). To be used with Cortex Search Services, primary key columns must be of the [TEXT](/sql-reference/data-types-text) data type. A primary key can be specified when creating the service as follows: ```sql CREATE OR REPLACE CORTEX SEARCH SERVICE transcript_search_service ON transcript_text PRIMARY KEY (region, agent_id) WAREHOUSE = cortex_search_wh TARGET_LAG = '1 day' AS ( SELECT transcript_text, region, agent_id FROM support_transcripts ); ``` The primary key columns of existing services can be modified with `ALTER CORTEX SEARCH SERVICE ... SET PRIMARY KEY (...)`. For detailed syntax, see [](/sql-reference/sql/alter-cortex-search). Services with primary keys can use an optimized refresh path where each refresh cycle processes only the rows that changed since the last refresh, rather than re-embedding all data. This can result in significant reductions to the cost and latency of a refresh. Over time, these incremental updates accumulate fragmented index segments. The service periodically compacts these segments with a full index rebuild to maintain optimal search performance. You can control how often this compaction occurs by setting the `FULL_INDEX_BUILD_INTERVAL_DAYS` property on the service. A full index rebuild does not re-embed unchanged data. For syntax details, see [](/sql-reference/sql/create-cortex-search) and [](/sql-reference/sql/alter-cortex-search). `FULL_INDEX_BUILD_INTERVAL_DAYS` is a soft target. 
Full rebuilds may occur more frequently than the specified interval to optimize serving performance based on factors such as service target lag, change rate in the service source data, and overall service size. Queries to services with primary keys may also make use of the `@primarykey` [filter operator](#label-cortex-search-query-filter-syntax). The set of primary key column values must be unique for each row in the source query. Duplicates are ignored in the resulting search index. ## Multi-index Cortex Search Cortex Search can index multiple columns or use custom vector embeddings for queries, allowing you additional flexibility in how your Cortex Search Service interprets data and responds to user requests. You should use Multi-index Cortex Search when you have a use case that features one or more of: - **Multiple search fields**: Users need to search across different fields of a record. - **User-provided vector embeddings**: You have pre-computed vector embeddings for one or more columns prior to ingestion into the Cortex Search Service. - **Mixed search types**: You want to support searching different fields with preference to a type of search. - Use *text indexes* for fields where exact or fuzzy keyword matches are important. Some examples are product codes, names, and categories. - Use *vector indexes* for fields with longer text content where semantic understanding is valuable. Examples include product descriptions, user reviews, and support cases. - **Field-specific relevance**: Different fields of your data should contribute differently to relevance of a search result. For example, for a product catalog search use case, you can create a multi-index service where: - Product names and SKUs are *text indexes* for precise lexical matching. - Product descriptions are *vector indexes* for semantic matching. - Category and brand names are both text *and* vector indexes to support both lexical and semantic matches. 
For examples of creating a multi-index Cortex Search service, see [CREATE CORTEX SEARCH SERVICE ... TEXT INDEXES ... VECTOR INDEXES](/sql-reference/sql/create-cortex-search). For examples of querying a multi-index service, see [Query a Cortex Search service - Multi-index queries](#label-cortex-search-multi-query). ### User-provided vector embeddings Multi-index Cortex Search allows you to use pre-computed vector embeddings from any embedding model (including open-source, commercial, and custom-trained models). Use user-provided vector embeddings when: - You want to use an embedding model not natively available in Cortex Search, or you want to reuse embeddings you have already generated to reduce cost and improve performance. - You want to combine your vector embeddings with Cortex Search text indexes for hybrid retrieval. When you specify a bare column name in the VECTOR INDEXES clause but do not specify a model, Cortex Search treats the contents of the column as user-provided vector embeddings. User-provided vectors are indexed as-is and do not incur any embedding cost. You cannot load vectors directly into a Snowflake table. Instead, cast an array of numbers to the VECTOR data type when inserting or updating data in the source table for your Cortex Search Service. See [](#label-data-type-vector-conversion) for details and examples of how to do this. Cortex Search chooses one of the following modes at search time, depending on whether you provide a query vector or query text in your search request:
| Mode | Index time | Query time | | ------------------------------------------ | ---------------------------------- | --------------------------------------------------------- | | Fully user-managed | Provide vectors in a VECTOR column | Provide a query vector via multi_index_query | | User-managed with managed query embeddings | Provide vectors in a VECTOR column | Cortex Search embeds query text using the specified model |
## Suspension of indexing and serving A Cortex Search Service has two states that can be running or suspended independently: indexing and serving. Use [](/sql-reference/sql/alter-cortex-search) to manually suspend or resume indexing, serving, or both. Indexing also suspends automatically after repeated refresh failures, as a built-in safeguard. Separately, you can configure serving to auto-suspend after a period of query inactivity to reduce cost on idle services. ### Indexing suspension on refresh errors Much like Dynamic Tables, Cortex Search Services automatically suspend their indexing state when they encounter five consecutive refresh failures related to the source query. If you encounter this failure for your service, you can view the specific SQL error using either [](/sql-reference/sql/desc-cortex-search) or the [](/sql-reference/info-schema/cortex_search). The output from both includes the following columns: - The INDEXING_STATE column, which is SUSPENDED for a suspended service. - The INDEXING_ERROR column, which contains the specific SQL error encountered in the source query. Once the root issue is resolved, you can resume the service with `ALTER CORTEX SEARCH SERVICE RESUME INDEXING`. For detailed syntax, see [](/sql-reference/sql/alter-cortex-search). ### Auto-suspend serving on inactivity Available to all accounts. To reduce serving cost when a service isn't in use, set the `AUTO_SUSPEND` property to a number of seconds of query inactivity. After that period passes, the service automatically suspends its serving compute, and resumes when it next receives a query. Set `AUTO_SUSPEND` when creating a service in [](/sql-reference/sql/create-cortex-search), or on an existing service with [](/sql-reference/sql/alter-cortex-search). Minimum: 1800 seconds (30 minutes). You can also resume an auto-suspended service early with `ALTER CORTEX SEARCH SERVICE RESUME`. 
After a manual resume, the next auto-suspension can occur only after another full `AUTO_SUSPEND` period of inactivity. Usage notes: - When an auto-suspended service receives a query, the first request is paused until the service resumes and then completes. Concurrent requests during the resume window return HTTP 429 with a `Retry-After` header; implement retry logic in your client to handle these responses gracefully. - Resuming may take up to a few minutes. Services with smaller indexed data volumes resume faster. - The service suspends within 5 minutes after the inactivity threshold passes. To check whether a service is currently serving and whether auto-suspend is configured, use [](/sql-reference/sql/show-cortex-search) or [](/sql-reference/sql/desc-cortex-search) and look at the `SERVING_STATE` and `AUTO_SUSPEND` columns. ## Cost considerations A Cortex Search Service incurs cost in the following ways: | Category | Description | | ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Virtual warehouse compute | A Cortex Search Service requires a [virtual warehouse](#label-virtual-warehouse-credit-usage) to refresh the service: to run queries against base objects when they are initialized and refreshed, including 
orchestrating text embedding jobs and building the search index. These operations use compute resources, which consume [credits](#label-what-are-credits). If no changes are identified during a refresh, virtual warehouse credits aren't consumed since there's no new data to refresh. | | EMBED_TEXT tokens compute | A Cortex Search Service automatically embeds each text row in the search column specified in the `ON` parameter into vector space to enable semantic search, which incurs a credit cost per token embedded. This involves calling [EMBED_TEXT_768](/sql-reference/functions/embed_text-snowflake-cortex) or [EMBED_TEXT_1024](/sql-reference/functions/embed_text_1024-snowflake-cortex) to convert each document as a series of numbers that encodes its meaning. Embeddings are computed each time a row is inserted or updated. Embeddings are processed incrementally in the evaluation of the source query, so the embedding cost is only incurred for added or changed documents. See [Vector Embeddings](/user-guide/snowflake-cortex/vector-embeddings) for more information on vector embedding costs. | | Multi-index Cortex Search | Multi-index Cortex Search Services have costs dependent on how you embed tokens and the number of columns you index. Larger embedding vectors or higher numbers of index columns incur higher costs. Embeddings are computed each time a row is inserted or updated. Embeddings are processed incrementally in the evaluation of the source query, so the embedding cost is only incurred for added or changed documents. | | Serving compute | A Cortex Search Service uses multi-tenant serving compute, separate from a user-provided Virtual Warehouse, to establish a low-latency, high-throughput service. The compute cost for this component is incurred per GB per month (GB/mo) of uncompressed indexed data, where indexed data is the user-provided data in the Cortex Search source query, plus vector embeddings computed on the user's behalf. 
You incur these costs while the service is available to respond to queries, even if no queries are served during a given period. For the Cortex Search Serving credit rate per GB/mo of indexed data, see the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). | | Storage | Cortex Search Services materialize the source query into a table stored in your account. This table is transformed into data structures that are optimized for low-latency serving, also stored in your account. Storage for the table and intermediate data structures is based on a flat rate per terabyte (TB). | | Cloud services compute | Cortex Search Services use [Cloud Services compute](#label-cloud-services-credit-usage) to identify changes in underlying base objects and whether the virtual warehouse needs to be invoked. Cloud services compute cost is subject to the constraint that Snowflake only bills if the daily cloud services cost is greater than 10% of the daily warehouse cost for the account. | For best practices on managing the costs of a Cortex Search Service, see [](/user-guide/snowflake-cortex/cortex-search/cortex-search-costs). To view the **AI Services**-related consumption costs for each Cortex Search Service in your account, aggregated daily, see the [CORTEX_SEARCH_DAILY_USAGE_HISTORY view](/sql-reference/account-usage/cortex_search_daily_usage_history). ## Known limitations Usage of Cortex Search is subject to the following limitations: - **Base table size**: The result of the materialized query in the search service must be less than 100M rows in size to maintain optimal serving performance. If the materialized result of your query has more than 100M rows, the creation query fails with an error. To increase the row scaling limits on a Cortex Search Service above 100M, please contact your Snowflake account team.
- **Throughput and rate limiting**: Cortex Search returns a 429 HTTP status code if a client sends requests too quickly or if the service becomes overloaded. Client logic calling the search service should implement backoff and retry logic to handle these 429 responses gracefully. To increase throughput beyond 20 QPS for a single search service or 140 QPS across all services in your account, contact your Snowflake account team. - **Query constructs**: Cortex Search Service source queries must adhere to the same query restrictions that Dynamic Tables have. Please see the [](/user-guide/dynamic-tables/decision-guide#label-dynamic-tables-limitations) for more detail. - **Data retention**: Cortex Search Services have the same data retention requirements as dynamic tables. Specifically, you can't set the [](#label-data-retention-time-in-days) object parameter in your base tables to zero or set this parameter on the schema or database containing the search service. Additionally, search services can become stale if they are not refreshed within [](#label-max-data-extension-time-in-days). Once stale, they must be recreated to resume refreshes. Please see the [](/user-guide/dynamic-tables/decision-guide#label-dynamic-tables-limitations) for more detail. - **Cloning**: Cortex Search Services do not currently support [cloning](/user-guide/object-clone). Snowflake intends to provide this capability in some future release, but cannot guarantee a specific timeline. - **Table immutability**: While running, a Cortex Search Service requires that the tables it accesses aren't modified or dropped. To safely update tables used by a Cortex Search Service, stop the service before making your changes. ## Regional availability Cortex Search is available in the People's Republic of China. Support for this feature is available to accounts in the following Snowflake regions. Availability for specific embedding models within a region is denoted with a checkmark.
| Cloud Provider | Region | `snowflake-arctic-embed-m-v1.5` | `snowflake-arctic-embed-l-v2.0` | `snowflake-arctic-embed-l-v2.0-8k` | `voyage-multilingual-2` | | -------------- | -------------------------------------- | ------------------------------- | ------------------------------- | ---------------------------------- | ----------------------- | | AWS | US West 2 (Oregon) | %cm% | %cm% | %cm% | %cm% | | AWS | US East 2 (Ohio) | %cm% | %cm% | %cm% | | | AWS | US East 1 (N. Virginia) | %cm% | %cm% | %cm% | %cm% | | AWS | US East (Commercial Gov - N. Virginia) | %cm% | %cm% | %cm% | %cm% | | AWS | Canada (Central) | %cm% | %cm% | %cm% | | | AWS | South America (São Paulo) | %cm% | %cm% | %cm% | | | AWS | Europe (Ireland) | %cm% | %cm% | %cm% | | | AWS | Europe (London) | %cm% | %cm% | %cm% | | | AWS | Europe Central 1 (Frankfurt) | %cm% | %cm% | %cm% | %cm% | | AWS | Europe (Stockholm) | %cm% | %cm% | %cm% | | | AWS | Asia Pacific (Tokyo) | %cm% | %cm% | %cm% | %cm% | | AWS | Asia Pacific (Mumbai) | %cm% | %cm% | %cm% | | | AWS | Asia Pacific (Sydney) | %cm% | %cm% | %cm% | | | AWS | Asia Pacific (Jakarta) | %cm% | %cm% | %cm% | | | AWS | Asia Pacific (Seoul) | %cm% | %cm% | %cm% | | | Azure | East US 2 (Virginia) | %cm% | %cm% | %cm% | | | Azure | West US 2 (Washington) | %cm% | %cm% | %cm% | | | Azure | South Central US (Texas) | %cm% | %cm% | %cm% | | | Azure | UK South (London) | %cm% | %cm% | %cm% | | | Azure | North Europe (Ireland) | %cm% | %cm% | %cm% | | | Azure | West Europe (Netherlands) | %cm% | %cm% | %cm% | %cm% | | Azure | Switzerland North (Zürich) | %cm% | %cm% | %cm% | | | Azure | Central India (Pune) | %cm% | %cm% | %cm% | | | Azure | Japan East (Tokyo, Saitama) | %cm% | %cm% | %cm% | | | Azure | Southeast Asia (Singapore) | %cm% | %cm% | %cm% | | | Azure | Australia East (New South Wales) | %cm% | %cm% | %cm% | | | GCP | Europe West 2 (London) | %cm% | %cm% | %cm% | | | GCP | Europe West 3 (Frankfurt) | %cm% | %cm% | %cm% | | | GCP | Europe West 
4 (Netherlands) | %cm% | %cm% | %cm% | | | GCP | Middle East Central 2 (Dammam) | %cm% | %cm% | %cm% | | | GCP | US Central 1 (Iowa) | %cm% | %cm% | %cm% | | | GCP | US East 4 (N. Virginia) | %cm% | %cm% | %cm% | | You can specify the [cross-region inference parameter](#label-use-cross-region-inference) in any of the above regions to access models which aren't directly supported from your default region. Cortex Search is available in the following regions **only** using cross-region inference. To use Cortex Search with cross-region inference, use the [cross-region inference parameter](#label-use-cross-region-inference). - AWS Europe (Paris) - AWS Europe (Zurich) - AWS Asia Pacific (Singapore) - AWS Asia Pacific (Osaka) - Azure Canada Central (Toronto) - Azure Central US (Iowa) - Azure UAE North (Dubai) When using cross-region inference, query latency between regions depends on the cloud provider infrastructure and network status. Snowflake recommends that you test your specific use-case with cross-region inference enabled. ## Legal notices The data classification of inputs and outputs are as set forth in the following table.
For additional information, refer to [Snowflake AI and ML](/guides-overview-ai-features). --- title: Cortex Search tutorials source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/overview-tutorials.md section: Snowflake Cortex (AI & ML) --- # Cortex Search tutorials - [Cortex Search overview](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) - [](/user-guide/snowflake-cortex/cortex-search/query-cortex-search-service) These tutorials provide step-by-step instructions for you to explore how to use Cortex Search.
[](/user-guide/snowflake-cortex/cortex-search/tutorials/cortex-search-tutorial-1-search)
Walks through building a simple search experience using Cortex Search on a dataset consisting of Airbnb listing reviews.
[](/user-guide/snowflake-cortex/cortex-search/tutorials/cortex-search-tutorial-2-chat)
Walks through building a basic chatbot with Cortex Search and LLM functions on a dataset consisting of TED Talk transcripts.
[](/user-guide/snowflake-cortex/cortex-search/tutorials/cortex-search-tutorial-3-chat-advanced)
Walks through an end-to-end setup for creating a chatbot using Cortex Search on a PDF dataset consisting of Federal Open Market Committee (FOMC) meeting minutes.
--- title: Cost considerations for Cortex AI Functions source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/aisql-cost.md section: Snowflake Cortex (AI & ML) --- # Cost considerations for Cortex AI Functions This feature is not available in the People's Republic of China. - [Cortex AI Functions](/user-guide/snowflake-cortex/aisql) - [Managing Cortex AI Function costs with Account Usage](/user-guide/snowflake-cortex/ai-func-cost-management) - [](/sql-reference/account-usage/cortex_functions_usage_history) - [](/sql-reference/account-usage/cortex_functions_query_usage_history) Snowflake Cortex AI functions incur compute cost based on the number of tokens processed. Refer to the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) for each function's cost in credits per million tokens. A token is the smallest unit of text processed by Snowflake Cortex AI functions. An industry convention for text is that a token is approximately equal to four characters, although this can vary by model, as can token equivalence for media files. - For functions that generate new text using provided text (AI_COMPLETE, AI_CLASSIFY, AI_FILTER, AI_AGG, SUMMARIZE (SNOWFLAKE.CORTEX), and AI_TRANSLATE, and their previous versions in the SNOWFLAKE.CORTEX schema), both input and output tokens are billable. - For Cortex Guard, only input tokens are counted. The number of input tokens is based on the number of tokens output from AI_COMPLETE (or COMPLETE). Cortex Guard usage is billed in addition to the cost of the AI_COMPLETE (or COMPLETE) function. - For AI_SIMILARITY, AI_EMBED, and the SNOWFLAKE.CORTEX.EMBED_* functions, only input tokens are counted. - For EXTRACT_ANSWER, the number of billable tokens is the sum of the number of tokens in the `from_text` and `question` fields. 
- AI_CLASSIFY, AI_FILTER, AI_AGG, AI_SENTIMENT, AI_SUMMARIZE_AGG, SUMMARIZE, TRANSLATE, AI_TRANSLATE, EXTRACT_ANSWER, ENTITY_SENTIMENT, and SENTIMENT add a prompt to the input text in order to generate the response. As a result, the billed token count is higher than the number of tokens in the text you provide. - AI_CLASSIFY labels, descriptions, and examples are counted as input tokens for each record processed, not just once for each AI_CLASSIFY call. - For AI_PARSE_DOCUMENT (or SNOWFLAKE.CORTEX.PARSE_DOCUMENT), billing is based on the number of document pages processed. - For AI_EXTRACT, both input and output tokens are counted. The `responseFormat` argument is counted as input tokens. For document formats consisting of pages, the number of pages processed is counted as input tokens. Each page in a document is counted as 970 tokens. - For AI_REDACT, both input and output tokens are counted. - AI_COUNT_TOKENS incurs only compute cost to run the function. No additional token-based costs are incurred. For models that support media files such as images or audio: - Audio files are billed at 50 tokens per second of audio. - The token equivalence of images is determined by the model used. The cost associated with keeping a warehouse active continues to apply when executing a query that calls a Snowflake Cortex LLM Function. For general information on compute costs, see [Understanding compute cost](/user-guide/cost-understanding-compute). ## Warehouse sizing Snowflake recommends using a warehouse size no larger than MEDIUM when calling Snowflake Cortex AI Functions. Using a larger warehouse than necessary does not increase performance, but can result in unnecessary costs. This recommendation may change in the future as we continue to evolve Cortex AI Functions. 
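The token-counting rules listed earlier in this topic can be turned into a rough estimator. The following Python sketch is illustrative only: the function names are hypothetical, the four-characters-per-token figure is the approximation stated above rather than an exact tokenizer, and the credit rate is a placeholder that you would look up in the Snowflake Service Consumption Table. Because several functions add a prompt to the input text, treat the results as a lower bound on billed tokens.

```python
import math

# Figures taken from the text above; the names are hypothetical.
CHARS_PER_TOKEN = 4           # industry rule of thumb; varies by model
TOKENS_PER_PAGE = 970         # AI_EXTRACT: each document page counts as 970 tokens
TOKENS_PER_AUDIO_SECOND = 50  # audio files: 50 tokens per second


def estimate_text_tokens(text: str) -> int:
    """Approximate token count using the ~4 characters per token convention."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_page_tokens(pages: int) -> int:
    """Input tokens attributed to paged documents processed by AI_EXTRACT."""
    return pages * TOKENS_PER_PAGE


def estimate_audio_tokens(seconds: float) -> int:
    """Tokens attributed to an audio file of the given duration."""
    return math.ceil(seconds * TOKENS_PER_AUDIO_SECOND)


def estimate_classify_input_tokens(records: list[str], label_text: str) -> int:
    """AI_CLASSIFY: labels, descriptions, and examples count once per record."""
    label_tokens = estimate_text_tokens(label_text)
    return sum(estimate_text_tokens(record) + label_tokens for record in records)


def estimate_credits(tokens: int, credits_per_million_tokens: float) -> float:
    """Convert a token count to credits; the rate is an assumed input."""
    return tokens / 1_000_000 * credits_per_million_tokens
```

For example, `estimate_page_tokens(10)` gives 9700 input tokens for a ten-page document, and `estimate_audio_tokens(60)` gives 3000 tokens for one minute of audio. For actual consumption, rely on the usage views rather than on estimates like these.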
## Track costs for AI services To track credits used for AI Services including LLM Functions in your account, use the [](/sql-reference/account-usage/metering_history): ```sql SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY WHERE SERVICE_TYPE='AI_SERVICES'; ``` ## Track credit consumption for Cortex AI Functions To view the credit and token consumption for each AI Function call, use the [](/sql-reference/account-usage/cortex_functions_usage_history): ```sql SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY; ``` You can also view the credit and token consumption for each query within your Snowflake account. Viewing the credit and token consumption for each query helps you identify queries that are consuming the most credits and tokens. The following example query uses the [](/sql-reference/account-usage/cortex_functions_query_usage_history) to show the credit and token consumption for all of your queries within your account. ```sql SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_QUERY_USAGE_HISTORY; ``` You can also use the same view to see the credit and token consumption for a specific query. ```sql SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_QUERY_USAGE_HISTORY WHERE query_id=''; ``` You can't get granular usage information for requests made with the REST API. The query usage history is grouped by the models used in the query. For example, if you ran: ```sql SELECT AI_COMPLETE('mistral-7b', 'Is a hot dog a sandwich'), AI_COMPLETE('mistral-large', 'Is a hot dog a sandwich'); ``` The query usage history would show two rows, one for `mistral-7b` and one for `mistral-large`. ## See also For day-to-day cost governance (usage views, account-level alerts, per-user spending limits, runaway query detection), see [Managing Cortex AI Function costs with Account Usage](/user-guide/snowflake-cortex/ai-func-cost-management). 
--- title: Cross-region inference source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cross-region-inference.md section: Snowflake Cortex (AI & ML) --- # Cross-region inference This feature is not available in the People's Republic of China. Accessing frontier AI models and the full suite of Snowflake Cortex AI products requires cross-region inference. You choose the boundaries: route to any region for maximum access, or restrict to specific cloud providers and geographies. Snowflake Cortex AI supports cross-region inference across AWS, Azure, and GCP regions. You control the routing behavior through a single account-level parameter, [CORTEX_ENABLED_CROSS_REGION](#label-cortex-enable-cross-region), choosing the option that best fits your performance, compliance, and data residency requirements. ## How it works When cross-region inference is enabled, Snowflake automatically routes inference requests to available capacity within the bounds that you set. Your customer data remains stored only in the region where your account is located. During cross-region inference, the inference payload—the input prompt you send and the output response you receive—is transmitted transiently to the processing region for the duration of processing. Your customer data is not persisted in the processing region. ## Choosing a routing option Snowflake provides four routing options, from most flexible to most restrictive. Each option is a valid production configuration—choose the one that matches your requirements.
### Any region Setting the parameter to `ANY_REGION` provides access to the full set of supported models and Cortex products, with the highest available capacity. Snowflake routes requests to the optimal region automatically. ```sql ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION'; ``` For new accounts created in new organizations within commercial regions after March 9, 2026, `ANY_REGION` is the default. ### Specify cloud regions For organizations that need inference data to remain within a particular cloud provider's network, you can specify one or more cloud regions. Requests are routed only to regions within the designated boundaries. ```sql ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_GLOBAL'; ``` For a full list of supported regions, see the [CORTEX_ENABLED_CROSS_REGION](#label-cortex-enable-cross-region) parameter reference. ### Account region only To restrict inference processing to your account's home region, set the parameter to `DISABLED`. This provides the strictest data residency posture but limits the models and features available to those deployed in your region. ```sql ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'DISABLED'; ``` ## Data security and residency Cross-region inference is designed with enterprise data security as a foundational requirement. Data in transit is always encrypted. How your inference data travels depends on the regions involved: - **Within the same cloud provider** — Data stays entirely within the provider's private backbone network and never traverses the public internet. - **Across cloud providers** — Data traverses the public internet using Mutual Transport Layer Security (mTLS), providing encrypted, mutually authenticated connections between Snowflake endpoints. **No customer data is stored at the processing region.** The processing region handles the request and returns the result; no customer data is persisted. 
**Billing is based on your account region.** Credits are consumed in your requesting region, regardless of where the request is processed. You do not incur data egress charges for cross-region inference. ## US Commercial Gov regions Cross-region inference for Snowflake's government-authorized, FIPS-compliant commercial environments is designed to maintain data-handling boundaries while providing access to supported AI models. When enabled, inference requests remain within the same cloud and compliance boundary, and processing occurs on FIPS-validated infrastructure such as AWS Bedrock FIPS endpoints. This approach allows customers in select U.S. government-authorized regions to use Snowflake AI capabilities securely and to meet their compliance requirements. To enable this feature, set the `CORTEX_ENABLED_CROSS_REGION` parameter to `AWS_US` for workloads in a supported government-authorized region: ```sql ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US'; ``` Cross-region inference is available for US Commercial Gov in these regions: - US East (Commercial Gov - N. Virginia) - US West (Commercial Gov - Oregon) ## Access control requirements The `CORTEX_ENABLED_CROSS_REGION` parameter can only be set at the account level, not at the user or session levels. Only the ACCOUNTADMIN role can modify this parameter using the [](/sql-reference/sql/alter-account) command. This parameter cannot be set by the ORGADMIN role. ```sql ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION'; ``` ## Cost considerations You are charged credits for the use of LLMs as listed in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf). Credits are considered consumed in the requesting region. For example, if you call an LLM Function from the `us-east-2` region and the request is processed in the `us-west-2` region, the credits are considered consumed in the `us-east-2` region.
You do not incur data egress charges for using cross-region inference. ## Additional considerations - Latency between regions depends on the cloud provider infrastructure and network status. Snowflake recommends that you test your specific use case with cross-region inference enabled. - Cross-region inference for [Cortex Search](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) is not supported in [all regions](#label-cortex-search-overview-regional-availability). ## Next steps - For details on the cross-region inference parameter, see the [CORTEX_ENABLED_CROSS_REGION](#label-cortex-enable-cross-region) section of the SQL parameter reference. --- title: Custom instructions in Cortex Analyst source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/custom-instructions.md section: Snowflake Cortex (AI & ML) --- # Custom instructions in %cortex-analyst% This feature is not available in the People's Republic of China. Custom instructions let you have greater control over SQL generation. Using natural language, you can tell %cortex-analyst% exactly how to generate SQL queries from within your semantic model YAML file. For example, use custom instructions to tell %cortex-analyst% what you mean by *performance* or *financial year*. In this way, you can improve the accuracy of the generated SQL by incorporating custom logic or additional elements. For more granular control, you can also specify custom instructions for individual modules in the SQL generation pipeline. See [](#label-cortex-analyst-module-custom-instructions) for more information. ## How custom instructions work %cortex-analyst% introduces the `custom_instructions` field into the semantic model YAML file. This field enables you to apply defining modifications or additions to SQL generation. For more information about the semantic model syntax, see [](/user-guide/views-semantic/sql). 
## Examples To explore possible use cases for custom instructions, consider the following examples. ### Formatting data output Ensure that all numbers in the output are rounded to two decimal points. #### The `custom_instructions` field in the semantic model YAML file ```yaml custom_instructions: "Ensure that all numeric columns are rounded to 2 decimal points in the output." ``` #### Generated SQL query ```sql SELECT ROUND(column_name, 2) AS column_name, ... FROM your_table; ``` ### Adjusting percentages Automatically multiply percentage or rate calculations by 100 for consistency. #### The `custom_instructions` field in the semantic model YAML file ```yaml custom_instructions: "For any percentage or rate calculation, multiply the result by 100." ``` #### Generated SQL query ```sql SELECT (column_a / column_b) * 100 AS percentage_rate, ... FROM your_table; ``` ### Adding default filters Apply a filter if the user doesn't specify one (for example, default to the last year). #### The `custom_instructions` field in the semantic model YAML file ```yaml custom_instructions: "If no date filter is provided, apply a filter for the last year." ``` #### Generated SQL query ```sql SELECT ... FROM your_table WHERE date_column >= DATEADD(YEAR, -1, CURRENT_DATE); ``` ### Linking column filters Apply additional filters on related columns based on user input. #### The `custom_instructions` field in the semantic model YAML file ```yaml custom_instructions: "If a filter is applied on column X, ensure that the same filter is applied to dimension Y." ``` #### Generated SQL query ```sql SELECT ... FROM your_table WHERE column_x = 'filter_value' AND dimension_y = 'filter_value'; ``` ## Module custom instructions Set the `module_custom_instructions` key in the top level of your semantic model to define custom instructions for specific components in the SQL generation pipeline. 
This feature is useful for use cases like the following: - Define logic that influences how user questions are interpreted before SQL is generated - Maintain separate, more structured instructions for different parts of the Analyst workflow - Transition from existing `custom_instructions` to a more modular format as your usage grows Currently, `module_custom_instructions` supports the following components: - `question_categorization`: Define how %cortex-analyst% should classify user questions (for example, by blocking certain topics or guiding user behavior). - `sql_generation`: Specify how SQL should be generated (for example, data formatting and filtering). Instructions for either or both of these components can be set under the `module_custom_instructions` key. Migrate any existing `custom_instructions` to the `sql_generation` component, as shown in the following example. ### Migrating existing custom instructions If your model already has a `custom_instructions` field, migrate its content to the `sql_generation` field under `module_custom_instructions`. Before: ```yaml custom_instructions: "Ensure that all numeric columns are rounded to 2 decimal points." ``` After: ```yaml module_custom_instructions: sql_generation: | "Ensure that all numeric columns are rounded to 2 decimal points." ``` ### Blocking questions about specific topics You can use the `question_categorization` component to block questions about specific topics. For example, if you want to block questions about users, you might set the following instructions. %cortex-analyst% then rejects questions about users with a message telling them to contact their administrator. ```yaml module_custom_instructions: question_categorization: | Reject all questions asking about users. Ask users to contact their admin. ``` You can also use question categorization instructions to ask for missing details. 
In the following example, %cortex-analyst% asks the user to provide a product type if they ask about users and do not specify one. ```yaml module_custom_instructions: question_categorization: | - If the question asks for users without providing a product_type, consider this question UNCLEAR and ask the user to specify product_type. ``` ### Custom instructions through Cortex Agents When %cortex-analyst% is used as a tool within a [Cortex Agent](/user-guide/snowflake-cortex/cortex-agents), the agent follows your custom instructions with higher fidelity. In this mode, the agent interprets your natural-language instructions directly, rather than relying on specific state keywords like `UNCLEAR`. You can write custom instructions in plain language without referencing categorization states. For example: ```yaml module_custom_instructions: question_categorization: | - If the question asks for users without providing a product_type, ask the user to specify product_type. - Reject all questions about salary data and tell the user this information is not available. ``` The categorization state keywords (such as `UNCLEAR`) continue to work when using the %cortex-analyst% API directly. However, when your semantic model is accessed through a Cortex Agent, writing instructions in plain natural language is recommended, as the agent can follow nuanced instructions more precisely. ## Best practices
- **Be specific.** Clearly describe the modifications; for example, "Add a column with a fixed value of 42" or "Include a sum calculation for column X."
- **Start small.** Start with simple modifications, such as adding a static column or default filters, before moving to more complex scenarios.
- **Preview the generated SQL query.** Ensure that the instructions apply as intended and that the generated SQL query is correct.
- **Iterate gradually.** Experiment with more complex use cases as your familiarity with the feature grows.
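
The migration described under "Migrating existing custom instructions" can be sketched in a few lines of Python. This is a minimal illustration and not part of %cortex-analyst%: it assumes the semantic model YAML has already been parsed into a plain dictionary, and the helper name `migrate_custom_instructions` is hypothetical.

```python
from copy import deepcopy


def migrate_custom_instructions(model: dict) -> dict:
    """Return a copy of a parsed semantic model with the legacy
    `custom_instructions` text moved to
    `module_custom_instructions.sql_generation`.

    Hypothetical helper for illustration; assumes `model` is the semantic
    model YAML already loaded into a dict.
    """
    migrated = deepcopy(model)  # leave the caller's dict untouched
    legacy = migrated.pop("custom_instructions", None)
    if not legacy:
        return migrated  # nothing to migrate

    modules = migrated.setdefault("module_custom_instructions", {})
    existing = modules.get("sql_generation", "").strip()
    # Keep any sql_generation text that is already present and append the
    # legacy instructions after it.
    modules["sql_generation"] = f"{existing}\n{legacy}" if existing else legacy
    return migrated


model = {
    "name": "my_model",
    "custom_instructions": "Ensure that all numeric columns are rounded to 2 decimal points.",
}
print(migrate_custom_instructions(model))
```

Appending to any existing `sql_generation` text, rather than overwriting it, is a design choice of this sketch; review the merged instructions and preview the generated SQL after migrating.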
--- title: Customize charts in Snowflake CoWork source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/snowflake-cowork/chart-customization.md section: Snowflake Cortex (AI & ML) --- # Customize charts in %sf-intelligence% This feature is not available in the People's Republic of China. Available to all accounts. %sf-intelligence% generates charts automatically from your data. You can customize those charts, controlling colors, fonts, chart types, and more, by adding configuration to your agent or semantic view. ## Overview Customization works at two levels: - **Agent level**: Applies to all charts across every semantic view attached to the agent. Use this for global defaults such as brand colors and fonts. - **Semantic view level**: Applies only to charts generated from that specific semantic view. Use this for column-specific rules and domain-specific chart type preferences. At each level, two mechanisms are available: - **vega_template**: A partial [Vega-Lite](https://vega.github.io/vega-lite/) JSON spec that is deterministically merged into every generated chart. Use this for anything that must always apply. - **Free-text instructions**: Natural language guidance injected into the chart generation prompt. The LLM makes a best effort to follow these, but they aren't guaranteed. When both agent and semantic view define a `vega_template`, the agent template is applied first and the semantic view template is applied second. On conflicting keys, the semantic view wins. ## Agent-level customization Add a `` block inside `instructions.orchestration` in your agent configuration. You can combine font theming and a global default palette: ``` Prefer horizontal bar charts for ranked data. 
vega_template: { "background": "antiquewhite", "config": { "title": { "font": "monospace", "fontStyle": "italic", "fontSize": 20, "fontWeight": "lighter" }, "axis": { "labelFont": "monospace", "titleFont": "monospace", "titleFontSize": 15, "labelFontSize": 10 }, "header": { "labelFont": "monospace", "titleFont": "monospace", "labelFontSize": 10 }, "legend": { "labelFont": "monospace", "titleFont": "monospace", "titleFontSize": 18, "labelFontSize": 15 }, "mark": { "font": "monospace" } } } ``` Use the agent level for: - Brand color palette - Default visual theme (fonts, background) - Cross-domain style preferences - Number or currency formatting defaults Avoid column-specific color mappings at the agent level. Column names differ between semantic views and are silently ignored when not found. ## Semantic-view-level customization Add a `` block inside the `module_custom_instructions.sql_generation` field of your semantic view YAML. This field takes precedence over the legacy `custom_instructions` field when both are set. ```sql CREATE OR REPLACE SEMANTIC VIEW my_db.my_schema.my_view FROM @my_stage/semantic_view.yaml; ``` ```yaml # semantic_view.yaml name: my_view module_custom_instructions: sql_generation: | Always use a line chart for time series data. vega_template: { "transform": [ { "calculate": "datum.CATEGORY === 'Furniture' ? '#4e79a7' : datum.CATEGORY === 'Technology' ? '#f28e2b' : datum.CATEGORY === 'Office Supplies' ? '#e15759' : ''", "as": "_color" } ], "encoding": { "color": { "field": "CATEGORY", "type": "nominal", "scale": { "range": { "field": "_color" } } } } } tables: ... ``` Use the semantic view level for: - Per-column color mappings - Domain-specific chart type rules - Metric-specific formatting - Overriding agent-level defaults ## Use with caution: templates affect every chart `vega_template` is merged into **every chart** generated at that level. There's no per-question or per-chart-type filtering. 
If you add an `encoding.y` override at the agent level, it applies to bar charts, line charts, scatter plots, and pie charts alike. Before adding a template, consider: - **Scope**: Agent-level templates affect all charts across all semantic views. Prefer the semantic view level when a rule is specific to one domain or dataset. - **Wildcard encodings**: A template encoding that omits `field` (for example, `"y": {"axis": {"format": "..."}}`) applies to every chart's `y` axis regardless of what column is plotted. Use `field` to pin it to a specific column when the semantic view is known. - **Mark overrides**: Setting `"mark": "line"` at the agent level forces every chart to a line, including ones where the LLM would correctly choose a bar or pie. Only override `mark` at the semantic view level where you have domain knowledge about the data. - **Transform arrays**: A `calculate` transform in the template (for example, `_color`) is injected into every chart's `transform` array. If the data doesn't contain the referenced column, Vega-Lite silently produces `null` values for the calculated field. When in doubt, start at the semantic view level and promote to the agent level only after confirming the rule is safe for all charts. To validate a template before deploying it, paste a representative chart spec (with your `vega_template` already merged in) into the [Vega Editor](https://vega.github.io/editor). The editor shows live warnings and errors in the console. A valid template should produce no warnings. Common things to catch this way: invalid property names, type mismatches, unreachable `calculate` expressions, and scale configuration errors. ## Fonts Font settings are controlled through the `config` block in `vega_template`. All font properties are applied globally to the chart and affect every chart generated, regardless of data. Use CSS generic font families for maximum compatibility. 
Charts in %sf-intelligence% are rendered in two contexts: in the %sf-web-interface% browser UI (client-side, where fonts depend on the user's OS and browser) and server-side in a Linux container for validation and image export. Named fonts like `Arial` or `Georgia` might not be installed in the server-side container. CSS generic families, such as `serif`, `sans-serif`, and `monospace`, always resolve correctly in both contexts.
If you need a custom brand font, it must be installed in the server-side rendering container **and** served through CSS `@font-face` in %sf-web-interface%. ```json { "config": { "title": { "font": "serif", "fontSize": 20, "fontWeight": "bold", "fontStyle": "italic" }, "axis": { "labelFont": "monospace", "titleFont": "monospace", "labelFontSize": 11, "titleFontSize": 13 }, "header": { "labelFont": "serif", "titleFont": "serif", "labelFontSize": 11 }, "legend": { "labelFont": "serif", "titleFont": "serif", "labelFontSize": 12, "titleFontSize": 13 }, "mark": { "font": "serif" } } } ``` Common `config` font properties:
You can also set a global `background` color alongside fonts: ```json { "background": "#f9f9f9", "config": { "title": { "font": "monospace", "fontStyle": "italic", "fontSize": 20, "fontWeight": "lighter" }, "axis": { "labelFont": "monospace", "titleFont": "monospace", "titleFontSize": 15, "labelFontSize": 10 }, "header": { "labelFont": "monospace", "titleFont": "monospace", "labelFontSize": 10 }, "legend": { "labelFont": "monospace", "titleFont": "monospace", "titleFontSize": 18, "labelFontSize": 15 }, "mark": { "font": "monospace" } } } ``` ## Colors ### LLM instructions (soft) The simplest way to apply color rules is to describe them in free text. The LLM interprets these on a best-effort basis. ``` Color Active status green, Inactive status red, and Pending status yellow. ``` Use this for quick, approximate color guidance when exact hex values aren't required. ### Exact value mapping with _color Map specific column values to exact hex colors using a `calculate` transform. Values not listed receive an empty string, and Vega-Lite renders those with its own default. ```json { "transform": [ { "calculate": "datum.STATUS === 'Active' ? '#22c55e' : datum.STATUS === 'Inactive' ? '#ef4444' : datum.STATUS === 'Pending' ? '#eab308' : ''", "as": "_color" } ], "encoding": { "color": { "field": "STATUS", "type": "nominal", "scale": { "range": { "field": "_color" } } } } } ``` Use this when you need exact, guaranteed colors for every known value. The `_color` transform and the `encoding.color` block are always merged into the chart, regardless of which column the LLM chose to color by. This means: - The mapping only works correctly when the chart's color channel actually uses the same column referenced in the `calculate` expression (for example, `STATUS`). If the LLM assigns color to a different column, the `_color` field is present in the data but the colors don't match. - Only one column can be targeted per template. 
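
Writing the nested ternary expression by hand gets error-prone as the number of values grows. The following Python sketch builds the same `calculate` transform and `encoding.color` block from a value-to-color mapping. The function name `color_template` is hypothetical, and the sketch assumes the column name is a plain identifier that can be referenced as `datum.<column>`.

```python
import json


def color_template(column: str, colors: dict[str, str]) -> dict:
    """Build a vega_template fragment that maps values of `column` to hex colors.

    Hypothetical helper; it mirrors the structure of the STATUS example.
    """
    expr = "''"  # fallback for values that are not listed
    # Build the ternary chain from the last value to the first so the
    # first mapping ends up outermost.
    for value, hex_color in reversed(list(colors.items())):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        expr = f"datum.{column} === '{escaped}' ? '{hex_color}' : {expr}"
    return {
        "transform": [{"calculate": expr, "as": "_color"}],
        "encoding": {
            "color": {
                "field": column,
                "type": "nominal",
                "scale": {"range": {"field": "_color"}},
            }
        },
    }


template = color_template(
    "STATUS",
    {"Active": "#22c55e", "Inactive": "#ef4444", "Pending": "#eab308"},
)
print(json.dumps(template, indent=2))
```

The printed JSON can be pasted into a `vega_template`. The same caveats apply as for a hand-written template: the mapping is merged into every chart and targets a single column.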
### Pinned values with palette fallback Pin colors for key values and let the rest be auto-assigned from a palette. Use `"merge": "extend"` to preserve the LLM's existing color choices and only add new mappings. ```json { "encoding": { "color": { "scale": { "domain": ["Furniture", "Technology", "Office Supplies"], "range": ["#4e79a7", "#f28e2b", "#e15759"], "scheme": "tableau10" } } }, "usermeta": { "merge": "extend" } } ``` Data values not in `domain` are automatically assigned the next available color from `scheme`. After assignment, `scheme` is removed from the final spec. Supported scheme names: `tableau10`, `tableau20`, `category10`, `category20`, `category20b`, `category20c`, `dark2`, `paired`, `pastel1`, `pastel2`, `set1`, `set2`, `set3`, `accent`. ## Disabling Snowsight styling By default, %sf-intelligence% applies %sf-web-interface% UI theme adjustments on top of the generated chart. To opt out and render the chart exactly as specified in your `vega_template`, set `ui-merge` to `"none"` in `usermeta`: ```json { "usermeta": { "ui-merge": "none" } } ``` This is useful when you want full control over the visual output, for example, when applying a custom brand theme and you don't want %sf-web-interface% to override colors, fonts, or backgrounds. `ui-merge` is interpreted by the Snowsight client-side renderer, not by the orchestrator backend. It has no effect on the chart spec produced by the merge engine. It only controls how Snowsight applies its own theme on top of the final spec when displaying the chart in the browser. ## Number and currency formatting (experimental) Axis and legend labels can be formatted using [D3 format strings](https://d3js.org/d3-format) through `vega_template`. This is useful for enforcing consistent currency symbols, decimal places, or SI suffixes across all charts. 
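
D3 format specifiers are modeled on Python's format specification mini-language, so Python's built-in `format` gives a rough preview of what a specifier produces. The sketch below is only an approximation under stated assumptions: Python has no `$` currency symbol in its specifiers, so the hypothetical helper `preview` adds that prefix itself, it expects non-negative values, and D3-only types such as `s` (SI suffix) aren't covered.

```python
def preview(value: float, spec: str) -> str:
    """Approximate the label a D3 format specifier would produce.

    Hypothetical helper: handles an optional leading `$`, then delegates the
    rest of the specifier to Python's built-in format(). Intended for
    non-negative values only.
    """
    if spec.startswith("$"):
        return "$" + format(value, spec[1:])
    return format(value, spec)


print(preview(1234567.891, "$,.0f"))  # $1,234,568
print(preview(1234.5678, ",.2f"))     # 1,234.57
print(preview(0.1234, ".1%"))         # 12.3%
```

For exact output, validate the specifier in the chart itself, because rendering is done by Vega-Lite and D3 rather than Python.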
Set `axis.format` for quantitative axes (`x`, `y`) and `legend.format` for color/size legends: ```json { "encoding": { "y": { "axis": { "format": "$,.0f" } } } } ``` `axis.format` is applied by Vega-Lite only when the channel's data type is `"quantitative"`. If the LLM infers a different type (for example, `"ordinal"` for a year or ID column), the format string is silently ignored. This is an accepted limitation of the `vega_template` approach because the merge is applied without inspecting inferred types. **Workaround**: Force the type explicitly in the template (`override` mode): ```json { "encoding": { "y": { "type": "quantitative", "axis": { "format": "$,.0f" } } } } ``` This guarantees the format applies but may affect other type-dependent rendering (axis ticks, binning). Common D3 format strings:
To apply formatting to all quantitative channels at the agent level (without knowing the specific column name): ```json { "encoding": { "y": { "axis": { "format": "$,.0f" } }, "x": { "axis": { "format": "$,.0f" } }, "color": { "legend": { "format": "$,.0f" } } }, "usermeta": { "merge": "extend" } } ``` Use `"merge": "extend"` so the format is added only to channels the LLM already populated, without overwriting their `field` or `type` settings. ## Merge modes Control how `vega_template` interacts with the LLM-generated chart by setting `"usermeta": {"merge": ""}` inside the template.
Rules that apply to both modes:

- The `data` block is never overwritten.
- Encoding overrides apply only when the template's `field` matches the chart's `field`, or the template omits `field`.
- After merging, domain entries not present in the actual data are automatically removed.

**Example: force a line chart**

```json
{
  "mark": "line",
  "usermeta": { "merge": "override" }
}
```

--- title: Customizing Cortex Search scoring source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-customize-scoring.md section: Snowflake Cortex (AI & ML) ---

# Customizing Cortex Search scoring

- [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview)
- [](/user-guide/snowflake-cortex/cortex-search/query-cortex-search-service)

By default, queries to Cortex Search Services leverage vector similarity, text matching, and reranking to determine the relevance of each result.

You can customize the scoring of search results in several ways:

- Apply [numeric boosts](#label-cortex-search-boosts-decays) based on numeric metadata columns.
- Apply [time decays](#label-cortex-search-boosts-decays) based on timestamp metadata columns.
- Disable [reranking](#label-cortex-search-reranking) to reduce query latency.
- Modify [component weights](#label-cortex-search-weights) to adjust the weight of individual scoring components (vector, text, reranking) in the overall search ranking.
- Disable the [query prefix for vector embeddings](#label-cortex-search-disable-query-prefix) for advanced use cases.
- Modify [index-specific boosts](#label-cortex-search-index-boosts) to adjust the weight of individual indices in a multi-index search.

## Numeric boosts and time decays

You can apply boosts or decays to search results based on numeric or timestamp metadata. This feature is useful when each result has structured metadata, such as popularity or recency signals, that can help determine the relevance of documents at query time.
You can specify two categories of ranking signals when making a query:

| Type | Description | Applicable column types | Example metadata fields (illustrative) |
| --- | --- | --- | --- |
| Numeric boost | Numeric metadata that boosts results having more attention or activity. | [Numeric data type](/sql-reference/data-types-numeric) | `clicks`, `likes`, `comments` |
| Time decay | Date or time metadata that boosts more recent results. The influence of recency signals decays over time. | [Date and time data type](/sql-reference/data-types-datetime) | `created_timestamp`, `last_opened_timestamp`, `action_date` |

Boost and decay metadata come from columns in the source table from which a Cortex Search Service is created. You specify the metadata columns to use for boosting or decaying when you make the query, but those columns must be included when creating the Cortex Search service.

When querying a Cortex Search Service, specify the columns to use for boosting or decaying in the optional `numeric_boosts` and `time_decays` fields in the `scoring_config.functions` field. You can also specify the weight for each boost or decay.

```json
{
  "scoring_config": {
    "functions": {
      "numeric_boosts": [
        { "column": "column_name", "weight": 1 },
        /* ... */
      ],
      "time_decays": [
        { "column": "column_name", "weight": 1, "limit_hours": 120 },
        /* ... */
      ]
    }
  }
}
```

### Properties

- `numeric_boosts` (array, optional):
  - `` (object, optional):
    - `column_name` (string): Specifies the numeric column to which the boost should be applied.
    - `weight` (float): Specifies the weight or importance assigned to the boosted column in the ranking process. When multiple columns are specified, a higher weight increases the influence of the field.
- `time_decays` (array, optional):
  - `` (object, optional):
    - `column_name` (string): Specifies the time or date column to which the decay should be applied.
    - `weight` (float): Specifies the weight or importance assigned to the decayed column in the ranking process. When multiple columns are specified, a higher weight increases the influence of the field.
    - `limit_hours` (float): Sets the boundary after which time starts to have less effect on the relevance or importance of the document. For example, a `limit_hours` value of 240 indicates that documents with timestamps more than 240 hours (10 days) before the `now` timestamp do not receive significant boosting, while documents with a timestamp within the last 240 hours receive a more significant boost.
    - `now` (string, optional): Reference timestamp from which decays are calculated, in ISO-8601 format `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`. For example, `"2025-02-19T14:30:45.123-08:00"`. Defaults to the current timestamp if not specified.

Numeric boosts are applied as weighted averages to the returned fields, while decays leverage a log-smoothed function to demote less recent values.

Weights are relative across the specified boost or decay fields. If only a single field is provided within a `numeric_boosts` or `time_decays` array, the value of its weight is irrelevant. If more than one field is provided, the weights are applied relative to each other. A field with a weight of 10, for example, affects the record's ranking twice as much as a field with a weight of 5.

## Reranking

By default, queries to Cortex Search Services leverage *semantic reranking* to improve search result relevance. While reranking can measurably increase result relevance, it can also noticeably increase query latency. You can disable reranking in any Cortex Search query if you've found that the quality benefit that reranking provides can be sacrificed for faster query speeds in your business use case.
Disabling reranking reduces query latency by 100-300ms on average, but the exact reduction in latency, as well as the magnitude of the quality degradation, varies across workloads. Evaluate results side-by-side, with and without reranking, before you decide to disable it in queries. Reranking is not supported for [batch search](/user-guide/snowflake-cortex/cortex-search/batch-cortex-search) queries. Any reranker settings in `scoring_config` are ignored when using `CORTEX_SEARCH_BATCH`. You can disable the reranker for an individual query at query time in the `scoring_config.reranker` field in the following format: ```json { "scoring_config": { "reranker": "none" } } ``` ### Properties - `reranker` (string, optional): Parameter that can be set to "none" if the reranker should be turned off. If excluded or null, the default reranker is used. ## Component weights The `weights` field in the `scoring_config` object allows you to specify the weights of individual scoring components (`vectors`, `texts`, `reranker`) in the overall score for each result. By default, the weights are set to 1.0 for each component, with an equal contribution to the overall scoring. You can specify weights in the following format: ```json { "scoring_config": { "weights": { "texts": 3, "vectors": 2, "reranker": 1 }, "functions": { // ... } } } ``` ### Properties - `weights` (object, optional): Specifies weights for combining text, vector, and reranker scores for each document. Weights are applied relative to one another within this field. For example, the following specifies that text scores should be weighted 3 times more than vector scores, and reranker scores should be weighted 2 times more than vector scores: ```json { "scoring_config": { "weights": { "texts": 3, "vectors": 1, "reranker": 2 } } } ``` ## Disabling query prefix for vector embeddings By default, Cortex Search adds a prefix to queries before computing vector embeddings. 
This prefix varies by model, but generally has the following format: `Represent this sentence for searching relevant passages: query`, where `query` is your search text. This improves search quality in many cases by providing context to the embedding model, which helps differentiate search queries from other texts you have stored in the Cortex Search service.

However, you might want to disable this prefix when you want plain similarity search, that is, when you want to retrieve stored texts that resemble the query itself rather than texts that answer it. For example, if you want to search "what is the best data cloud" and you want to get "Snowflake" as a result, then use the default prefix. However, if you want to search "what is the data cloud" and you want to get "which is the best data cloud" as a result, then you can disable the prefix.

You can disable the query prefix for an individual query at query time using the `disable_vector_embedding_query_prefix` parameter in the `scoring_config` field:

```json
{
  "scoring_config": {
    "disable_vector_embedding_query_prefix": true
  }
}
```

### Properties

- `disable_vector_embedding_query_prefix` (boolean, optional): When set to `true`, a search prefix is not added automatically to the query before computing vector embeddings. Defaults to `false`.

Disabling the query prefix is likely to reduce search quality in most cases because the prefix helps the embedding model understand that the text is a search query. Only disable it if you have a specific reason to do so and have evaluated the impact on your search results.

## Named scoring profiles

Boosts/decays and reranker settings together form a *scoring configuration*, which can be specified in the `scoring_config` parameter when making a query. Scoring configurations can also be given a name and attached to the Cortex Search service. Using a named scoring profile lets you easily use a scoring configuration across applications and queries without having to specify the full scoring configuration each time.
If you change the scoring configuration, you only need to update it in one place, not in every query.

To add a scoring profile to your Cortex Search Service, use the [ALTER CORTEX SEARCH SERVICE ... ADD SCORING PROFILE](/sql-reference/sql/alter-cortex-search) command, as shown in the following example:

```sql
ALTER CORTEX SEARCH SERVICE my_search_service
  ADD SCORING PROFILE IF NOT EXISTS heavy_comments_with_likes
  '{
    "functions": {
      "numeric_boosts": [
        { "column": "comments", "weight": 6 },
        { "column": "likes", "weight": 1 }
      ]
    }
  }';
```

The scoring profile definition uses the same schema as the `scoring_config` parameter of a query. Scoring profiles can't be modified after being created; to change a profile, drop it and recreate it with the new scoring configuration.

To delete a named scoring profile, use [ALTER CORTEX SEARCH SERVICE ... DROP SCORING PROFILE](/sql-reference/sql/alter-cortex-search).

To query a Cortex Search Service using a named scoring profile, specify the profile name in the `scoring_profile` parameter when making a query, as shown in the following examples:

**Python:**

```python
results = svc.search(
    query="technology",
    columns=["comments", "likes"],
    scoring_profile="heavy_comments_with_likes",
    limit=10
)
```

**REST API:**

```bash
curl --location https:///api/v2/databases//schemas//cortex-search-services/:query \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header "Authorization: Bearer $PAT" \
  --data '{
    "query": "technology",
    "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
    "scoring_profile": "heavy_comments_with_likes",
    "limit": 10
  }'
```

**SQL:**

```sql
SELECT SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
  'my_search_service',
  '{
    "query": "technology",
    "columns": ["comments", "likes"],
    "scoring_profile": "heavy_comments_with_likes",
    "limit": 10
  }'
);
```

To see a service's stored scoring profiles, query the `CORTEX_SEARCH_SERVICE_SCORING_PROFILES` view in the `INFORMATION_SCHEMA`
schema, as shown in the following example: ```sql SELECT * FROM my_db.INFORMATION_SCHEMA.CORTEX_SEARCH_SERVICE_SCORING_PROFILES WHERE service_name = 'my_search_service'; ``` The DESCRIBE CORTEX SEARCH SERVICE and SHOW CORTEX SEARCH SERVICE results contain a column named `scoring_profile_count` that indicates the number of scoring profiles for each service. ## Component Scores Component Scores provide detailed scoring information for search results. They allow developers to understand how search rankings are determined and debug search performance. Scores for each result are returned in the `@scores` field for each retrieval "component" (text, vector). Component scores are useful in scenarios where there is a need to: - **Establish thresholds:** Use component scores to determine when to pass results on to a downstream process, like an agent. - **Debug search rankings:** Understand why certain documents rank higher than others in search results. ### Understanding Component Scores Component scores provide detailed breakdowns of how Cortex Search calculates the final relevance score for each search result. The scoring system consists of multiple components:
**Cosine Similarity**
Scores based on semantic similarity between the query and vector indexes. Higher scores indicate stronger conceptual or meaning-based matches using vector embeddings.
**Text Match**
Scores based on keyword/lexical similarity between the query and text indexes. Higher scores indicate stronger exact or fuzzy keyword matches.
**Reranker Score**
Scores based on meaning-based matches between the query and the value in the text index. Higher scores indicate stronger conceptual or meaning-based matches, as judged by the reranker. Scores are provided only for the top results that are reranked.
**Function Scores**
Additional detailed scoring information from boost functions when applied (such as `text_boosts`, `vector_boosts`, numeric boosts, time decay). Contains nested objects for each boost type (such as `text_boost` and `vector_boost`) showing individual column scores, weights, and weighted totals. Useful for understanding how matches in different fields contribute to the final scoring of the document.
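One way to apply the threshold use case is to filter a parsed response by its component scores before handing results to a downstream step. The following is a minimal sketch, not a Snowflake API: it assumes the query response has already been parsed into a Python dict with a `results` list whose items carry an `@scores` object, and the helper name `filter_by_cosine`, the sample values, and the 0.5 cutoff are all hypothetical.

```python
from typing import Any


def filter_by_cosine(response: dict[str, Any], min_cosine: float) -> list[dict[str, Any]]:
    """Keep results whose cosine similarity meets a caller-chosen threshold."""
    kept = []
    for result in response.get("results", []):
        scores = result.get("@scores", {})
        cosine = scores.get("cosine_similarity")
        # A result without a cosine score (for example, no vector index) is skipped.
        if cosine is not None and cosine >= min_cosine:
            kept.append(result)
    return kept


# Hypothetical parsed response; the values are made up for illustration.
sample_response = {
    "results": [
        {"comments": 12, "@scores": {"cosine_similarity": 0.82, "text_match": 3.1}},
        {"comments": 4, "@scores": {"cosine_similarity": 0.31, "text_match": 7.9}},
        {"comments": 9, "@scores": {"text_match": 1.2}},
    ]
}

confident = filter_by_cosine(sample_response, min_cosine=0.5)
print(len(confident))
```

The sketch thresholds on `cosine_similarity` rather than `text_match` because this topic describes cosine similarity as bounded and comparable across queries, while text match scores are unbounded and not comparable between queries.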
### Response format

With component scores enabled, the following scoring information is returned for all your Cortex Search queries. For more information on Cortex Search query syntax, see [](/user-guide/snowflake-cortex/cortex-search/query-cortex-search-service).

```text
{
  "results": [
    {
      "@scores": {
        "cosine_similarity": <cosine_similarity_score>,
        "text_match": <text_match_score>
      }
    }
  ]
}
```

#### Score fields

- `@scores.cosine_similarity`: Cosine similarity score between the query and the value in the vector index, in the range [-1, 1].
- `@scores.text_match`: Text match score between the query and the value in the text index. This score is unbounded and its range depends on the query.
- `@scores.reranker_score`: Reranker score between the query and the value in the text index. This score is unbounded and its range depends on the query.
- `@scores.function_scores`: Nested object containing detailed boost function scoring (only present when `functions` are specified in the query):
  - `text_boost.column_scores.<column_name>.score`: Individual score for the specified column from text boost.
  - `text_boost.column_scores.<column_name>.weight`: Applied weight for the specified column from text boost.
  - `text_boost.weighted_score`: Final weighted score from text boost function.
  - `vector_boost.column_scores.<column_name>.score`: Individual score for the specified column from vector boost.
  - `vector_boost.column_scores.<column_name>.weight`: Applied weight for the specified column from vector boost.
  - `vector_boost.weighted_score`: Final weighted score from vector boost function.
  - `numeric_boost.column_scores.<column_name>.score`: Individual score for the specified column from numeric boost.
  - `numeric_boost.column_scores.<column_name>.weight`: Applied weight for the specified column from numeric boost.
  - `numeric_boost.weighted_score`: Final weighted score from numeric boost function.
  - `time_decay.column_scores.<column_name>.score`: Individual score for the specified column from time decay.
  - `time_decay.column_scores.<column_name>.weight`: Applied weight for the specified column from time decay.
  - `time_decay.weighted_score`: Final weighted score from time decay function.

#### Usage Notes

- `cosine_similarity` scores are:
  - Returned for any query that includes a VECTOR INDEX.
  - Bounded in the range [-1, 1] and comparable across different queries.
  - Computed assuming normalized vectors.
  - Subject to minor precision loss due to compression in the vector index, which means that `cosine_similarity(v, v)` might return `1.0 +/- epsilon` rather than exactly `1.0`. Compression details might vary over time, and epsilon might not be stable.
  - Computed after prepending each query with a prefix that increases search quality in many cases. This prefix varies per model, but generally looks like: `Represent this sentence for searching relevant passages: {query}`. The returned cosine similarity score is the cosine similarity between the query with the prefix and the value in the vector index.
- `text_match` scores are:
  - Returned for any query that includes a TEXT INDEX.
  - Unbounded.
  - Not comparable across different queries. For example, a text match score of 0.95 on a result for a given query is not comparable to a text match score of 0.95 on a result for a different query to the same service.
- `@scores` values are not affected by the `weights` parameter. The weights only affect the final ordering of the results.

## Index-specific boosts

Index-specific boosts adjust how much influence each index has in a [multi-index Cortex Search service](#label-cortex-multi-index-search). You can adjust the text matching and vector matching weights, which are applied relative to the other provided weights. Higher values take priority over lower values, using the same behavior as component weights.

### Properties

- `text_boosts` (array, optional): Index-specific weights to be applied to text index columns. When this value is present, you're required to include a weight for all text columns. Column weights are applied relative to one another.
- `vector_boosts` (array, optional): Index-specific weights to be applied to vector columns. When this value is present, you're required to include a weight for all vector columns. Column weights are applied relative to one another.

Index-specific weights are objects containing `column` and `weight` keys:

```text
{ "column": "<column_name>", "weight": <weight> }
```

As an example, consider the following table indexed for search:

```sql
CREATE TABLE feedback_info (
  id VARCHAR,
  comment VARCHAR,
  support_note VARCHAR,
  sentiment VECTOR(FLOAT, 3),
  issue_category VECTOR(FLOAT, 3)
);
```

The following JSON shows a `scoring_config` for a multi-index Cortex Search service that gives the `id` text column the lowest weight and the `comment` text column the highest, and makes the `sentiment` vector column twice as important as the `issue_category` vector column.

```json
{
  "scoring_config": {
    "functions": {
      "text_boosts": [
        { "column": "id", "weight": 1 },
        { "column": "support_note", "weight": 2 },
        { "column": "comment", "weight": 3 }
      ],
      "vector_boosts": [
        { "column": "issue_category", "weight": 1 },
        { "column": "sentiment", "weight": 2 }
      ]
    }
  }
}
```

## Diversity

In some cases, one type of result can crowd out the others. To prevent a certain type of result from dominating the search results, use the `diversity` parameter. For example, if a Cortex Search Service is created using long documents and these documents are indexed by chunking, the `diversity` parameter can be used to ensure that multiple chunks from the same document are not surfaced in the final result set.
You can enable diversity for an individual query by setting the `scoring_config.diversity` field at query time, in the following format:

```
{
  "scoring_config": {
    "diversity": {
      "group_by": <array of column names>,
      "max_results": <integer>
    }
  }
}
```

### Properties

- `diversity` (object, optional): Parameter that can be set to "none" if result diversity should be turned off.
  - `group_by` (array): Columns to group by.
  - `max_results` (integer): Maximum number of results for each group.

---
title: Detect and redact personally identifiable information (PII)
source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/redact-pii.md
section: Snowflake Cortex (AI & ML)
---

# Detect and redact personally identifiable information (PII)

This feature is not available in the People's Republic of China.

- [](/sql-reference/functions/ai_redact)

Personally identifiable information (PII) includes names, addresses, phone numbers, email addresses, tax identification numbers, and other data that can be used (alone or with other information) to identify an individual. Most organizations have regulatory and compliance requirements around handling PII. [AI_REDACT](/sql-reference/functions/ai_redact) is a fully managed Cortex AI Function that uses a large language model (LLM) to help you detect, locate, and redact PII from unstructured text data.

AI_REDACT can help you prepare text for call center coaching, sentiment analysis, insurance and medical analysis, and machine learning (ML) model training, among other use cases.

Use AI_PARSE_DOCUMENT or AI_TRANSCRIBE to convert document or speech data into text before applying AI_REDACT.

## AI_REDACT

The AI_REDACT function has two modes of operation: `detect` and `redact`. The default is `redact`.

- Use AI_REDACT in `detect` mode to identify PII locations, and then programmatically choose which PII to redact.
- Use AI_REDACT in `redact` mode to replace PII in the input text with placeholder values.

AI_REDACT performs detection and redaction in a best-effort manner using AI models.
Always review the output to ensure compliance with your organization's data privacy policies. If AI_REDACT fails to detect or redact any PII in your data, [contact Snowflake Support](/user-guide/contacting-support). ## Regional availability See [](#label-cortex-llm-availability). ## Limitations - AI_REDACT uses AI models and may not find all personally identifiable information. Always review the output to ensure compliance with your organization's data privacy policies. If AI_REDACT fails to redact certain PII, contact [Snowflake Support](/user-guide/contacting-support). - AI_REDACT works best with well-formed English text. Performance may vary with other languages or text with many spelling, punctuation, or grammatical errors. - AI_REDACT currently supports only US PII and some UK and Canadian PII, where noted in [](#label-ai-redact-pii-categories). - AI_REDACT is currently limited in the number of tokens it can input and output. Input and output together can be up to 4,096 tokens. Output is limited to 1,024 tokens. If the input text is longer, split it into smaller chunks and redact each chunk separately, perhaps using [SPLIT_TEXT_RECURSIVE_CHARACTER](/sql-reference/functions/split_text_recursive_character-snowflake-cortex). See [](#label-ai-redact-pii-example-chunking) for an example of redacting text that exceeds token limits. A token is the smallest unit of data processed by the AI model. For English text, industry guidelines consider one token to be approximately four characters, or 0.75 words. ## Detected PII categories AI_REDACT supports the detection and redaction of the following categories of PII. The values in the Category column are the strings that are supported in the optional `categories` argument.
AI_REDACT supports partial matches for some PII categories. For example, a first name alone is sufficient to trigger redaction with the [NAME] placeholder. ## Retain specific PII with detect mode By default, AI_REDACT replaces all detected PII with placeholder values. In some cases, you might want to retain certain PII while redacting the rest. For example, you might want to redact all names in call center transcripts or customer reviews except for known employee names. Use `detect` mode to build a selective redaction workflow: 1. Call AI_REDACT with the `mode` argument set to `detect` to identify and locate PII in the input text. 2. Compare the detected spans against an allowlist of values you want to keep. 3. Redact only the PII that is not in the allowlist. When you call AI_REDACT in `detect` mode, the function returns an OBJECT containing a `spans` array. Each element in the array is an OBJECT with the following fields:
| Field | Type | Description | | ---------- | ------- | ---------------------------------------------------------------------------------------------------------------- | | `category` | VARCHAR | The PII category, such as `NAME` or `ADDRESS`. See [](#label-ai-redact-pii-categories) for supported categories. | | `start` | NUMBER | The start index of the detected PII in the input text. | | `end` | NUMBER | The end index of the detected PII in the input text. | | `text` | VARCHAR | The matched PII text from the input. |
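The basic detection example in this topic suggests that `start` and `end` are zero-based character offsets with an exclusive `end`. That reading is an inference from the sample output, not a documented guarantee. Under that assumption, a quick consistency check before any splicing is to confirm that slicing the input with each span's offsets reproduces the span's `text` field. The helper name `check_spans` is hypothetical.

```python
def check_spans(text: str, spans: list[dict]) -> list[dict]:
    """Return the spans whose offsets do not reproduce their `text` field."""
    mismatched = []
    for span in spans:
        # Assumes zero-based offsets with an exclusive end, like Python slicing.
        if text[span["start"]:span["end"]] != span["text"]:
            mismatched.append(span)
    return mismatched


# Input and spans copied from the basic detection example in this topic.
message = "My old manager, Washington, used to live in Washington. His first name was Mike."
detected = [
    {"category": "NAME", "start": 16, "end": 26, "text": "Washington"},
    {"category": "ADDRESS", "start": 44, "end": 54, "text": "Washington"},
    {"category": "NAME", "start": 75, "end": 79, "text": "Mike"},
]

bad = check_spans(message, detected)
print(bad)
```

An empty list means every span lines up with the input. A non-empty list is a signal that the text passed to the check differs from the text that was sent to AI_REDACT, for example because of trimming or re-encoding.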
For examples of using `detect` mode, see [](#label-ai-detect-pii-examples). ## Handle row-level errors in multi-row queries If your query fails on every row, the cause might be a known constraint rather than a row-level error. See [](#label-ai-redact-pii-limitations) for details on token limits, language support, and other restrictions. AI_REDACT raises an error if it cannot process the input text. When a query redacts multiple rows, an error causes the entire query to fail. To allow processing to continue with other rows, you can set the session parameter `AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR` to FALSE. Errors then return NULL instead of stopping the query. ```sql ALTER SESSION SET AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR=FALSE; ``` With this parameter set to FALSE, you can also pass TRUE as the final argument to AI_REDACT, which causes the return value to be an OBJECT that contains separate fields for the redacted text and any error message. One of these fields is NULL depending on whether the AI_REDACT call processed successfully. The following example shows how to use error handling when processing multiple rows: 1. Create a table with unredacted text. ```sql CREATE OR REPLACE TABLE raw_table AS SELECT 'My previous manager, Washington, used to live in Kirkland. His first name was Mike.' AS my_column UNION ALL SELECT 'My name is William and I live in San Francisco. You can reach me at (415).450.0973'; ``` 2. Set the session parameter. ```sql ALTER SESSION SET AI_SQL_ERROR_HANDLING_USE_FAIL_ON_ERROR=FALSE; ``` 3. Create a redaction table with columns for `value` and `error`. ```sql CREATE OR REPLACE TABLE redaction_table ( value VARCHAR, error VARCHAR ); ``` 4. Redact PII from `raw_table` and insert the rows into `redaction_table` to store the redacted text and error messages. 
```sql INSERT INTO redaction_table SELECT result:value::STRING AS value, result:error::STRING AS error FROM (SELECT AI_REDACT(my_column, TRUE) AS result FROM raw_table); ``` ## Cost considerations AI_REDACT incurs costs based on the number of input and output tokens processed, as with other Cortex AI Functions. See the [Snowflake Pricing Guide](https://www.snowflake.com/pricing/pricing-guide/) for details. ### Estimate token usage for large datasets Before you run AI_REDACT on a large dataset, estimate input tokens on representative rows so you can plan for cost and token limits. Sample text from your table and call [AI_COUNT_TOKENS](/sql-reference/functions/ai_count_tokens) with `'ai_redact'` as the function name, using the same input text and optional `categories` argument you plan to pass to AI_REDACT. Compare the result against [AI_REDACT token limits](#label-ai-redact-pii-limitations). If a row exceeds the limit, split the text into chunks before redacting. See [Chunking example](#label-ai-redact-pii-example-chunking). AI_COUNT_TOKENS returns an estimate of input tokens only. Output token counts depend on the redacted text and aren't included in the estimate. For examples, see [AI_COUNT_TOKENS examples for AI_REDACT](/sql-reference/functions/ai_count_tokens#label-ai-count-tokens-ai-redact-examples). ## Redaction examples - [Basic redaction examples](#basic-redaction-examples) - [End-to-end example](#end-to-end-example) - [Chunking example](#chunking-example) ### Basic redaction examples The following example redacts a name and an address from the input text. ```sql SELECT AI_REDACT( input => 'My name is John Smith and I live at twenty third street, San Francisco.' ); ``` Basic redaction output: ```text My name is [NAME] and I live at [ADDRESS] ``` The following example redacts only names and email addresses from the input text. Note that the text only contains a first name, which is recognized and redacted as [NAME]. 
The input text does not contain an email address, so no email placeholder appears in the output. ```sql SELECT AI_REDACT( input => 'My name is John and I live at twenty third street, San Francisco.', categories => ['NAME', 'EMAIL'] ); ``` Selective redaction output: ```text My name is [NAME] and I live at twenty third street, San Francisco. ``` ### End-to-end example The following example processes rows from one table and inserts the redacted output into another table. You could use a similar approach to store the redacted data in a column in an existing table. After redaction, the text is passed to the [AI_SENTIMENT](/sql-reference/functions/ai_sentiment) function to extract overall sentiment information. 1. Create a table with unredacted text. ```sql CREATE OR REPLACE TABLE raw_table AS SELECT 'My previous manager, Washington, used to live in Kirkland. His first name was Mike.' AS my_column UNION ALL SELECT 'My name is William and I live in San Francisco. You can reach me at (415).450.0973'; ``` 2. View unredacted data. ```sql SELECT * FROM raw_table; ``` 3. Create a redaction table. ```sql CREATE OR REPLACE TABLE redaction_table (value VARCHAR); ``` 4. Redact PII from `raw_table` and insert the rows into `redaction_table`. ```sql INSERT INTO redaction_table SELECT AI_REDACT(my_column) AS value FROM raw_table; ``` 5. View redacted results. ```sql SELECT * FROM redaction_table; ``` 6. Run the AI_SENTIMENT function on redacted text. ```sql SELECT value AS redacted_text, AI_SENTIMENT(value) AS summary_sentiment FROM redaction_table; ``` ### Chunking example This example illustrates how to redact PII from long text by splitting the text into smaller chunks, redacting each chunk separately, and then recombining the redacted chunks into the final output. This approach works around AI_REDACT's token limits. 1. Create a table with patient data. ```sql CREATE OR REPLACE TABLE patients ( patient_id INT PRIMARY KEY, patient_notes TEXT ); ``` 2. 
Split the text into chunks, apply AI_REDACT to each chunk, and concatenate the redacted chunks. ```sql CREATE OR REPLACE TABLE final_temp_table AS WITH chunked_data AS ( SELECT patient_id, chunk.value AS chunk_text, chunk.index AS chunk_index FROM patients, LATERAL FLATTEN( input => SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER( patient_notes, 'none', 1000 ) ) AS chunk WHERE patient_notes IS NOT NULL AND LENGTH(patient_notes) > 0 ), redacted_chunks AS ( SELECT patient_id, chunk_index, chunk_text, TO_VARIANT(results:value) AS redacted_chunk, TO_VARIANT(results:error) AS error_string FROM ( SELECT patient_id, chunk_index, chunk_text, AI_REDACT(chunk_text,TRUE) AS results FROM chunked_data ) ), final AS ( SELECT chunk_text AS original, IFF(error_string IS NOT NULL, chunk_text, redacted_chunk) AS redacted_text, patient_id, chunk_index FROM redacted_chunks ) SELECT * FROM final; ``` 3. Query the results. ```sql SELECT patient_id, LISTAGG(redacted_text, '') WITHIN GROUP (ORDER BY chunk_index) AS full_output FROM final_temp_table GROUP BY patient_id; ``` ## Detection and selective redaction examples - [Basic detection example](#basic-detection-example) - [End-to-end with allowlist example](#end-to-end-with-allowlist-example) ### Basic detection example The following example identifies and returns the category, location, and text of each detected PII instance without redacting the input. ```sql SELECT AI_REDACT( input => 'My old manager, Washington, used to live in Washington. His first name was Mike.', return_error_details => FALSE, mode => 'detect' ); ``` Basic detection output: ```text { "spans": [ { "category": "NAME", "end": 26, "start": 16, "text": "Washington" }, { "category": "ADDRESS", "end": 54, "start": 44, "text": "Washington" }, { "category": "NAME", "end": 79, "start": 75, "text": "Mike" } ] } ``` ### End-to-end with allowlist example The following example demonstrates a selective redaction workflow that uses `detect` mode and an allowlist. 
It loads a list of names to retain from a staged file, uses AI_REDACT in `detect` mode to identify PII locations, and then passes the results to a Python UDF that redacts only the PII not in the allowlist.

1. Retain an allowlist of values by loading the list from a stage into a temporary table.

   ```sql
   CREATE OR REPLACE TEMP TABLE string_list (value STRING);

   COPY INTO string_list
   FROM @mystage/allowlist.txt
   FILE_FORMAT = (
     TYPE = 'CSV'
     RECORD_DELIMITER = '\n'
     FIELD_DELIMITER = '\t' -- any char NOT in file
     TRIM_SPACE = TRUE
     SKIP_HEADER = 0
   );
   ```

2. View the allowlist table.

   ```sql
   SELECT * FROM string_list;
   ```

   Allowlist table output:

   ```text
   VALUE
   Mike
   David
   ```

3. Create a Python UDF that selectively redacts PII based on the allowlist.

   ```sql
   CREATE OR REPLACE FUNCTION redact_spans_with_allowlist(
     SPAN_DATA VARIANT,
     ALLOWLIST ARRAY,
     ORIGINAL_TEXT STRING
   )
   RETURNS STRING
   LANGUAGE PYTHON
   RUNTIME_VERSION = '3.8'
   HANDLER = 'redact_text'
   AS
   $$
   def redact_text(span_data, allowlist, original_text):
       spans = span_data.get('spans', [])
       # Sort descending so earlier offsets stay valid while splicing
       sorted_spans = sorted(spans, key=lambda x: x['start'], reverse=True)
       result = original_text
       for span in sorted_spans:
           text_val = span.get('text')
           # Skip values the caller wants to keep
           if text_val in allowlist:
               continue
           start, end = span['start'], span['end']
           label = f"[{span['category']}]"
           # Splice the placeholder into the string
           result = result[:start] + label + result[end:]
       return result
   $$;
   ```

4. Test the UDF. The span offsets must match the positions of the values in the text that you pass as the third argument.

   ```sql
   SELECT redact_spans_with_allowlist(
     PARSE_JSON('{"spans": [{"category": "NAME", "end": 28, "start": 18, "text": "Washington"}, {"category": "NAME", "end": 44, "start": 40, "text": "Mike"}]}'),
     ARRAY_CONSTRUCT('Washington'), -- This will NOT be redacted
     'Hello, my name is Washington and his is Mike.'
   );
   ```

5. Run AI_REDACT in `detect` mode.

   ```sql
   CREATE OR REPLACE TABLE raw (message TEXT);

   INSERT INTO raw (message) VALUES ('My old manager, Washington, used to live in Washington. His first name was Mike.');

   SELECT
     t.message AS message,
     AI_REDACT(input=>t.message, return_error_details=>FALSE, mode=>'detect') AS spans,
     redact_spans_with_allowlist(spans, l.str_list, message) AS result
   FROM raw t
   CROSS JOIN (
     SELECT ARRAY_AGG(value) AS str_list FROM string_list
   ) l;
   ```

   End-to-end with allowlist example output:
## Legal notices The data classification of inputs and outputs are as set forth in the following table.
For additional information, refer to [Snowflake AI and ML](/guides-overview-ai-features). --- title: Document Processing Playground source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/document-processing-playground.md section: Snowflake Cortex (AI & ML) --- # Document Processing Playground This feature is not available in the People's Republic of China. - [](/sql-reference/functions/ai_extract) - [](/sql-reference/functions/ai_parse_document) - [](/user-guide/snowflake-cortex/parse-document) The Document Processing Playground provides a user interface for exploring the AI_EXTRACT and AI_PARSE_DOCUMENT functions. You can upload your own documents from stage, ask questions to extract information using AI_EXTRACT, and preview both the layout and OCR results generated by AI_PARSE_DOCUMENT. The playground lets you explore how the functions process your documents, and copy the corresponding code snippets for further use. For more information, see [](/sql-reference/functions/ai_extract) and [](/user-guide/snowflake-cortex/parse-document). ## Required privileges Users must use a role that has been granted the [SNOWFLAKE.CORTEX_USER database role](#label-snowflake-db-roles-cortex-user). For information about granting this privilege, see [](#label-cortex-llm--privileges). ## Get started with the Document Processing Playground To access the Document Processing Playground: 1. Sign in to %sf-web-interface-link%. 2. In the navigation menu, select **AI & ML** %raa% **AI Studio**. The Document Processing Playground appears among the other Studio functions. 3. To open the Document Processing Playground, select **Open**. ## Upload your documents You can upload up to 10 documents. ### Upload your documents from a local machine - To upload documents from a local machine, you must have a personal database enabled. For more information, see [](/user-guide/personal-databases). - The maximum file size is 50 MB. 1. 
Select **Select Warehouse**, and then select the warehouse from the drop-down list. 2. Select **Choose file**. 3. Drag and drop files, or select **Browse** to select files from your local machine. 4. Select **Upload**. The playground appears. ### Upload your documents from a stage When you upload files from a stage, the default warehouse is selected. To change the warehouse or if you don't have a default warehouse, use **Select Warehouse** to choose a warehouse from the drop-down list. 1. Select **Add from stage**. A dialog appears. 2. Select the database, schema, and stage that contains your documents. 3. Select the document files that you want to add to the playground. 4. Select **Open playground**. The playground appears. ## The Document Processing Playground interface The Document Processing Playground interface displays a preview of a document on the right and a prompt area on the left where you can enter prompts. To change the document that you are previewing, select the document name, and then select another document from the list. The Document Processing Playground interface consists of the following tabs: - **Extraction**: The view where you can ask questions to extract information from the document. - **Markdown**: The view where you can see the markdown representation of the document. It's the LAYOUT mode output from AI_PARSE_DOCUMENT. - **Text**: The view where you can see the text representation of the document. It's the OCR mode output from AI_PARSE_DOCUMENT. ## Extract information by asking questions You can ask questions to extract information from the document. 1. Select the **Extraction** tab. 2. Select the extraction type: - To ask a question, select **Ask**. - To extract a list, select **List**. - To extract a table, select **Extract table**. 3. Create key and question pairs, for example: - Key: `company` - Question: `What is the name of the company?` 4. To confirm, select **Add Prompt**. 
## Preview the markdown and text versions of the document The **Markdown** and the **Text** tabs display the results of the AI_PARSE_DOCUMENT function. - To see the Layout mode results, select the **Markdown** tab. - To see the OCR mode results, select the **Text** tab. ## Get the code snippets for further use The playground creates code snippets that use the AI_EXTRACT and AI_PARSE_DOCUMENT functions to process your documents. If you uploaded files from a local machine, you can preview and copy the code snippets: 1. In the top right corner of the interface, select **Code Snippets**. 2. Select the language of the code snippet: SQL or Python. You can now copy the code snippet. If you uploaded files from a stage, you can open the code snippet directly in Workspaces: - In the top right corner of the interface, select **Open in Workspaces**. A new workspace opens with the code snippet. ## Regional availability The Document Processing Playground is available in the following regions:
## Limitations

Limitations of the AI_EXTRACT and AI_PARSE_DOCUMENT functions apply to the Document Processing Playground. For more information, see [](/sql-reference/functions/ai_extract) and [](/user-guide/snowflake-cortex/parse-document).

---
title: Evaluate AI applications
source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/ai-observability/evaluate-ai-applications.md
section: Snowflake Cortex (AI & ML)
---

# Evaluate AI applications

To evaluate a generative AI application, follow these steps:

1. Build the app and instrument it using the TruLens SDK. Applications built using Python are supported.
2. Register the app in Snowflake.
3. Create a run by specifying the input dataset.
4. Execute the run to generate traces and compute evaluation metrics.
5. View the evaluation results in Snowsight.

## Instrument the app

After you create your generative AI application in Python, import the TruLens SDK to instrument it. The TruLens SDK provides an `@instrument()` decorator that instruments the functions in your application so that they generate traces and can be used to compute metrics.

- To use the decorator, add the following import to your Python application:

  ```python
  from trulens.core.otel.instrument import instrument
  ```

You can change the granularity of the `@instrument()` decorator depending on your requirements.

### Scenario 1: Trace a function

You can add `@instrument()` before the function you need to trace. This automatically captures the inputs to the function, the outputs (return values), and the latency of execution.
For example, the following code demonstrates tracing an `answer_query` function that automatically captures the input query and the final response:

```python
@instrument()
def answer_query(self, query: str) -> str:
    context_str = self.retrieve_context(query)
    return self.generate_completion(query, context_str)
```

### Scenario 2: Trace a function with a specific span type

A span type specifies the nature of the function and makes the traces easier to read and understand. For example, in a RAG application you can specify the span type as `RETRIEVAL` for your search service (or retriever) and as `GENERATION` for the LLM inference call.

The following span types are supported:

- `RETRIEVAL`: Span type for retrieval or search functions
- `GENERATION`: Span type for model inference calls from an LLM
- `RECORD_ROOT`: Span type for the main function in your application

If you don't specify a span type with `@instrument()`, an `UNKNOWN` span type is assigned by default.

To use span attributes, add the following import to your Python application.

```python
from trulens.otel.semconv.trace import SpanAttributes
```

The following code snippet demonstrates tracing a RAG application. The span type must always be prefixed with `SpanAttributes.SpanType`.

```python
@instrument(span_type=SpanAttributes.SpanType.RETRIEVAL)
def retrieve_context(self, query: str) -> list:
    """
    Retrieve relevant text from vector store.
    """
    return self.retrieve(query)

@instrument(span_type=SpanAttributes.SpanType.GENERATION)
def generate_completion(self, query: str, context_str: list) -> str:
    """
    Generate answer from context by calling an LLM.
    """
    # The LLM call that produces `response` is omitted here.
    return response

@instrument(span_type=SpanAttributes.SpanType.RECORD_ROOT)
def answer_query(self, query: str) -> str:
    context_str = self.retrieve_context(query)
    return self.generate_completion(query, context_str)
```

### Scenario 3: Trace a function and compute evaluations

In addition to providing span types, you must assign relevant parameters in your application to span attributes to compute the metrics. For example, to compute context relevance in a RAG application, you must assign the query and retrieval results parameters to the attributes `RETRIEVAL.QUERY_TEXT` and `RETRIEVAL.RETRIEVED_CONTEXTS`, respectively. The attributes required to compute each individual metric are listed on the Metrics page.

The following span attributes are supported for each span type:

- `RECORD_ROOT`: `INPUT`, `OUTPUT`, `GROUND_TRUTH_OUTPUT`
- `RETRIEVAL`: `QUERY_TEXT`, `RETRIEVED_CONTEXTS`
- `GENERATION`: None

To use span attributes, you need to add the following import to your Python application.

```python
from trulens.otel.semconv.trace import SpanAttributes
```

The following code snippet provides an example to compute context relevance for a retrieval service. The attributes must always follow the format `SpanAttributes.<span type>.<attribute name>` (for example, `SpanAttributes.RETRIEVAL.QUERY_TEXT`).

```python
@instrument(
    span_type=SpanAttributes.SpanType.RETRIEVAL,
    attributes={
        SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
        SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
    }
)
def retrieve_context(self, query: str) -> list:
    """
    Retrieve relevant text from vector store.
    """
    return self.retrieve(query)
```

In the preceding example, `query` represents the input parameter to `retrieve_context()` and `return` represents the value returned. These are assigned to the attributes `RETRIEVAL.QUERY_TEXT` and `RETRIEVAL.RETRIEVED_CONTEXTS` to compute context relevance.
### Auto-instrument framework applications In addition to manual instrumentation using the `@instrument()` decorator, TruLens provides specialized wrappers that automatically instrument applications built with popular LLM frameworks. These wrappers provide integration and automatic tracing without requiring manual decoration of individual functions. #### TruChain for LangChain `TruChain` provides automatic instrumentation for applications built with [LangChain](https://www.langchain.com/). It automatically captures the execution of key LangChain classes including chains, LLMs, prompts, and retrievers. ```python from trulens.apps.langchain import TruChain # Wrap your LangChain application tru_recorder = TruChain( rag_chain, app_name="my_langchain_app", app_version="v1.0" ) # Use the recorder as a context manager with tru_recorder as recording: response = rag_chain.invoke(input_query) ``` `TruChain` supports: - Automatic instrumentation of LangChain Expression Language (LCEL) chains - Async support through the `ainvoke` method - Built-in selectors (`on_input`, `on_output`, `on_context`) for RAG triad evaluation #### TruGraph for LangGraph `TruGraph` provides automatic instrumentation for applications built with [LangGraph](https://langchain-ai.github.io/langgraph/). It automatically detects LangGraph applications and instruments both LangChain and LangGraph components. 
```python from trulens.apps.langgraph import TruGraph # Wrap your LangGraph application tru_recorder = TruGraph( graph, app_name="my_langgraph_app", app_version="v1.0" ) # Use the recorder as a context manager with tru_recorder as recording: response = graph.invoke({"messages": [("user", input_query)]}) ``` `TruGraph` supports: - Automatic `@task` instrumentation with intelligent attribute extraction - Multi-agent evaluation capabilities - Combined instrumentation of both LangChain and LangGraph components #### TruLlama for LlamaIndex `TruLlama` provides automatic instrumentation for applications built with [LlamaIndex](https://www.llamaindex.ai/). It automatically captures the execution of key LlamaIndex classes including query engines, retrievers, and response synthesizers. ```python from trulens.apps.llamaindex import TruLlama # Wrap your LlamaIndex query engine tru_recorder = TruLlama( query_engine, app_name="my_llamaindex_app", app_version="v1.0" ) # Use the recorder as a context manager with tru_recorder as recording: response = query_engine.query(input_query) ``` `TruLlama` supports: - Automatic instrumentation of query engines, chat engines, and retrievers - Async support through `aquery`, `achat`, and `astream_chat` methods - Streaming support for LlamaIndex applications - Built-in selectors (`on_input`, `on_output`, `on_context`) for RAG triad evaluation For more information about framework-specific instrumentation, see the [TruLens documentation](https://www.trulens.org/component_guides/instrumentation/). ## Register app in Snowflake To register your generative AI application in Snowflake for capturing traces and conducting evaluations, you need to create a `TruApp` object using the TruLens SDK that records the invocation (execution) of the user's app and exports traces to Snowflake. ```python tru_app = TruApp( app: Any, app_name: str, app_version: str, connector: SnowflakeConnector, main_method: callable # i.e. 
app.query ) ```

If your application is built using LangChain, LangGraph, or LlamaIndex, you can use `TruChain`, `TruGraph`, or `TruLlama`, respectively, in place of `TruApp`. These framework-specific wrappers provide the same registration functionality while also enabling automatic instrumentation of your application. For more details, see [Auto-instrument framework applications](#label-evaluate-auto-instrument).

Parameters:

- `app: Any`: An instance of the user-defined application that is invoked later during a run for evaluation, for example `app = RAG()`.
- `app_name: str`: The name of the application. You choose this name, and it is maintained in your Snowflake account.
- `app_version: str`: The version you assign to the app, which allows experiment tracking and comparison.
- `connector: SnowflakeConnector`: A wrapper class that manages the Snowpark session and the Snowflake database connection.
- `main_method: callable` (Optional): The entry point method of your application. It tells the SDK how the app is expected to be called and where to start tracing the invocation of the app (specified by `app`). For the example `RAG` class, `main_method` can be specified as `app.answer_query`, assuming `answer_query` is the entry point of the app. Alternatively, instrument the entry point method with the span type `RECORD_ROOT`; in that case, this parameter is not required.

## Create Run

To begin an evaluation job, you need to create a run. Creating a run requires a run configuration. The `add_run()` function uses the run configuration to create a new run.
### Run Configuration

A run is created from a `RunConfig`:

```python
run_config = RunConfig(
    run_name=run_name,
    description="desc",
    label="custom tag useful for grouping comparable runs",
    source_type="DATAFRAME",
    dataset_name="My test dataframe name",
    dataset_spec={
        "RETRIEVAL.QUERY_TEXT": "user_query_field",
        "RECORD_ROOT.INPUT": "user_query_field",
        "RECORD_ROOT.GROUND_TRUTH_OUTPUT": "golden_answer_field",
    },
    llm_judge_name="mistral-large2",
)
```

- `run_name: str`: Name of the run. It should be unique under the same `TruApp`.
- `description: str` (Optional): String description of the run.
- `label: str` (Optional): Label used to group runs together.
- `source_type: str`: Specifies the source of the dataset. It can be either `DATAFRAME` for a Python dataframe or `TABLE` for a user table in the Snowflake account.
- `dataset_name: str`: If `source_type` is `DATAFRAME`, any name you choose. If `source_type` is `TABLE`, a valid Snowflake table name in the current context (database and schema) or a fully qualified name in the form `database.schema.table_name`.
- `dataset_spec: Dict[str, str]`: A dictionary mapping supported span attributes to column names in your dataframe or table. The allowed keys are span attributes as specified in the Dataset page, and the allowed values are column names in the specified dataframe or table. For example, `golden_answer_field` in the preceding run configuration must be a valid column name.
- `llm_judge_name: str` (Optional): Name of the model to use as the LLM judge during LLM-based metric computation. See the models page for supported judges. If not specified, the default value is `llama3.1-70b`.

```python
run = tru_app.add_run(run_config=run_config)
```

Request Parameters:

- `run_config: RunConfig`: Contains the configuration for the run.

### Retrieve Run

Retrieves the run.
```python run = tru_app.get_run(run_name=run_name) ``` Request parameters: - `run_name: str`: name of the run ### View Run metadata Describes the details of the run. ```python run.describe() ``` ### Invoke Run You can invoke the run using the `run.start()` function. It reads the inputs from the dataset specified in the run configuration, invokes the application for each input, generates the traces, and ingests the information for storage in your Snowflake account. `run.start()` is a blocking call until the application is invoked for all inputs in your dataset and ingestion is completed or timed out. ```python run.start() # if source_type is "TABLE" run.start(input_df=user_input_df) # if source_type is "DATAFRAME" ``` Request Parameters: - `input_df: DataFrame` (Optional): is a pandas dataframe from the SDK. If the source_type in run configuration is specified as `DATAFRAME`, this field is mandatory. If the source_type is `TABLE`, this field is not required. ### Compute metrics You can start metric computations using `run.compute_metrics()` after the application is invoked and all traces are ingested. As long as the status of the run is `INVOCATION_IN_PROGRESS`, computation cannot be started. Once the status is `INVOCATION_COMPLETED` or `INVOCATION_PARTIALLY_COMPLETED`, `run.compute_metrics()` can be initiated. `run.compute_metrics()` is an asynchronous non-blocking function. You can call `compute_metrics` multiple times on the same run with a different set of metrics, and each call will trigger a new computation job. Note that metrics once computed cannot be re-computed again for the same run. ```python run.compute_metrics(metrics=[ "coherence", "answer_relevance", "groundedness", "context_relevance", "correctness", ]) ``` Request Parameters: - `metrics: List[str]`: list of string names of the metrics listed in Metrics. The name of metrics should be specified in snake cases. i.e. Context Relevance should be specified as `context_relevance`. 
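Because metric computation cannot start while the run is still `INVOCATION_IN_PROGRESS`, a caller that retrieves a run from another session may want to wait for a ready status first. The following is a minimal sketch of that gating logic. It assumes that `run.get_status()` returns the status name as a string; `compute_when_ready` and `to_metric_name` are hypothetical helpers, not part of the TruLens SDK.

```python
import time
from typing import Any, List

# Statuses after which compute_metrics() can be initiated, per the text above.
READY_STATUSES = {"INVOCATION_COMPLETED", "INVOCATION_PARTIALLY_COMPLETED"}


def to_metric_name(display_name: str) -> str:
    """Convert a display name such as 'Context Relevance' to snake case."""
    return "_".join(display_name.lower().split())


def compute_when_ready(
    run: Any,
    metrics: List[str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> bool:
    """Poll the run status and start metric computation once it is ready.

    Returns True if computation was started, False if the timeout elapsed.
    Assumes run.get_status() returns the status name as a string.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if run.get_status() in READY_STATUSES:
            run.compute_metrics(metrics=metrics)
            return True
        time.sleep(poll_seconds)
    return False
```

A call such as `compute_when_ready(run, [to_metric_name("Context Relevance")])` would then start the computation only after invocation has finished. If `get_status()` returns an enum rather than a string in your SDK version, compare against its name instead.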
### Check Run Status

You can check the status of the run after it is in progress. For the list of statuses, see the Run Status section.

```python
run.get_status()
```

### Cancel Run

You can cancel an existing run using `run.cancel()`. This operation prevents any future updates to the run, including updates to the run status and metadata fields.

```python
run.cancel()
```

### Delete Run

You can delete an existing run using `run.delete()`. This operation deletes the metadata associated with the run, after which the evaluation results can no longer be accessed. However, the traces and evaluations generated as part of the run are not deleted and remain stored. For more information about storage and deletion of evaluations and traces, see the Observability data section.

```python
run.delete()
```

### List Runs for an application

You can see the list of all available runs corresponding to a specific `TruApp` application object using the `list_runs()` function.

```python
tru_app.list_runs()
```

Response: Returns a list of all runs created under `tru_app`.

## View Evaluations and Traces

To view evaluation results, do the following:

1. Sign in to Snowsight.
2. In the navigation menu, select **AI & ML** » **Evaluations**.

Do the following to view the evaluation results for your application runs:

- To view runs corresponding to a specific application, select the application.
- To view the evaluation results for a run, select the run. You can view the aggregated results and the results corresponding to each record.
- To view traces for a record, select it. You can view detailed traces, latency, the inputs and outputs of each stage of the application, evaluation results, and the explanation provided by the LLM judge for the generated accuracy score.

To compare runs that use the same dataset, select multiple runs and select **Compare** to compare the outputs and the evaluation scores.
---
title: Extracting information from documents with AI_EXTRACT
source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/document-extraction.md
section: Snowflake Cortex (AI & ML)
---

# Extracting information from documents with AI_EXTRACT

Available natively to accounts in [select regions](#label-cortex-llm-availability). Available through [cross-region inference](/user-guide/snowflake-cortex/cross-region-inference) to accounts in all regions. This feature is not available in the People's Republic of China.

- [AI_EXTRACT](/sql-reference/functions/ai_extract)

AI_EXTRACT is a Cortex AI Function that lets you extract structured information, such as entities, lists, and tables, from text or document files, by asking questions in natural language or by describing the information to be extracted. It can be used with other functions to create custom document processing pipelines for a variety of use cases (see [](/user-guide/snowflake-cortex/ai-documents)).

AI_EXTRACT can process documents of various formats in multiple languages and extract information from both text-heavy paragraphs and content in a graphical form, such as logos, handwritten text (for example, signatures), tables, or checkmarks.

AI_EXTRACT can extract information in the following structured formats:

- Entity: Ask questions in natural language or describe the information to be extracted (such as city, street, or ZIP code).
- List (or array): Provide a JSON schema to extract an array or list of information present in the document, such as the names of all account holders in a bank statement or a list of all addresses in a document.
- Table: Provide a JSON schema to extract tabular data present in the document by specifying the table title and a list of columns that should be extracted.

AI_EXTRACT scales automatically with your workload by processing multiple documents simultaneously. Documents can be processed directly from object storage to avoid unnecessary data movement.
AI_EXTRACT is currently incompatible with custom [network policies](/user-guide/network-policies).

For more information on AI_EXTRACT, including supported languages, regional availability, and cost, see [AI_EXTRACT](/sql-reference/functions/ai_extract).

## Extraction quality

AI_EXTRACT uses `arctic-extract`, a proprietary vision-based large language model (LLM) that delivers high extraction accuracy. The following tables present the model's scores on standard benchmarks, with the scores of other popular models for comparison:

### Visual question answering (VQA)

| Offering | DocVQA score |
| ------------------------------------ | ------------ |
| Human evaluation | 0.9811 |
| **Snowflake Arctic-Extract** | 0.9433 |
| Azure OpenAI GPT-o3 | 0.9339 |
| Google Gemini 2.5-Pro | 0.9316 |
| Google Anthropic Claude 4 Sonnet | 0.9119 |
| Azure Document Intelligence + GPT-o3 | 0.8853 |
| Google Document AI + Gemini | 0.8497 |
| AWS Textract | 0.8313 |

### Text-only question answering (SQuAD v2)

| Offering | ANLS | Exact match |
| ---------------------------- | ----- | ----------- |
| **Snowflake Arctic-Extract** | 81.18 | 78.74 |
| Anthropic Claude 4 Sonnet | 80.54 | 77.10 |
| Meta LLaMA 3.1 405B | 80.37 | 76.56 |
| Meta LLaMA 4 Scout | 74.30 | 70.70 |
| OpenAI GPT 4.1 | 70.71 | 66.81 |
| Meta LLaMA 3.1 8B | 59.13 | 54.48 |

## Question optimization for extracting information

When you work with AI_EXTRACT, use natural language to ask questions about your documents. To ask a question that returns an accurate answer, follow these guidelines:

- Use plain English.
- For each question, know what answers you expect.
- Be specific; for example, if the document includes several dates (such as issuing date and signature date), do not ask "What is the date?" without including more details.
- Ask for a single value in each question.
- Do not expect AI_EXTRACT to guess your intentions or have extended knowledge in a specific domain.
Consider the following document as an example. This purchase and sale agreement includes information such as the offer expiration date, the names of the buyers and the seller, and the included items. ![Example document (condominium purchase and sale agreement).](/static/images/cortex-document/condo-agreement.png) The following table provides examples of questions you can ask AI_EXTRACT and the expected answers.
## Table extraction best practices This section provides best practices when working with table extraction in AI_EXTRACT. ### Use one schema for a specific type of document Each extraction workload must contain documents of the same type, and the data that you want to extract should be similar for most of the tables. If the number of columns in the source document differs from one document to another, but all documents contain a defined subset of columns to be extracted and the common columns have the same or a similar name and location, then those common columns can be extracted. For example, invoices may have different numbers of columns with various data, but if all of the tables have the same first three columns — `Item Description`, `Quantity`, and `Price` — that data can be extracted. ### Use natural language to define column names You can copy the column names from the document so that they're exactly the same. For example, don't name the columns `product_code` or `REPORT_DATE`; instead, name them `Product Code` or `Report Date`. ### Skip the empty rows When you create a fine-tuning dataset, skip the rows with no answer (where the returned answer would be `None`). ### Define the columns in the same order they appear in the document To improve accuracy, define the columns in the same order as they appear in the document, which is usually from left to right, or top to bottom for the [transposed tables](#label-ai-extract-table-extraction-best-practices-transpose-tables). If you choose to define the order differently, training might be needed. However, for columns where values are the same for multiple rows, such as `Invoice Number` and `Invoice Date`, add these columns at the beginning. For example: - `Invoice Number` - `Invoice Date` - `Item Code` - `Item Name` - `Quantity` ### Define values using casing from the document When possible, define values using casing (uppercase and lowercase) from the document. 
If the casing in the document is varied, use capitalization. ### Use the description field The `description` field in the AI_EXTRACT response format is optional; in most cases, you don't have to fill it in. However, if there are multiple similar tables in a document, the model might answer inaccurately. If the answers come from a different source table than expected or the model can't find the table, try using the `description` field. Add information that helps the model identify the right table, such as the table title or number. ### Add a section column to describe the layout of the table If the table is divided into multiple named sections, add a section column. This helps the model understand the layout better and improve the accuracy. For example, you can name the column `Section`, `Item section`, or `Item category`. If there is a second level of nesting in the sections, you can add two columns: `Section` and `Subsection`. ### To group values, create an additional column You can add a column to the existing table to group values. In this way, you can join results from the whole document set in a single table; for example:
| Invoice Number | Item Details | Item Price | Quantity | | -------------- | ------------ | ---------- | -------- | | A | Item A1 | 10.00 | 1 | | A | Item A2 | 20.00 | 1 | | A | Item A3 | 30.00 | 1 | | B | Item B1 | 15.00 | 1 | | B | Item B2 | 25.00 | 1 | | B | Item B3 | 35.00 | 1 |
Note that the value in the first column is repeated for corresponding items. ### Make the column names distinguishable between documents Try to semantically distinguish a column. Don't use names such as `col1`, `val1`, `item1`. In some cases, transposition can work better, especially when the row names don't differ between documents or differ slightly and are within a closed set of values. Note that training on the specified column set might improve the results. ### Use the parent name as a prefix when working with hierarchical headers To extract information from tables with hierarchical headers, join the header path using each parent name as a prefix. For example, for the following table, define the columns as: - `Category A Type X Column 1` - `Category A Type Y Column 2` - `Category A Type Y Column 3` - `Category B Column 4` - `Category B Column 5` ![A table with headers named Category A and Category B, where Category A includes subheaders: Type X and Type Y.](/static/images/cortex-document/hierarchical-table-example.png) ### Transpose the tables if needed You can extract information from transposed tables by using values from the first column of the table in the document as column names in the output table. For example, for the following table, name the columns: - `Type A: Item 1` - `Type A: Item 2` - `Type B: Item 3` - `Type B: Item 4` ![An example of a table that can be transposed.](/static/images/cortex-document/transposing-table-example.png) Note that this example includes [hierarchical headers](#label-ai-extract-table-extraction-best-practices-hierarchical). ### For large tables, split the document The model for table extraction returns answers that are up to 4096 tokens long. It means that the model stops extracting when it reaches that limit. You can approach this in the following ways: - If the table covers several pages, split the document into multiple one-page documents, and join the results in postprocessing. 
- If the table is so dense that the data can't be extracted even from a single page, divide the table by columns. For example, if the table contains 10 columns, try defining two separate values: one with 5 columns from the left half, and the other with 5 columns from the right half of the table. You might need to experiment with the column choice for best results. ### Create names for the columns that don't have a name in the document If the first column in the document doesn't have a name, you must create that name yourself when defining the value. You can approach it in the following ways: - Use the table title or a significant part of the title. - Create a descriptive name that represents the data in the column; for example, `description`, `type of asset`, `year`, `category`. ### Compare data from two different periods of time If you want to compare data from two different periods of time, for example, years 2023 and 2024 in financial documents such as annual reports, you can prefix the columns with "current" and "previous". Note that training might be needed to improve the results. ## Examples: Extract information from a purchase and sale agreement The following examples extract information from the condominium purchase and sale agreement which you can view in the [](#label-ai-extract-question-optimization) section. 
### Extract an entity Extract the seller name and the offer expiration date: ```sql SELECT AI_EXTRACT( file => TO_FILE('@db.schema.stage','document.pdf'), responseFormat => [['seller_name', 'What is the seller name?'], ['address', 'What is the offer expiration date?']] ); ``` Result: ```json { "error": null, "response": { "address": "12/12/2023", "seller_name": "Paul Doyle" } } ``` ### Extract checkbox information Extract information about items that are not included, based on the checkboxes marked in the document: ```sql SELECT AI_EXTRACT( file => TO_FILE('@db.schema.stage','document.pdf'), responseFormat => [['flat_items', 'What items are not included with the flat?'], ['default', 'What Default is selected?']] ); ``` Result: ```json { "error": null, "response": { "default": "Forfeiture of Earnest Money", "flat_items": "dryer, security system, satellite dish, wood stove, fireplace insert, hot tub, attached speaker(s), generator, other" } } ``` ### Extract signature status Extract information about whether the agreement has been signed: ```sql SELECT AI_EXTRACT( file => TO_FILE('@db.schema.stage','document.pdf'), responseFormat => [['signature', 'Is this document signed?']] ); ``` Result: ```json { "error": null, "response": { "signature": "no" } } ``` ### Extract a list of entities Extract a list of buyer names: ```sql SELECT AI_EXTRACT( file => TO_FILE('@db.schema.files', 'report.pdf'), responseFormat => { 'schema': { 'type': 'object', 'properties': { 'buyer_list': { 'description': 'What are the buyer names?', 'type': 'array' } } } } ); ``` Result: ```json { "error": null, "response": { "buyer_list": [ "John Davis", "Jane Davis" ] } } ``` ## Example: Extract information from a table This example extracts information from the following document: ![Table: Granger Causality Tests - P-values](/static/images/cortex-document/granger-causality.png) ```sql SELECT AI_EXTRACT( file => TO_FILE('@db.schema.files', 'report.pdf'), responseFormat => { 'schema': { 'type': 
'object', 'properties': { 'income_table': { 'description': 'Table 2: Granger Causality Tests - P-values', 'type': 'object', 'column_ordering': ['description', 'countries','lags','z','z_approx'], 'properties': { 'description': { 'description': 'Description', 'type': 'array' }, 'countries': { 'description': 'Countries', 'type': 'array' }, 'lags': { 'description': 'Lags', 'type': 'array' }, 'z': { 'description': 'Z', 'type': 'array' }, 'z_approx': { 'description': 'Z approx.', 'type': 'array' } } } } } ); ``` Result: ```json { "error": null, "response": { "income_table": { "countries": [ "33","80","29","84","34" ], "description": [ "Commodity exporters", "Non-commodity exporters", "AE", "EMDE", "Large or market-dominant countries" ], "lags": [ "2","1","1","1","1" ], "z": [ "0.11","0.08","0.89","0.12","0.07" ], "z_approx": [ "0.25","0.19","0.95","0.25","0.14" ] } } } ``` ## Legal notices The data classification of inputs and outputs are as set forth in the following table.
For additional information, refer to [Snowflake AI and ML](/guides-overview-ai-features). --- title: Feedback REST API source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-feedback-rest-api.md section: Snowflake Cortex (AI & ML) --- # Feedback REST API This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/cortex-agents) Use this API to collect feedback about Cortex Agents from end users. ## Collect feedback about a Cortex Agent `POST /api/v2/databases/{database}/schemas/{schema}/agents/{name}:feedback` Creates a feedback event for a Cortex Agent response. ### Request #### Path parameters
#### Request headers
#### Request body The request body contains the feedback details for the agent response.
#### Example request body for agent-level feedback ```json { "categories": [ "Something worked well" ], "feedback_message": "this is fantastic!", "positive": true } ``` #### Example request body for request-level feedback ```json { "orig_request_id": "aa123456-789a-a1-2a34-a1a234a56789", "categories": [ "Something worked well" ], "feedback_message": "this is fantastic!", "positive": true } ``` ### Response A successful response returns a confirmation message. #### Response headers
#### Response body ```json { "status": "Feedback submitted successfully" } ``` ## View feedback for Cortex Agents For information about required privileges and how to query feedback events (including example SQL), see [](#label-cortex-agents-view-feedback). --- title: Fine-tuning (Snowflake Cortex) source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-finetuning.md section: Snowflake Cortex (AI & ML) --- # Fine-tuning (Snowflake Cortex) This feature is not available in the People's Republic of China. Support for this feature is available to accounts in the following regions:
- AWS US West 2 (Oregon) - AWS US East 1 (N. Virginia) - AWS Europe Central 1 (Frankfurt) - Azure East US 2 (Virginia)
Support for inference of fine-tuned models is available to accounts in regions that support the COMPLETE function for the base model. For details, see [](#label-cortex-llm-availability). - [](/sql-reference/functions/finetune-snowflake-cortex) The %sf-fine-tuning% function offers a way to customize large language models for your specific task. This topic describes how the feature works and how to get started with creating your own fine-tuned model. ## Overview %sf-fine-tuning-short% allows users to leverage parameter-efficient fine-tuning (PEFT) to create customized adaptors for use with pre-trained models on more specialized tasks. If you don't want the high cost of training a large model from scratch but need better latency and results than you're getting from prompt engineering or even retrieval augmented generation (RAG) methods, fine-tuning an existing large model is an option. Fine-tuning allows you to use examples to adjust the behavior of the model and improve the model's knowledge of domain-specific tasks. %sf-fine-tuning-short% is a fully managed service that lets you fine-tune popular LLMs using your data, all within Snowflake. %sf-fine-tuning-short% features are provided as a Snowflake Cortex function, [FINETUNE](/sql-reference/functions/finetune-snowflake-cortex), with the following arguments: - [CREATE](#label-cortex-finetuning-create): Creates a fine-tuning job with the given training data. - [SHOW](#label-cortex-finetuning-show): Lists all the fine-tuning jobs in the current account. - [DESCRIBE](#label-cortex-finetuning-describe): Describes the progress and status of a particular fine-tuning job. - [CANCEL](#label-cortex-finetuning-cancel): Cancels a given fine-tuning job. ## Cost considerations The %sf-fine-tuning% function incurs compute cost based on the number of tokens used in training. In addition, running the [](/sql-reference/functions/ai_complete) function on a fine-tuned model incurs compute costs based on the number of tokens processed. 
Refer to the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf) for the cost of each function in credits per million tokens. A token is the smallest unit of text processed by the %sf-fine-tuning% function, approximately equal to four characters of text. The equivalence of raw input or output text to tokens can vary by model.

- For the COMPLETE function, which generates new text in the response, both input and output tokens are counted.
- Fine-tuning trained tokens are calculated as follows:

  ```
  Fine-tuning trained tokens = number of input tokens * number of epochs trained
  ```

  Use the [FINETUNE DESCRIBE](/sql-reference/functions/finetune-describe) function to see the number of trained tokens for your fine-tuning job.
- In addition to tuning and inference charges, standard [storage](/user-guide/cost-understanding-data-storage) and [warehouse](/user-guide/cost-understanding-compute) costs apply for storing the customized adaptors and running SQL commands.

### Track credit consumption for Fine-tuning training

To view the credit and token consumption for fine-tuning training jobs, use the [CORTEX_FINE_TUNING_USAGE_HISTORY](/sql-reference/account-usage/cortex_fine_tuning_usage_history) view:

```sql
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FINE_TUNING_USAGE_HISTORY;
```

## Other considerations

- Fine-tuning jobs are often long running and are not attached to a worksheet session.
- The number of rows in the training/validation dataset is limited by the base model and the number of training epochs. The effective limit is calculated as follows, and the following table shows the limits for 1 epoch and for the default of 3 epochs:

  ```
  Effective row count limit = 1 epoch limit for base model / number of epochs trained
  ```
| Model | 1 epoch | 3 epochs (default) | | -------------- | ------- | ------------------ | | `llama3-8b` | 186k | 62k | | `llama3-70b` | 21k | 7k | | `llama3.1-8b` | 150k | 50k | | `llama3.1-70b` | 13.5k | 4.5k | | `mistral-7b` | 45k | 15k | | `mixtral-8x7b` | 27k | 9k |
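The two formulas in the cost and row-limit discussion above can be checked with a short worked example. The sketch below hard-codes the 1-epoch limits from the preceding table (treating "k" as thousands, so the values are rounded as in the table). The function and constant names are illustrative only and are not part of any Snowflake API.

```python
# 1-epoch row limits copied from the table above ("k" read as thousands).
ONE_EPOCH_ROW_LIMITS = {
    "llama3-8b": 186_000,
    "llama3-70b": 21_000,
    "llama3.1-8b": 150_000,
    "llama3.1-70b": 13_500,
    "mistral-7b": 45_000,
    "mixtral-8x7b": 27_000,
}


def trained_tokens(input_tokens: int, epochs: int) -> int:
    """Fine-tuning trained tokens = input tokens * epochs trained."""
    return input_tokens * epochs


def effective_row_limit(model: str, epochs: int) -> float:
    """Effective row count limit = 1-epoch limit for the model / epochs trained."""
    return ONE_EPOCH_ROW_LIMITS[model] / epochs


# Reproduce the "3 epochs (default)" column of the table.
default_limits = {
    model: effective_row_limit(model, 3) for model in ONE_EPOCH_ROW_LIMITS
}
```

With these inputs, `default_limits["llama3-8b"]` evaluates to 62,000 rows and `default_limits["llama3.1-70b"]` to 4,500 rows, matching the table. Training on 1,000,000 input tokens for 3 epochs would be billed as `trained_tokens(1_000_000, 3)`, or 3,000,000 trained tokens.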
## Access control requirements To run a fine-tuning job, the role that creates the fine-tuning job needs the following privileges:
| Privilege | Object | Notes | | ------------------------- | -------- | ---------------------------------------------------------------------- | | USAGE | DATABASE | The database that the training (and validation) data are queried from. | | CREATE MODEL or OWNERSHIP | SCHEMA | The schema that the model is saved to. |
The following SQL is an example of granting the CREATE MODEL privilege to a role, `my_role`, on `my_schema`. ```sql GRANT CREATE MODEL ON SCHEMA my_schema TO ROLE my_role; ``` Additionally, to use the FINETUNE function, the ACCOUNTADMIN role must grant the SNOWFLAKE.CORTEX_USER database role to the user who will call the function. See [LLM Functions required privileges](#label-cortex-llm--privileges) topic for details. To give other roles access to use the fine-tuned model, you must grant usage on the model. For details, see [](#label-security-access-control-model-privileges). ## Models available to fine-tune You have the following base models that you can fine-tune. Models available for fine-tuning may be added or removed in the future: | Name | Description | | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `llama3-8b` | A large language model from Meta that is ideal for tasks that require low to moderate reasoning like text classification, summarization, and sentiment analysis. | | `llama3-70b` | An LLM from Meta that delivers state of the art performance ideal for chat applications, content creation, and enterprise applications. | | `llama3.1-8b` | A large language model from Meta that is ideal for tasks that require low to moderate reasoning. It's a light-weight, ultra-fast model with a context window of 24K. | | `llama3.1-70b` | An open source model that demonstrates state-of-the-art performance ideal for chat applications, content creation, and enterprise applications. It is a highly performant, cost effective model that enables diverse use cases. 
| | `mistral-7b` | 7 billion parameter large language model from Mistral AI that is ideal for your simplest summarization, structuration, and question answering tasks that need to be done quickly. It offers low latency and high throughput processing for multiple pages of text with its 32K context window. | | `mixtral-8x7b` | A large language model from Mistral AI that is ideal for text generation, classification, and question answering. Mistral models are optimized for low latency with low memory requirements, which translates into higher throughput for enterprise use cases. | ## How to fine-tune a model The overall workflow for tuning a model is as follows: 1. [Prepare the training data](#label-cortex-finetuning-data-prep). 2. [Start the fine-tuning job with the required parameters](#label-cortex-finetuning-start-job). 3. [Monitor training job](#label-cortex-finetuning-monitor). Once training is complete, you can use the model name provided by %sf-fine-tuning-short% to run inference on your model. ### Prepare the fine-tuning data The fine-tuning data must come from a Snowflake table or view and the query result must contain columns named `prompt` and `completion`. If your table or view does not contain columns with the required names, use a column alias in your query to name them. This query is given as a parameter to the FINETUNE function. You will get an error if the results do not contain `prompt` and `completion` column names. All columns other than the prompt and completion columns will be ignored by the FINETUNE function. Snowflake recommends using a query that selects only the columns you need. The following code calls the FINETUNE function and uses the `SELECT ... AS` syntax to set two of the columns in the query result to `prompt` and `completion`. 
```sql SELECT SNOWFLAKE.CORTEX.FINETUNE( 'CREATE', 'my_tuned_model', 'mistral-7b', 'SELECT a AS prompt, d AS completion FROM train', 'SELECT a AS prompt, d AS completion FROM validation' ); ``` To get responses that follow a schema you define, use structured outputs to generate fine-tuning data. For more information about structured outputs, see [](/user-guide/snowflake-cortex/complete-structured-outputs). A prompt is an input to the LLM and completion is the response from the LLM. Your training data should include prompt and completion pairs that show how you want the model to respond to particular prompts. The following are additional recommendations and requirements regarding your training data for getting optimal performance from fine-tuning. - Start with a few hundred examples. Starting with too many examples may increase tuning time drastically with minimal improvement in performance. - For each example, you must use only a portion of the allotted context window for the base model you are tuning. Context window is defined in terms of tokens. A token is the smallest unit of text processed by Snowflake Cortex functions, approximately equal to four characters of text. Prompt and completion pairs that exceed this limit will be truncated, which may negatively impact the quality of the trained model. - The portion of the context window allotted for `prompt` and `completion` for each base model is defined in the following table:
| Model | Context Window | Input Context (prompt) | Output Context (completion) |
| ------------ | -------------- | ---------------------- | --------------------------- |
| llama3-8b | 8k | 6k | 2k |
| llama3-70b | 8k | 6k | 2k |
| llama3.1-8b | 24k | 20k | 4k |
| llama3.1-70b | 8k | 6k | 2k |
| mistral-7b | 32k | 28k | 4k |
| mixtral-8x7b | 32k | 28k | 4k |
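As a rough pre-check before starting a job, the limits in the table can be compared against an estimated token count. The sketch below is only an approximation: it uses the "about four characters per token" heuristic from the text rather than a real tokenizer, it assumes "k" means 1,000 tokens, and the helper names `estimate_tokens` and `fits_context` are hypothetical, not part of any Snowflake API.

```python
# Per-model (prompt, completion) limits copied from the table above, in tokens.
# Assumption: "k" is read as 1,000 tokens; it could also mean 1,024.
CONTEXT_LIMITS = {
    "llama3-8b": (6_000, 2_000),
    "llama3-70b": (6_000, 2_000),
    "llama3.1-8b": (20_000, 4_000),
    "llama3.1-70b": (6_000, 2_000),
    "mistral-7b": (28_000, 4_000),
    "mixtral-8x7b": (28_000, 4_000),
}

CHARS_PER_TOKEN = 4  # rough heuristic from the text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count as ceil(len(text) / 4)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(model: str, prompt: str, completion: str) -> bool:
    """Return True if both fields are estimated to fit their allotted windows."""
    prompt_limit, completion_limit = CONTEXT_LIMITS[model]
    return (
        estimate_tokens(prompt) <= prompt_limit
        and estimate_tokens(completion) <= completion_limit
    )
```

Because the estimate is coarse, pairs that land close to a limit may still be truncated in practice, so it is safer to leave some headroom.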
### Start the fine-tuning job You can start a fine-tuning job by calling the [SNOWFLAKE.CORTEX.FINETUNE function and passing in 'CREATE' as the first argument](#label-cortex-finetuning-create) or using %sf-web-interface%. #### Use SQL This example uses the `mistral-7b` model as the base model to create a job with a model output name of `my_tuned_model` and training and validation data querying from the `my_training_data` and `my_validation_data` tables respectively. ```sql USE DATABASE mydb; USE SCHEMA myschema; SELECT SNOWFLAKE.CORTEX.FINETUNE( 'CREATE', 'my_tuned_model', 'mistral-7b', 'SELECT prompt, completion FROM my_training_data', 'SELECT prompt, completion FROM my_validation_data' ); ``` You can use absolute paths for each of the database objects such as the model or data if you want to use different database and schema for each. The following example shows creating a fine-tuning job with data from `mydb2.myschema2` database and schema and saving the fine-tuned model to the `mydb.myschema` database and schema. ```sql SELECT SNOWFLAKE.CORTEX.FINETUNE( 'CREATE', 'mydb.myschema.my_tuned_model', 'mistral-7b', 'SELECT prompt, completion FROM mydb2.myschema2.my_training_data', 'SELECT prompt, completion FROM mydb2.myschema2.my_validation_data' ); ``` The [SNOWFLAKE.CORTEX.FINETUNE function with 'CREATE' as the first argument](#label-cortex-finetuning-create) returns a fine-tuned model ID as the output. Use this ID to get status or job progress using the [SNOWFLAKE.CORTEX.FINETUNE function with 'DESCRIBE' as the first argument](#label-cortex-finetuning-describe). #### Use %sf-web-interface% Follow these steps to create a fine-tuning job in the %sf-web-interface%: 1. Sign in to %sf-web-interface-link%. 2. Choose a role that is granted the SNOWFLAKE.CORTEX_USER database role. 3. In the navigation menu, select **AI & ML** %raa% **AI Studio**. 4. Select **Fine-tune** from the **Create Custom LLM** box. 5. Select a base model using the drop-down menu. 6. 
Select the role under which the fine-tuning job will execute and the warehouse where it will run. The role must be granted the SNOWFLAKE.CORTEX_USER database role. 7. Select a database in which to store the fine-tuned model. 8. Enter a name for your fine-tuned model, then select **Let's go**. 9. Select the table or view that contains your training data, then select **Next**. The training data can come from any database or schema that the role has access to. 10. Select the column that contains the prompts in your training data, then select **Next**. 11. Select the column that contains the completions in your training data, then select **Next**. 12. If you have a validation dataset, select the table or view that contains your validation data, then select **Next**. If you don't have separate validation data, select **Skip this option**. 13. Verify your choices, then select **Start training**. The final step confirms that your fine-tuning job has started and displays the **Job ID**. Use this ID to get status or job progress using the [SNOWFLAKE.CORTEX.FINETUNE function with 'DESCRIBE' as the first argument](#label-cortex-finetuning-describe). ### Manage fine-tuned jobs Fine-tuning jobs are long running, which means they are not tied to a worksheet session. You can check the status of your tuning job using the [SNOWFLAKE.CORTEX.FINETUNE](/sql-reference/functions/finetune-snowflake-cortex) function with [SHOW](#label-cortex-finetuning-show) or ['DESCRIBE'](#label-cortex-finetuning-describe) as the first argument. If you no longer need a fine-tuning job, you can terminate the job using the [SNOWFLAKE.CORTEX.FINETUNE](/sql-reference/functions/finetune-snowflake-cortex) function with [CANCEL](#label-cortex-finetuning-cancel) as the first argument and the job ID as the second argument. ### Analyze fine-tuned models After a fine-tuning job completes, you can analyze the results of the training process by examining the fine-tuned model's artifacts. 
The OWNERSHIP privilege on the model is required to access the fine-tuned model's artifacts; for details, see [](#label-security-access-control-model-privileges). The artifacts include a `training_results.csv` file. This CSV file contains one header row followed by a row for each training step recorded by the fine-tuning job. The file contains the following columns:
| Column name | Description |
| --------------- | ------------------------------------------------------------------------------------------------------ |
| step | Number of training steps completed in the entire training process. Starts at 1. |
| epoch | The epoch in the training process. Starts at 1. |
| training_loss | The loss for the training batch. A lower number indicates a closer fit between the model and the data. |
| validation_loss | The loss on the validation dataset. This is only available at the last step in each epoch. |
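To show how these columns might be read once the file has been downloaded, here is a minimal sketch that collects the validation loss recorded for each epoch. It assumes that steps without a validation loss have an empty `validation_loss` cell, which the source does not state. The sample rows are invented for illustration and are not output from a real job.

```python
import csv
import io
from typing import Dict


def validation_loss_per_epoch(csv_text: str) -> Dict[int, float]:
    """Map each epoch to the validation loss recorded during that epoch."""
    losses: Dict[int, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row["validation_loss"] or "").strip()
        if value:  # assumed: steps without a validation loss have an empty cell
            losses[int(row["epoch"])] = float(value)
    return losses


# Invented sample data for illustration only; not output from a real job.
SAMPLE = """step,epoch,training_loss,validation_loss
1,1,1.90,
2,1,1.40,1.55
3,2,1.10,
4,2,0.90,1.20
"""

per_epoch = validation_loss_per_epoch(SAMPLE)
```

A validation loss that rises from one epoch to the next while the training loss keeps falling is a common sign of overfitting.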
The `training_results.csv` file can be found in the [Model Registry UI](/developer-guide/snowflake-ml/model-registry/snowsight-ui) in Snowsight and accessed directly via SQL or Python API. For more information, see [](#label-snowpark-model-registry-model-artifacts). ## Use your fine-tuned model for inference Use the [AI_COMPLETE function](/sql-reference/functions/ai_complete) with the name of your fine-tuned model to make inferences. This example shows a call to the [AI_COMPLETE](/sql-reference/functions/ai_complete) function with the name of your fine-tuned model. ```sql SELECT AI_COMPLETE( 'my_tuned_model', 'How to fine-tune mistral models' ); ``` The following is a snippet of the output from the example call:
```text Mistral models are a type of deep learning model used for image recognition and classification. Fine-tuning a Mistral model involves adjusting the model's parameters to ... ```
## Limitations and known issues - Fine-tuning jobs are listable at the account-level only. - The fine-tuning jobs returned from [](/sql-reference/functions/finetune-show) are not permanent and may be garbage collected periodically. - If a base model is removed from the Cortex LLM Functions, your fine-tuned model will no longer work. ## Sharing models Fine-tuned models can be shared to other accounts with the USAGE privilege via [Data Sharing](/user-guide/data-sharing-intro). ## Replicating models [](/user-guide/snowflake-cortex/cross-region-inference) does not support fine-tuned models. Inference must take place in the same region where the model object is located. You can use database replication to replicate the fine-tuned model object to a region you want to make inference from if it's different than the region the model was trained in. For example, if you create a fine-tuned model based on `mistral-7b` in your account in the AWS US West 2 region, you can use data sharing to share it with another account in this region, or you can use database replication to replicate the model to another account in your organization in a different region that supports the `mistral-7b` model, such as AWS Europe West. For details on replicating objects, see [](/user-guide/account-replication-config). ## Legal notices The data classification of inputs and outputs are as set forth in the following table.
For additional information, refer to [Snowflake AI and ML](/guides-overview-ai-features). --- title: Fine-tuning `arctic-extract` models source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/arctic-extract-finetuning.md section: Snowflake Cortex (AI & ML) --- # Fine-tuning `arctic-extract` models This feature is not available in the People's Republic of China. Support for this feature is available to accounts in the following regions:
- AWS US West 2 (Oregon) - AWS US East 1 (N. Virginia) - AWS Europe Central 1 (Frankfurt) - Azure East US 2 (Virginia)
- [](/sql-reference/functions/ai_extract) - [](/developer-guide/snowflake-ml/dataset) - [](/sql-reference/sql/create-dataset) - [](/user-guide/snowflake-cortex/cortex-finetuning) You can fine-tune `arctic-extract` models using the [Snowflake Cortex Fine-tuning](/user-guide/snowflake-cortex/cortex-finetuning) function and [](/developer-guide/snowflake-ml/dataset). The fine-tuned model can then be used for inference with the [](/sql-reference/functions/ai_extract) function. ## Syntax For specific syntax, usage notes, and examples, see: - [](#label-arctic-extract-finetune-create) - [](#label-arctic-extract-finetune-describe) - [](/sql-reference/functions/finetune-show) - [](/sql-reference/functions/finetune-cancel) ### FINETUNE ('CREATE') (SNOWFLAKE.CORTEX) Creates a fine-tuning job. #### Syntax ```sqlsyntax SNOWFLAKE.CORTEX.FINETUNE( 'CREATE', '@..', 'arctic-extract', '' [ , '' [, '' ] ] ) ``` #### Required parameters
`'CREATE'`
Specifies that you want to create a fine-tuning job.
'training_dataset'
Dataset object to use for training. For more information, see [](#label-arctic-extract-finetune-dataset-requirements).
#### Optional parameters
'validation_dataset'
Dataset object to use for validation. For more information, see [](#label-arctic-extract-finetune-dataset-requirements).
'options'
Optional. A string representation of a JSON object that sets training hyperparameters for the job. You can specify `max_epochs` with an integer from 2 through 10 (inclusive) to control how many epochs the job runs. If you omit `options`, the number of epochs is determined automatically by the system. For the JSON format of the `options` argument, see [FINETUNE ('CREATE') (SNOWFLAKE.CORTEX)](/sql-reference/functions/finetune-create).
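Because `options` is passed as a JSON string, it can help to build and validate it in client code before calling the function. The following is a minimal sketch under that assumption. The helper name `build_finetune_options` is hypothetical, and only the `max_epochs` key described above is handled.

```python
import json
from typing import Optional


def build_finetune_options(max_epochs: Optional[int] = None) -> Optional[str]:
    """Return the JSON string for the `options` argument, or None to omit it."""
    if max_epochs is None:
        return None  # omit `options` so the system chooses the number of epochs
    if isinstance(max_epochs, bool) or not isinstance(max_epochs, int):
        raise TypeError("max_epochs must be an integer")
    if not 2 <= max_epochs <= 10:
        raise ValueError("max_epochs must be between 2 and 10 (inclusive)")
    return json.dumps({"max_epochs": max_epochs})
```

For example, `build_finetune_options(3)` produces the string `{"max_epochs": 3}`, which can then be passed as the last argument of the `'CREATE'` call.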
#### Access control requirements
| Privilege | Object | Notes |
| ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| USAGE or OWNERSHIP | DATABASE | The database that the Dataset object is stored in. |
| USAGE or OWNERSHIP | SCHEMA | The schema that the Dataset object is stored in. |
| READ or OWNERSHIP | STAGE | The internal or external named stage that stores the document files. For more information, see [](/user-guide/snowflake-cortex/aisql). |
| USAGE or OWNERSHIP | SCHEMA | The schema that the fine-tuned model is stored in. |
| CREATE MODEL | SCHEMA | The schema that the fine-tuned model is stored in. |
Additionally, to use the FINETUNE function, the ACCOUNTADMIN role must grant the SNOWFLAKE.CORTEX_USER database role to the user who will call the function. See the [LLM Functions required privileges](#label-cortex-llm--privileges) topic for details. #### Example ```sql SELECT SNOWFLAKE.CORTEX.FINETUNE( 'CREATE', '@database.schema.model_name', 'arctic-extract', 'snow://dataset/training_ds/versions/2', 'snow://dataset/validation_ds/versions/4' ); ``` The following example adds the `options` argument to set `max_epochs`: ```sql SELECT SNOWFLAKE.CORTEX.FINETUNE( 'CREATE', '@database.schema.model_name', 'arctic-extract', 'snow://dataset/training_ds/versions/2', 'snow://dataset/validation_ds/versions/4', '{"max_epochs": 3}' ); ``` ### FINETUNE ('DESCRIBE') (SNOWFLAKE.CORTEX) Describes the properties of a fine-tuning job. For syntax and parameters, see [](/sql-reference/functions/finetune-describe). The following is example output for a successful job that fine-tuned an `arctic-extract` model: ```text { "base_model":"arctic-extract", "created_on":1717004388348, "finished_on":1717004691577, "id":"ft_6556e15c-8f12-4d94-8cb0-87e6f2fd2299", "model":"mydb.myschema.my_tuned_model", "progress":1.0, "status":"SUCCESS", "training_data":"snow://dataset/training_ds/versions/2", "trained_tokens":2670734, "training_result":{"validation_loss":1.0138969421386719,"training_loss":0.6477728401547047}, "validation_data":"snow://dataset/validation_ds/versions/4" } ``` ## Dataset requirements The [Dataset](/developer-guide/snowflake-ml/dataset) used for training and validation must contain the following columns:
File:
A string containing the file path to the document for extraction. The path can reference a file on an internal stage or a named external stage (for example, Amazon S3, Google Cloud Storage, or Microsoft Azure). For example: `@db.schema.stage/file.pdf`
Prompt:
A JSON value that specifies key and question pairs for extraction in one of the formats supported by the `responseFormat` argument of the [](/sql-reference/functions/ai_extract) function. For more information, see [](/sql-reference/functions/ai_extract).
Response:
A JSON object containing key and response pairs.
Column names are case-insensitive and can be in any order in the Dataset; however, all required columns (`File`, `Prompt`, and `Response`) must be present for the Dataset to be valid. Additional columns in the Dataset are ignored. When preparing the Dataset, note the following: - The schema of the fine-tuned model is the unique set of all questions in the Dataset. - The answers in the `Response` column should match the questions in the `Prompt` column by matching keys in the `Prompt` and `Response` columns. - You don't have to specify the same set of questions for every document. - To improve model accuracy, add a prompt and response row for each question, even if the model's default response is correct. This action confirms that the default answer is accurate. For more information about Datasets, see [](/developer-guide/snowflake-ml/dataset). ### Example Dataset
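As a hedged illustration of the key-matching rule above, the sketch below checks that every key in a `Prompt` value has a counterpart in the `Response` value before rows are loaded into a Dataset. It handles only two prompt shapes, a JSON object and a list of `[key, question]` pairs, and the rows, stage name, and helper names are hypothetical.

```python
import json
from typing import Set


def prompt_keys(prompt_json: str) -> Set[str]:
    """Extract question keys from a prompt object or a list of [key, question] pairs."""
    prompt = json.loads(prompt_json)
    if isinstance(prompt, dict):
        return set(prompt)
    return {pair[0] for pair in prompt}


def mismatched_keys(prompt_json: str, response_json: str) -> Set[str]:
    """Return keys that appear in only one of the prompt and the response."""
    return prompt_keys(prompt_json) ^ set(json.loads(response_json))


# Hypothetical rows (file, prompt, response); the stage and values are made up.
rows = [
    ("@db.schema.stage/1.pdf",
     '{"net": "What is the net value?"}',
     '{"net": "3,762.56"}'),
    ("@db.schema.stage/2.pdf",
     '{"net": "What is the net value?", "date": "What is the invoice date?"}',
     '{"net": "120.00"}'),
]

for file_path, prompt, response in rows:
    # An empty list means the prompt and response keys line up.
    print(file_path, sorted(mismatched_keys(prompt, response)))
```

In this sample the second row is flagged because its response has no `date` key.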
When you create the Dataset, set the response to `None` if the document does not contain an answer to the question. ## Usage notes - Snowflake recommends using at least 20 documents for fine-tuning. - In the training Dataset, at most 100 unique questions are supported for entity extraction, and at most 10 unique questions are supported for table extraction. - Training and validation documents can reside on an internal stage or a named external stage. For access requirements and setup when you use cloud storage, see [](/user-guide/snowflake-cortex/aisql). - Client-side encrypted stages are not supported. For more information, see [](/sql-reference/functions/ai_extract). - Fine-tuning `arctic-extract` models is currently incompatible with custom [network policies](/user-guide/network-policies). - Supported file formats for documents are: - PDF - PNG - JPG, JPEG - TIFF, TIF - The maximum number of pages per document is: - 64 pages for AWS US West 2 (Oregon) and AWS Europe Central 1 (Frankfurt) - 125 pages for AWS US East 1 (N. Virginia) and Azure East US 2 (Virginia) - The maximum number of unique document files in the Dataset is 1,000. You can reference the same document file multiple times. - A limit exists on how many questions and documents can be in a fine-tuning job. The number of questions multiplied by the total number of pages in all document files in the Dataset must be less than or equal to 50,000. For example, some valid combinations are:
| Number of questions | Number of pages per document | Number of document file references [1] |
| ------------------- | ---------------------------- | -------------------------------------- |
| 10 | 1 | 5,000 |
| 100 | 1 | 500 |
| 10 | 10 | 500 |
| 25 | 10 | 200 |
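The arithmetic behind these combinations can be sketched as a small check. This is an illustration only. It assumes every referenced document has the same number of pages, as in the table rows, and the function name `within_question_page_limit` is hypothetical.

```python
MAX_QUESTION_PAGES = 50_000  # questions x total pages, from the usage notes


def within_question_page_limit(questions: int,
                               pages_per_document: int,
                               document_references: int) -> bool:
    """Check questions x total pages against the 50,000 limit."""
    # Assumption: all referenced documents have the same page count.
    total_pages = pages_per_document * document_references
    return questions * total_pages <= MAX_QUESTION_PAGES


# The four rows of the table above each land exactly on the limit.
table_rows = [(10, 1, 5_000), (100, 1, 500), (10, 10, 500), (25, 10, 200)]
all_valid = all(within_question_page_limit(*row) for row in table_rows)
```

Each row multiplies out to exactly 50,000, so adding one more question, page, or document reference to any of them would exceed the limit.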
## Create a fine-tuning job To create a fine-tuning job, you must create a Dataset object that contains the training data. The following example shows how to create a Dataset object and use the Dataset to create a fine-tuning job for an `arctic-extract` model. 1. Create the table which will contain the training data: ```sql CREATE OR REPLACE TABLE my_data_table (f FILE, p VARCHAR, r VARCHAR); ``` 2. Populate the table with the training data: ```sql INSERT INTO my_data_table (f, p, r) SELECT TO_FILE('@db.schema.stage', '1.pdf'), '{"net": "What is the net value?"}', '{"net": "3,762.56"}'; ``` 3. Create the Dataset object: ```sql CREATE OR REPLACE DATASET my_dataset; ``` 4. Create a new version of the Dataset that adds the training data, using the [](/sql-reference/functions/fl_get_stage) and the [](/sql-reference/functions/fl_get_relative_path) functions to get the file paths: ```sql ALTER DATASET my_dataset ADD VERSION 'v1' FROM ( SELECT FL_GET_STAGE(f) || '/' || FL_GET_RELATIVE_PATH(f) AS "file", p AS "prompt", r AS "response" FROM my_data_table ); ``` 5. 
Create a fine-tuning job: ```sql SELECT SNOWFLAKE.CORTEX.FINETUNE( 'CREATE', 'my_tuned_model', 'arctic-extract', 'snow://dataset/db.schema.my_dataset/versions/v1' ); ``` ## Use your fine-tuned `arctic-extract` model for inference To use the fine-tuned `arctic-extract` model for inference, ensure you have the following privileges on the model object: - OWNERSHIP - USAGE - READ To use the fine-tuned `arctic-extract` model for inference with the [AI_EXTRACT](/sql-reference/functions/ai_extract) function, specify the model using the `model` parameter as shown in the following example: ```sql SELECT AI_EXTRACT( model => 'db.schema.my_tuned_model', file => TO_FILE('@db.schema.files','document.pdf') ); ``` You can overwrite questions used for fine-tuning by using the `responseFormat` parameter as shown in the following example: ```sql SELECT AI_EXTRACT( model => 'db.schema.my_tuned_model', file => TO_FILE('@db.schema.files','document.pdf'), responseFormat => [['name', 'What is the first name of the employee?'], ['city', 'Where does the employee live?']] ); ``` For more information, see [](/sql-reference/functions/ai_extract). You can copy your fine-tuned `arctic-extract` model between databases and/or schemas within an account or between accounts. For more information, see [](/user-guide/snowflake-cortex/copy-arctic-extract-models). --- title: Getting started with Snowflake CoWork source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/snowflake-cowork/getting-started.md section: Snowflake Cortex (AI & ML) --- # Getting started with %sf-intelligence% This feature is not available in the People's Republic of China. This topic provides information about getting started with %sf-intelligence% with a simple example of creating an enterprise agent. This agent can be used with %sf-intelligence% to respond to questions by reasoning over both structured and unstructured data. 
For a more detailed guide, see [Getting Started with %sf-intelligence%](https://www.snowflake.com/en/developers/guides/getting-started-with-snowflake-intelligence/). ## Prerequisites - [Git installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) - A Snowflake account - Access to the ACCOUNTADMIN role ## Create a database, schema, and tables and load data from AWS S3 To create the building blocks for the enterprise agent, you must create a database, schema, tables, and load data from AWS S3. 1. Clone the [Getting Started with %sf-intelligence% GitHub repository](https://github.com/Snowflake-Labs/sfguide-getting-started-with-snowflake-intelligence/) to your local machine: ```bash git clone https://github.com/Snowflake-Labs/sfguide-getting-started-with-snowflake-intelligence.git ``` 2. Sign in to %sf-web-interface-link%. 3. In the navigation menu, select **Projects** %raa% **Workspaces**. 4. Select **+ Add new**. 5. Select **SQL File**. 6. Enter a name for the file. 7. Open the file. 8. Copy the contents of the [setup.sql](https://github.com/Snowflake-Labs/sfguide-getting-started-with-snowflake-intelligence/blob/main/setup.sql) file to the workspace. 9. Run all statements in order. 10. Run the following SQL statements in the workspace: ```sql USE ROLE ACCOUNTADMIN; CREATE SNOWFLAKE INTELLIGENCE SNOWFLAKE_INTELLIGENCE_OBJECT_DEFAULT; GRANT USAGE ON SNOWFLAKE INTELLIGENCE SNOWFLAKE_INTELLIGENCE_OBJECT_DEFAULT TO ROLE snowflake_intelligence_admin; GRANT CREATE SEMANTIC VIEW ON SCHEMA DASH_DB_SI.RETAIL TO ROLE ACCOUNTADMIN; ``` 11. Optionally, run the following SQL statement to enable cross-region inference: ```sql ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION'; ``` 12. Switch the user role in Snowsight to SNOWFLAKE_INTELLIGENCE_ADMIN. ## Create tools for the agent to use Create the tools that the agent will use. **Create a semantic view for use with Cortex Analyst.** 1. In the navigation menu, select **AI & ML** %raa% **Cortex Analyst**. 
2. Select **Create new**, then select **Create new Semantic View**. 3. For the location to store the semantic view, select DASH_DB_SI.RETAIL. 4. For the name, enter `SALES_AND_MARKETING_DATA`. 5. For the description, enter `Semantic view for sales and marketing data analysis across campaigns, products, transactions, and social media engagement.`. 6. Select **Next**. 7. Select **Skip**. 8. Select the DASH_DB_SI.RETAIL schema. 9. For the tables, select the MARKETING_CAMPAIGN_METRICS, PRODUCTS, SALES, and SOCIAL_MEDIA tables. 10. Select **Next**. 11. For the columns, select all available columns for the selected tables. 12. Select **Next**. 13. Review and accept all of the relationship and metric suggestions. 14. Select **Save**. 15. Wait for the semantic view to be created. **Create a Cortex search tool by creating a search service.** 1. In the navigation menu, select **AI & ML** %raa% **Cortex Search**. 2. Select **Create**. 3. For **Service database and schema**, select **DASH_DB_SI.RETAIL**. 4. For **Service name**, enter **Support_Cases**, and then select **Next**. 5. In the list of data sources, select the SUPPORT_CASES table, and then select **Next**. 6. In the list of search columns, select **TRANSCRIPT**, and then select **Next**. 7. For the attribute columns, select **TITLE** and **PRODUCT**, and then select **Next**. 8. For the columns to include, select **Select all**, and then select **Next**. 9. For the warehouse, select **DASH_WH_SI** (if that warehouse is not available, select **COMPUTE_WH**), and then select **Create**. ## Create a Cortex Agent To create the agent that will use the tools, follow these steps: 1. In the navigation menu, select **AI & ML** %raa% **Agents**. 2. Select **Create agent**. 3. For the schema, use SNOWFLAKE_INTELLIGENCE.AGENTS. 4. For the agent object name, use `Sales_AI`. 5. For the display name, use `Sales AI`. 6. Select **Create agent**. ## Add the tools to the agent **Add the Cortex Analyst tool to the agent.** 1. 
From the agent page, select the **Tools** tab. 2. Navigate to the Cortex Analyst entry. 3. Select **+ Add**, then select **Semantic view**. 4. For the database and schema, select DASH_DB_SI.RETAIL. 5. For the semantic view, select `SALES_AND_MARKETING_DATA`. 6. For the name, use `SALES_AND_MARKETING_DATA`. 7. For the description, use the following: ```text The Sales and Marketing Data semantic view in DASH_DB_SI.RETAIL schema provides a complete view of retail business performance by connecting marketing campaigns, product information, sales data, and social media engagement. The view enables tracking of marketing campaign effectiveness through clicks and impressions, while linking to actual sales performance across different regions. Social media engagement is monitored through influencer activities and mentions, with all data connected through product categories and IDs. The temporal alignment across tables allows for comprehensive analysis of marketing impact on sales performance and social media engagement over time. ``` 8. For the warehouse, select **Custom**, then select DASH_WH_SI. 9. For the query timeout, use `60`. 10. Select **Add**. **Add the Cortex Search tool to the agent.** 1. Navigate to the Cortex Search Services entry. 2. Select **+ Add**. 3. For the database and schema, select DASH_DB_SI.RETAIL. 4. For the search service, select `DASH_DB_SI.RETAIL.Support_Cases`. 5. For the ID column, use `ID`. 6. For the title column, use `TITLE`. 7. For the name, use `Support_Cases`. 8. Select **Add**. 9. Select the **Orchestration** tab. 10. Add the following orchestration instructions: ```text Whenever you can answer visually with a chart, always choose to generate a chart even if the user didn't specify to. ``` 11. Select **Save**. ## Use %sf-intelligence% Interact with the agent from %sf-intelligence%. 1. Navigate to %sf-intelligence% using one of the methods described in [](#label-snowflake-intelligence-use-agent). 2. Select the newly created agent. 3. 
Enter the following prompts: - "What issues are reported with jackets recently in customer support tickets?" - "Show me the trend of sales by product category between June and August." - "Why did sales of Fitness Wear grow so much in July?" --- title: Governance and availability source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/governance-and-availability.md section: Snowflake Cortex (AI & ML) --- # Governance and availability Cortex AI features fit into Snowflake's broader approach to governance, availability, and cost management. This documentation brings together the key guidance for understanding model availability, regional behavior, budgets, and access management across Snowflake AI & ML. ## Cross-region inference Many Snowflake AI features support [cross-region inference](/user-guide/snowflake-cortex/cross-region-inference), which allows inference requests to be processed in a different region when the requested model or feature is not available in your account's default region. Learn when cross-region inference applies, how regional and global availability differ, and how pricing may vary depending on where inference is served. ## Model availability [Model and feature availability](#label-cortex-llm-availability) can vary by region and georegion. Find which models are available where, along with relevant lifecycle and status information to help plan adoption, manage change, and understand whether a feature is available natively in-region or through cross-region inference. ## Cost governance and budgets Snowflake provides usage views and budget features to help monitor AI consumption, understand spend, and respond when usage approaches configured limits. [Track AI-related usage](/user-guide/snowflake-cortex/governance-and-availability/ai-cost-management-and-governance), connect consumption to billing, and use budgets to support internal monitoring, notifications, and spending controls across supported AI features.
## Access controls Across Snowflake AI & ML, most features already include access controls. The [Cortex User role](/user-guide/snowflake-cortex/aisql), [role based access control for models](#label-cortex-llm-rbac), [Agent user role](#label-cortex-agents-access-control), and [%sf-intelligence% user](/user-guide/snowflake-cortex/snowflake-cowork) span part of the current feature set and show how access is managed in different product areas. Together, these capabilities support how access is granted, managed, and restricted across features. --- title: Improve literal search to enhance Cortex Analyst responses source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/cortex-analyst-search-integration.md section: Snowflake Cortex (AI & ML) --- # Improve literal search to enhance Cortex Analyst responses This feature is not available in the People's Republic of China. - [](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) This topic describes ways to improve literal string searches to help Cortex Analyst generate more accurate SQL queries. Writing the correct SQL query to answer a question sometimes requires knowing exact literal values to filter on. Since those values can't always be extracted directly from the question, a search of some kind may be needed. For example, if a user asks a question such as: ```text What was my overall sales of iced tea in Q1? ``` You might try the following query: ```sql SELECT DISTINCT name FROM product WHERE name LIKE '%iced%tea%' ``` If you've ever gone through this process yourself, you'll know that this isn't a perfect solution. For example, this query won't show you any products named "Ice Tea", but it will show you some "spiced tea". Cortex Analyst offers two solutions to help improve literal usage: - Semantic search over the provided sample values in your [semantic model](/user-guide/views-semantic/sql). 
- Semantic search using [Cortex Search Services](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview). This is where integrating with Cortex Search can help. [Cortex Search](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) is a feature that enables low-latency, high-quality "fuzzy" search over text data. You can create a Cortex Search service to do a semantic search over the underlying database column to find any literal values needed for Cortex Analyst to use in the SQL query that answers the user's question. ## Semantic search over sample values For dimensions with relatively low-cardinality (about 1 - 10 distinct values), using a sample value search by specifying enough sample values to show the structure of the response for the dimension is recommended. This solution requires no additional storage besides the minimal increase to the semantic model size. Before Cortex Analyst generates a SQL query for your question, it does a semantic similarity search between your question and the provided sample values to identify any appropriate literal values that may be needed to write your query. Note that the semantic similarity search may retrieve more relevant literals than the fuzzy string matching query approach mentioned above. Only a fixed-sized set of retrieved sample values will be presented to the LLM as literals that may be needed to write the SQL query. That means adding more sample values does not put you at risk of exceeding the LLM's context window. ## Semantic search using Cortex Search Service For dimensions with higher cardinality (more than 10 distinct values) or dimensions whose values change frequently, you can use a Cortex Search Service to search through the literals. This solution reduces data duplication and keeps your semantic model concise. Cortex Search Services do come with additional storage and compute costs. For details, see [](#label-cortex-search-cost-considerations). 
In this preview, only a single Cortex Search Service per logical dimension is supported.

There are two options for creating a Cortex Search Service for a logical dimension in your Cortex Analyst semantic model:

- Use the Cortex Analyst UI to create a Cortex Search Service. This is the recommended approach, because it is simpler and less error-prone than manual setup.
- Create a Cortex Search Service manually with SQL code. This approach is more flexible but requires you to write code.

### Option 1: Use the Cortex Analyst UI

You can create a Cortex Search Service in Snowsight using the Cortex Analyst semantic model creation UI. This approach requires no writing or editing of SQL or YAML, and is suitable for most uses.

1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **AI & ML** %raa% **Cortex Analyst** %raa% **Create new model**.
3. Follow the model creation flow to create the Cortex Analyst semantic model. The screen for setting up Cortex Search Services is at the end of this flow.

When defining dimensions in the UI, select columns that contain text values you want to improve literal matching for. The wizard automatically selects high-cardinality columns for you, but you can choose other columns.

Next, the UI lets you choose settings for your new service, then creates the service automatically when you complete the flow. The service is provisioned in the database and schema that you selected. Once created, the service is automatically linked to your semantic model. (The wizard also generates the YAML that links the service.)

### Option 2: Create a Cortex Search Service manually

The following steps show how to manually set up a Cortex Search Service for a logical dimension in your Cortex Analyst semantic model:

1. Create a Cortex Search Service:

   ```sql
   CREATE OR REPLACE CORTEX SEARCH SERVICE my_logical_dimension_search_service
     ON my_dimension
     WAREHOUSE = xsmall
     TARGET_LAG = '1 hour'
     AS (
       SELECT DISTINCT my_dimension
       FROM my_logical_dimension_landing_table
     );
   ```

2. Include the Cortex Search service in your semantic model using the following YAML snippet:

   ```yaml
   tables:
     - name: my_table
       base_table:
         database: my_database
         schema: my_schema
         table: my_table
       dimensions:
         - name: my_dimension
           expr: my_column
           cortex_search_service:
             service: my_logical_dimension_search_service
             literal_column: my_column       # optional
             database: my_search_database    # optional
             schema: my_search_schema        # optional
   ```

   The following fields are optional under `cortex_search_service`:

   - `literal_column`: Defaults to the search index.
   - `database`: Defaults to the database of the specified base table.
   - `schema`: Defaults to the schema of the specified base table.

---
title: Integrate tools and data
source: https://docs.snowflake.com/en/user-guide/snowflake-cortex/snowflake-cowork/integrate-tools.md
section: Snowflake Cortex (AI & ML)
---

# Integrate tools and data

This feature is not available in the People's Republic of China.

In some cases, you may want to integrate other tools and data sources with your agents in %sf-intelligence%. %sf-intelligence% supports the Model Context Protocol (MCP), which is an [open-source standard](https://modelcontextprotocol.io/docs/getting-started/intro) that lets AI agents securely interact with business applications and external data systems, such as databases and content repositories. The MCP server provides a standards-based interface that allows AI agents to discover and invoke tools, such as Cortex Analyst and Cortex Search, and retrieve the data they need. For more information, see [](/user-guide/snowflake-cortex/cortex-agents-mcp).
With MCP, you can:

- Allow your agent to retrieve data from Snowflake accounts using a Snowflake-managed MCP server without needing to deploy separate infrastructure. You can configure the MCP server to serve Cortex Analyst, Cortex Search, and Cortex Agents as tools, along with custom tools and SQL executions on the standards-based interface.
- Connect to your agents in %sf-intelligence% from external MCP clients.

For information about creating and managing the Snowflake-managed MCP server, see [](/user-guide/snowflake-cortex/cortex-agents-mcp).

## Use the Snowflake-managed MCP server to connect to your agents from external MCP clients

Any agent that you create in Snowflake, or the tools that the agent is connected to, can have a managed endpoint for other systems to connect to your agent with MCP. This provid

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Hostname * | hostname | | | Destination hostname or IP address |
| Port * | port | | | Destination port number |
| Record Writer * | record-sink-record-writer | | | Specifies the Controller Service to use for writing out the records. |
| Sender Threads * | sender-threads | 2 | | Number of worker threads allocated for handling socket communication |

Property Description
File Filter Only files contained in the archive whose names match the given regular expression will be extracted (tar/zip only)
Filename Character Set If supplied, this character set is passed to the Zip utility to attempt to decode filenames. If not specified, the default platform character set is used. This is useful if a Zip was created with a different character set than the platform default and the zip uses non-standard values to specify it.
Packaging Format The Packaging Format used to create the file
Password Password used for decrypting Zip archives encrypted with ZipCrypto or AES. Configuring a password disables support for alternative Zip compression algorithms.
allow-stored-entries-wdd Some zip archives contain stored entries with data descriptors which by spec should not happen. If this property is true they will be read anyway. If false and such an entry is discovered the zip will fail to process.
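As a rough illustration of how a File Filter regular expression selects archive entries, the sketch below filters the entry names of an in-memory zip. It is not the processor's implementation: `matching_entries` and `build_demo_zip` are hypothetical helpers, and matching the pattern against the whole entry name (path included) is an assumption.

```python
import io
import re
import zipfile


def matching_entries(archive_bytes: bytes, file_filter: str) -> list[str]:
    """Return the names of zip entries that match the filter expression.

    Assumption: the expression must match the entire entry name.
    """
    pattern = re.compile(file_filter)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return [name for name in zf.namelist() if pattern.fullmatch(name)]


def build_demo_zip() -> bytes:
    """Build a small in-memory archive with made-up entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data/a.csv", "x,y\n1,2\n")
        zf.writestr("data/b.json", "{}")
        zf.writestr("README.txt", "hello")
    return buf.getvalue()


if __name__ == "__main__":
    print(matching_entries(build_demo_zip(), r".*\.csv"))
```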
Name Description
failure The original FlowFile is sent to this relationship when it cannot be unpacked for some reason
original The original FlowFile is sent to this relationship after it has been successfully unpacked
success Unpacked FlowFiles are sent to this relationship
Name Description
mime.type If the FlowFile is successfully unpacked, its MIME Type is no longer known, so the mime.type attribute is set to application/octet-stream.
fragment.identifier All unpacked FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index A one-up number that indicates the ordering of the unpacked FlowFiles that were created from a single parent FlowFile
fragment.count The number of unpacked FlowFiles generated from the parent FlowFile
segment.original.filename The filename of the parent FlowFile. Extensions of .tar, .zip or .pkg are removed because the MergeContent processor automatically adds those extensions if it is used to rebuild the original FlowFile
file.lastModifiedTime The date and time that the unpacked file was last modified (tar and zip only).
file.creationTime The date and time that the file was created. For encrypted zip files this attribute always holds the same value as file.lastModifiedTime. For tar and unencrypted zip files, the creation time is returned if available; otherwise this will be the same value as file.lastModifiedTime.
file.lastMetadataChange The date and time the file's metadata changed (tar only).
file.lastAccessTime The date and time the file was last accessed (tar and unencrypted zip files only)
file.owner The owner of the unpacked file (tar only)
file.group The group owner of the unpacked file (tar only)
file.size The uncompressed size of the unpacked file (tar and zip only)
file.permissions The read/write/execute permissions of the unpacked file (tar and unencrypted zip files only)
file.encryptionMethod The encryption method for entries in Zip archives
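The fragment attributes above are enough for a downstream consumer to regroup unpacked FlowFiles by parent. The following is a minimal sketch that models each FlowFile as a plain dictionary of string attributes; `group_fragments` and the `filename` values are hypothetical and not part of any processor API.

```python
from collections import defaultdict


def group_fragments(flowfiles):
    """Group attribute dicts by fragment.identifier and order them by fragment.index.

    A group is marked complete when the number of fragments seen equals
    fragment.count. Attribute values are treated as strings and converted.
    """
    groups = defaultdict(list)
    for attrs in flowfiles:
        groups[attrs["fragment.identifier"]].append(attrs)

    result = {}
    for identifier, items in groups.items():
        items.sort(key=lambda a: int(a["fragment.index"]))
        expected = int(items[0]["fragment.count"])
        result[identifier] = {
            "fragments": items,
            "complete": len(items) == expected,
        }
    return result


if __name__ == "__main__":
    sample = [
        {"fragment.identifier": "a", "fragment.index": "2", "fragment.count": "2", "filename": "y"},
        {"fragment.identifier": "a", "fragment.index": "1", "fragment.count": "2", "filename": "x"},
    ]
    grouped = group_fragments(sample)
    print([f["filename"] for f in grouped["a"]["fragments"]], grouped["a"]["complete"])
```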
Property Description
Delete Attributes Expression Regular expression for attributes to be deleted from FlowFiles. Existing attributes that match will be deleted regardless of whether they are updated by this processor.
Stateful Variables Initial Value If using state to set/reference variables then this value is used to set the initial value of the stateful variable. This will only be used in the @OnScheduled method when state does not contain a value for the variable. This is required if running statefully but can be empty if needed.
Store State Select whether or not state will be stored. Selecting 'Stateless' will offer the default functionality of purely updating the attributes on a FlowFile in a stateless manner. Selecting a stateful option will not only store the attributes on the FlowFile but also in the Processor's state. See the 'Stateful Usage' topic of the 'Additional Details' section of this processor's documentation for more information.
canonical-value-lookup-cache-size Specifies how many canonical lookup values should be stored in the cache
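To make the Delete Attributes Expression behavior concrete, here is a small sketch that models a FlowFile's attributes as a dictionary. `apply_update` is a hypothetical helper; the choices to match the expression against the whole attribute name and to leave newly added attributes untouched are assumptions, not documented behavior.

```python
import re


def apply_update(attributes, updates, delete_expression=None):
    """Apply attribute updates, then drop pre-existing attributes whose names match.

    Existing attributes that match the expression are deleted even if
    `updates` also sets them, as the property description states.
    """
    to_delete = set()
    if delete_expression:
        pattern = re.compile(delete_expression)
        # Only attributes already on the incoming FlowFile are candidates (assumption).
        to_delete = {name for name in attributes if pattern.fullmatch(name)}

    result = dict(attributes)
    result.update(updates)
    for name in to_delete:
        result.pop(name, None)
    return result


if __name__ == "__main__":
    incoming = {"filename": "a.csv", "tmp.step": "1", "tmp.owner": "etl"}
    print(apply_update(incoming, {"tmp.step": "2", "stage": "done"}, r"tmp\..*"))
```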
Scopes Description
LOCAL Gives the option to store values not only on the FlowFile but as stateful variables to be referenced in a recursive manner.
Name Description
success All successful FlowFiles are routed to this relationship
Name Description
See additional details This processor may write or remove zero or more attributes as described in additional details
Property Description
Box Client Service Controller Service used to obtain a Box API connection.
File ID The ID of the file for which to update metadata.
Record Reader The Record Reader to use for parsing the incoming data
Template Key The key of the metadata template to update.
Name Description
failure A FlowFile is routed to this relationship if an error occurs during metadata update.
file not found FlowFiles for which the specified Box file was not found will be routed to this relationship.
success A FlowFile is routed to this relationship after metadata has been successfully updated.
template not found FlowFiles for which the specified metadata template was not found will be routed to this relationship.
Name Description
box.id The ID of the file whose metadata was updated
box.template.name The template name used for metadata update
box.template.scope The template scope used for metadata update
error.code The error code returned by Box
error.message The error message returned by Box
Property Description
Object Type Salesforce object type whose state should be updated
Salesforce Bulk Job State Service Controller Service managing Bulk Jobs state
Status Status to set for the object type
Name Description
failure Incoming FlowFile is routed here if update fails
success Incoming FlowFile is routed here after state update
Property Description
Client Service An Elasticsearch client service to use for running queries.
Index The name of the index to use.
Max JSON Field String Length The maximum allowed length of a string value when parsing a JSON document or attribute.
Query A query in JSON syntax, not Lucene syntax. Ex: \{"query":\{"match":\{"somefield":"somevalue"\}\}\}. If this parameter is not set, the query will be read from the flowfile content. If the query (property and flowfile content) is empty, a default empty JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Query Attribute If set, the executed query will be set on each result flowfile in the specified attribute.
Query Clause A "query" clause in JSON syntax, not Lucene syntax. Ex: \{"match":\{"somefield":"somevalue"\}\}. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Query Definition Style How the JSON Query will be defined for use by the processor.
Script A "script" to execute during the operation, in JSON syntax. Ex: \{"source": "ctx._source.count++", "lang": "painless"\}
Type The type of this document (used by Elasticsearch for indexing and searching).
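The Query Clause and Script properties above both hold JSON fragments. One plausible way they combine into a single "by query" request body is sketched below, reusing the example values from the property descriptions. `build_by_query_body` is a hypothetical helper, and the exact body the processor sends is an assumption.

```python
import json


def build_by_query_body(query_clause=None, script=None):
    """Combine an optional "query" clause and an optional "script" into one body.

    An empty or missing query clause is left out, which Elasticsearch treats
    as a match_all query.
    """
    body = {}
    if query_clause:
        body["query"] = query_clause
    if script:
        body["script"] = script
    return body


if __name__ == "__main__":
    body = build_by_query_body(
        query_clause={"match": {"somefield": "somevalue"}},
        script={"source": "ctx._source.count++", "lang": "painless"},
    )
    print(json.dumps(body, indent=2))
```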
Name Description
failure If the "by query" operation fails, and a flowfile was read, it will be sent to this relationship.
retry All flowfiles that fail due to server/cluster availability go to this relationship.
success If the "by query" operation succeeds, and a flowfile was read, it will be sent to this relationship.
Name Description
elasticsearch.update.took The amount of time that it took to complete the update operation in ms.
elasticsearch.update.error The error message provided by Elasticsearch if there is an error running the update.
Property Description
counter-name The name of the counter you want to set the value of - supports expression language like $\{counterName\}
delta Adjusts the counter by the specified delta for each flow file received. May be a positive or negative integer.
Name Description
success Counter was updated/retrieved
Property Description
Column Name Translation Pattern Column name will be normalized with this regular expression
Column Name Translation Strategy The strategy used to normalize table column names. Column names will be uppercased to do case-insensitive matching irrespective of strategy.
Database Dialect Service Database Dialect Service for generating statements specific to a particular service or vendor.
db-type Database Type for generating statements specific to a particular service or vendor. The Generic Type supports most cases but selecting a specific type enables optimal processing or additional features.
record-reader The service for reading incoming flow files. The reader is only used to determine the schema of the records, the actual records will not be processed.
updatedatabasetable-catalog-name The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty. Note that if the property is set and the database is case-sensitive, the catalog name must match the database's catalog name exactly.
updatedatabasetable-create-table Specifies how to process the target table when it does not exist (for example, create it or fail).
updatedatabasetable-dbcp-service The Controller Service that is used to obtain connection(s) to the database
updatedatabasetable-primary-keys A comma-separated list of record field names that uniquely identifies a row in the database. This property is only used if the specified table needs to be created, in which case the Primary Key Fields will be used to specify the primary keys of the newly-created table. IMPORTANT: Primary Key Fields must match the record field names exactly unless 'Quote Column Identifiers' is false and the database allows for case-insensitive column names. In practice it is best to specify Primary Key Fields that exactly match the record field names, and those will become the column names in the created table.
updatedatabasetable-query-timeout Sets the number of seconds the driver will wait for a query to execute. A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver.
updatedatabasetable-quoted-column-identifiers Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables and/or forcing the record field names to match the column names exactly.
updatedatabasetable-quoted-table-identifiers Enabling this option will cause the table name to be quoted to support the use of special characters in the table name and/or forcing the value of the Table Name property to match the target table name exactly.
updatedatabasetable-record-writer Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer should use Inherit Schema to emulate the inferred schema behavior, i.e. an explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types. If Create Table Strategy is set to 'Create If Not Exists', the Record Writer's output format must match the Record Reader's format in order for the data to be placed in the created table location. Note that this property is only used if 'Update Field Names' is set to true and the field names do not all match the column names exactly. If no update is needed for any field names (or 'Update Field Names' is false), the Record Writer is not used and instead the input FlowFile is routed to success or failure without modification.
updatedatabasetable-schema-name The name of the database schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty. Note that if the property is set and the database is case-sensitive, the schema name must match the database's schema name exactly.
updatedatabasetable-table-name The name of the database table to update. If the table does not exist, then it will either be created or an error thrown, depending on the value of the Create Table property.
updatedatabasetable-translate-field-names If true, the Processor will attempt to translate field names into the corresponding column names for the table specified, for the purposes of determining whether the field name exists as a column in the target table. NOTE: If the target table does not exist and is to be created, this property is ignored and the field names will be used as-is. If false, the field names must match the column names exactly, or the column may not be found and instead an error may be reported that the column already exists.
updatedatabasetable-update-field-names This property indicates whether to update the output schema such that the field names are set to the exact column names from the specified table. This should be used if the incoming record field names may not match the table's column names in terms of upper- and lower-case. For example, this property should be set to true if the output FlowFile is destined for a database such as Oracle, which expects the field names to match the column names exactly. NOTE: The value of the 'Translate Field Names' property is ignored when updating field names; instead they are updated to match the column name as returned by the database.
Name Description
failure A FlowFile containing records is routed to this relationship if the records could not be transmitted to the database.
success A FlowFile containing records is routed to this relationship after the records have been successfully transmitted to the database.
Name Description
output.table This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the target table name.
output.path This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the path on the file system to the table (or partition location if the table is partitioned).
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer, only if a Record Writer is specified and Update Field Names is 'true'.
record.count Sets the number of records in the FlowFile, only if a Record Writer is specified and Update Field Names is 'true'.
Property Description
Record Reader Specifies the Controller Service to use for reading incoming data
Record Writer Specifies the Controller Service to use for writing out the records
Replacement Value Strategy Specifies how to interpret the configured replacement values
Name Description
failure If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship
success FlowFiles that are successfully transformed will be routed to this relationship
Name Description
record.index This attribute provides the current row index and is only available inside the literal value expression.
record.error.message This attribute provides on failure the error message encountered by the Reader or Writer.
Property Description
Add Column Strategy The strategy to use when the incoming schema has a column that is not present in the existing table
Add Not Null Strategy The strategy to use when the incoming schema has a not-null constraint that is not present in the existing table
Alter Column Type Strategy The strategy to use when the existing table has a column with a different type than the incoming schema.
Column Name Transformation An optional transformation that can be applied to the names of columns defined in the schema. This transformation is applied to the column names before they are compared to the existing columns in the table. This property can reference the following variables via Expression Language, in addition to attributes: _column.name_, _column.type_, _column.nullable_, _column.precision_, _column.scale_, _column.primaryKey_.
Column Removal Strategy The strategy to use when the existing table has a column that is not present in the incoming schema
Connection Pool The connection pool to use to connect to Snowflake
Create Stream Whether or not to create a Snowflake Stream for the table
Creation Parameters Additional parameters to include in the CREATE TABLE statement. For example, 'CLUSTER BY (column_name)'
Desired Schema The desired schema / table definition
Drop Column Strategy The strategy to use when the existing table has a column that is not present in the incoming schema
Drop Not Null Strategy The strategy to use when the existing table has a not-null constraint that is not present in the incoming schema
Include Default Values Whether or not to include DEFAULT values in CREATE TABLE or ALTER TABLE ADD COLUMN statements
Include Not Null Constraints Whether or not to include NOT NULL constraints in CREATE TABLE or ALTER TABLE ADD COLUMN statements
Include Primary Key Constraints Whether or not to include primary key constraints in the creation statement
Max Batch Size The maximum number of FlowFiles that can be processed in a single execution for a given table.
Modify Primary Key Strategy The strategy to use when the incoming schema has a primary key that differs from the existing primary key. Modifying the Primary Key requires dropping the existing one, if any, and adding a new one.
Record Reader Record Reader to use for obtaining the desired schema
Removed Column Name Suffix The suffix to append to a column that was removed. For example, to rename column 'foo' to 'foo__deleted', the property can be set to '__deleted'.
Schema Name The name of the schema to update
Stream Creation Parameters Additional parameters to include in the CREATE STREAM statement. For example, 'APPEND_ONLY=TRUE'
Stream Name The name of the stream
Table Metadata Cache Expiration Time The time in seconds after which the cache entry will be removed
Table Name The name of the table to update or create stream on
Table Schema Strategy Specifies how to obtain the desired schema / table definition
Table Stream Creation Parameters Parameters to include in the CREATE STREAM statement. For example, 'APPEND_ONLY=TRUE'. The stream will be created along with the table as its source.
Table Stream Name The name of the stream created along with the table. Stream source will be the created table.
Update Type The type of update to perform
Use Table Metadata Cache Whether to cache table's metadata instead of reading it directly from Snowflake. Applies to [Create Table If Not Exists, Alter Table]
Name Description
failure The incoming FlowFile is routed to this relationship if the table cannot be updated
success The incoming FlowFile is routed to this relationship after the table has been updated successfully
Name Description
schema.hash A SHA-256 hash of the final table schema after all updates have been completed. Can be used for change detection and caching purposes.
Property Description
Add Column Strategy The strategy to use when the incoming schema has a column that is not present in the existing table
Alter Column Strategy The strategy to use when a column has different data type in the incoming schema from the existing table
Alter Column Type Strategy The strategy to use when the existing table has a column with a different type than the incoming schema.
Connection Pool The connection pool to use to connect to Snowflake
Desired Schema The desired schema / table definition
Drop Column Strategy The strategy to use when the existing table has a column that is not present in the incoming schema
Max Batch Size The maximum number of FlowFiles that can be processed in a single execution for a given table.
Record Reader Record Reader to use for obtaining the desired schema
Schema Name The name of the schema to update
Table Metadata Cache Expiration Time The time in seconds after which the cache entry will be removed
Table Name The name of the table to update
Table Schema Strategy Specifies how to obtain the desired schema / table definition
Use Table Metadata Cache Whether to cache table's metadata instead of reading it directly from Snowflake
Name Description
failure The incoming FlowFile is routed to this relationship if the table cannot be updated
illegal alteration The incoming FlowFile is routed to this relationship if the update requires an alteration that is configured to fail
success The incoming FlowFile is routed to this relationship after the table has been updated successfully
table not found The incoming FlowFile is routed to this relationship if the specified table does not exist.
Name Description
schema.hash A hexadecimal-encoded SHA-256 hash of the final table schema after all updates have been completed.
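A hexadecimal SHA-256 value such as `schema.hash` is convenient for change detection: store the last value and compare. The sketch below shows the general idea only. The canonical JSON serialization and the `schema_hash` helper are assumptions; the processor's actual hashing input is not documented here, so values produced this way will not match the ones it emits.

```python
import hashlib
import json


def schema_hash(columns):
    """Return a hex-encoded SHA-256 digest of a column list.

    Assumption: the schema is serialized as compact JSON with sorted keys,
    keeping the column order as given.
    """
    canonical = json.dumps(columns, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def schema_changed(previous_hash, columns):
    """True when the current schema hashes to a different value than before."""
    return schema_hash(columns) != previous_hash


if __name__ == "__main__":
    v1 = [{"name": "ID", "type": "NUMBER"}, {"name": "NAME", "type": "VARCHAR"}]
    v2 = v1 + [{"name": "CREATED_AT", "type": "TIMESTAMP_NTZ"}]
    print(schema_changed(schema_hash(v1), v1), schema_changed(schema_hash(v1), v2))
```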
Property Description
Connection Pool The connection pool to use to connect to Snowflake
Object Identifier Resolution Controls how source object identifiers (schemas, tables, columns) are stored and queried in Snowflake. This setting determines whether you will need to use double quotes in your SQL queries.
Schema Creation Cache Expiration Time The time after which the cache entry will be removed
Schema Name The name of the schema to create
Use Schema Creation Cache Whether to cache schema's creation instead of executing CREATE SCHEMA IF NOT EXISTS statement for each FlowFile.
Name Description
failure The incoming FlowFile is routed to this relationship if the schema cannot be created
success The incoming FlowFile is routed to this relationship after the schema has been created successfully
Property Description
Connection Pool The connection pool to use to connect to Snowflake
Object Identifier Resolution Controls how source object identifiers (schemas, tables, columns) are stored and queried in Snowflake. This setting determines whether you will need to use double quotes in your SQL queries.
Schema Name The name of the schema containing the stream and/or source table
Source Table Name The name of the source table for the stream
Stream Creation Parameters Additional parameters to include in the CREATE STREAM statement. For example, 'APPEND_ONLY=TRUE SHOW_INITIAL_ROWS=TRUE'
Stream Name The name of the stream to create, drop, or replace
Update Type The type of stream operation to perform
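To show how the Stream Creation Parameters string might slot into a statement, the sketch below assembles a Snowflake `CREATE STREAM` statement as text. It is illustrative only: `create_stream_sql` is a hypothetical helper, and whether the processor emits `IF NOT EXISTS`, `OR REPLACE`, or quoted identifiers is an assumption.

```python
import re

# Unquoted identifier shape: a letter or underscore, then letters, digits,
# underscores, or dollar signs.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def create_stream_sql(schema, stream, source_table, creation_parameters="", replace=False):
    """Build a CREATE STREAM statement for a stream and table in the same schema."""
    for identifier in (schema, stream, source_table):
        if not _UNQUOTED_IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"not a simple unquoted identifier: {identifier!r}")

    head = "CREATE OR REPLACE STREAM" if replace else "CREATE STREAM IF NOT EXISTS"
    sql = f"{head} {schema}.{stream} ON TABLE {schema}.{source_table}"
    # Extra parameters are appended verbatim after the ON TABLE clause.
    params = creation_parameters.strip()
    if params:
        sql += " " + params
    return sql


if __name__ == "__main__":
    print(create_stream_sql("sales", "orders_stream", "orders",
                            "APPEND_ONLY=TRUE SHOW_INITIAL_ROWS=TRUE"))
```

The schema, stream, and table names in the usage line are made up for the example.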
Name Description
failure The incoming FlowFile is routed to this relationship if the stream operation cannot be completed
object not found The incoming FlowFile is routed to this relationship if the specified stream or source table does not exist.
success The incoming FlowFile is routed to this relationship after the stream operation has been completed successfully
Property Description
Add Column Strategy The strategy to use when the incoming schema has a column that is not present in the existing table
Add Not Null Strategy The strategy to use when the incoming schema has a not-null constraint that is not present in the existing table
Alter Column Type Strategy The strategy to use when the existing table has a column with a different type than the incoming schema.
Column Name Transformation An optional transformation that can be applied to the names of columns defined in the schema. This transformation is applied to the column names before they are compared to the existing columns in the table. This property can reference the following variables via Expression Language, in addition to attributes: _column.name_, _column.type_, _column.nullable_, _column.precision_, _column.scale_, _column.primaryKey_. The result of applying transformations based on this property will be treated according to the setting of _Object Name Handling_ property.
Column Removal Strategy The strategy to use when the existing table has a column that is not present in the incoming schema
Connection Pool The connection pool to use to connect to Snowflake
Creation Parameters Additional parameters to include in the CREATE TABLE statement. For example, 'CLUSTER BY (column_name)'
Desired Schema The desired schema / table definition
Drop Column Strategy The strategy to use when the existing table has a column that is not present in the incoming schema
Drop Not Null Strategy The strategy to use when the existing table has a not-null constraint that is not present in the incoming schema
Include Default Values Whether or not to include DEFAULT values in CREATE TABLE or ALTER TABLE ADD COLUMN statements
Include Not Null Constraints Whether or not to include NOT NULL constraints in CREATE TABLE or ALTER TABLE ADD COLUMN statements
Include Primary Key Constraints Whether or not to include primary key constraints in the creation statement
Max Batch Size The maximum number of FlowFiles that can be processed in a single execution for a given table.
Modify Primary Key Strategy The strategy to use when the incoming schema has a primary key that differs from the existing primary key. Modifying the Primary Key requires dropping the existing one, if any, and adding a new one.
Object Identifier Resolution Controls how source object identifiers (schemas, tables, columns) are stored and queried in Snowflake. This setting determines whether you will need to use double quotes in your SQL queries.
Record Reader Record Reader to use for obtaining the desired schema
Removed Column Name Suffix The suffix to append to a column that was removed. For example, to rename column 'foo' to 'foo__deleted', the property can be set to '__deleted'. This property value will behave differently depending on the value of _Object Name Handling_ property. If _Object Name Handling_ is set to _Case Sensitive Name_, then the suffix will be appended as-is. If _Object Name Handling_ is set to _SQL Identifier_, then the suffix must consist of only letters, numbers, dollar sign ($), and underscore (_) characters; additionally, it will be appended as case-insensitive or case-sensitive depending on whether the column name it is being appended to is case-insensitive (not double-quoted) or case-sensitive (double-quoted), respectively.
Schema Name The name of the schema containing the table
Table Metadata Cache Expiration Time The time in seconds after which the cache entry will be removed
Table Name The name of the table to update
Table Schema Strategy Specifies how to obtain the desired schema / table definition
Update Type The type of table update to perform
Use Table Metadata Cache Whether to cache table's metadata instead of reading it directly from Snowflake. Applies to [Create Table If Not Exists, Alter Table]
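The suffix rule described for Removed Column Name Suffix can be sketched as follows. This is a simplified illustration, not the processor's logic: `removed_column_name` is a hypothetical helper, "letters" is assumed to mean ASCII letters, and a double-quoted name is assumed to carry its quotes in the string.

```python
import re

# Characters allowed in the suffix when names are handled as SQL identifiers.
_SQL_IDENTIFIER_CHARS = re.compile(r"[A-Za-z0-9_$]+")


def removed_column_name(column, suffix, handling="SQL Identifier"):
    """Return the new name for a removed column.

    With "SQL Identifier" handling the suffix is validated first. A
    double-quoted (case-sensitive) name gets the suffix inside the quotes;
    an unquoted name gets it appended directly and stays case-insensitive.
    """
    if handling == "SQL Identifier":
        if not _SQL_IDENTIFIER_CHARS.fullmatch(suffix):
            raise ValueError(f"suffix contains unsupported characters: {suffix!r}")
        if len(column) >= 2 and column.startswith('"') and column.endswith('"'):
            return column[:-1] + suffix + '"'
    # "Case Sensitive Name" handling: append the suffix as-is.
    return column + suffix


if __name__ == "__main__":
    print(removed_column_name("foo", "__deleted"))
    print(removed_column_name('"Foo"', "__deleted"))
```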
Name Description
failure The incoming FlowFile is routed to this relationship if the table cannot be updated
success The incoming FlowFile is routed to this relationship after the table has been updated successfully
Name Description
schema.hash A SHA-256 hash of the final table schema after all updates have been completed. Can be used for change detection and caching purposes.
Property Description
Connection Pool The connection pool to use to connect to Snowflake
Schema Name The name of the schema where the view will be created
Secure Whether to create a secure view. Secure views hide the view definition from unauthorized users.
View Name The name of the view to create or update
Name Description
failure FlowFiles that failed to be processed
success FlowFiles that were successfully processed
unchanged FlowFiles where the view already exists and hasn't changed
Property Description
CDC Schema Registry When the state of the table is removed, the table will also be removed from the specified CDC Schema Registry.
Desired State The desired state of the table
Overwrite Existing Whether to overwrite the existing state of the table. If false, the state will only be updated if the state is currently unknown.
Schema Name The name of the table's schema
Table Name The name of the table
Table State Service The Table State Service to update
Name Description
comms failure A FlowFile is routed to this relationship if the table state could not be updated due to a communication failure with the Table State Service
state exists A FlowFile is routed to this relationship if the table state was not updated because the state is already known for the table and the 'Overwrite Existing' property is set to 'false'
success A FlowFile is routed to this relationship after the table state has been updated
Name Description
table.state The state of the table after updating the Table State Service
previous.table.state The state of the table before the Table State Service was updated
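The interaction between Overwrite Existing and the 'state exists' relationship can be sketched with an in-memory dictionary standing in for the Table State Service. The function name `update_table_state`, the use of `None` for an unknown state, and the example state names are hypothetical; communication failures are not modeled.

```python
from typing import Optional


def update_table_state(
    store: dict[str, str], table: str, desired: str, overwrite: bool
) -> tuple[str, dict[str, Optional[str]]]:
    """Return (relationship, attributes) for one table state update."""
    previous = store.get(table)  # None stands for an unknown state (assumption)
    if previous is not None and not overwrite:
        # State is already known and must not be overwritten.
        return "state exists", {}
    store[table] = desired
    return "success", {"table.state": desired, "previous.table.state": previous}


store: dict[str, str] = {}
first = update_table_state(store, "db.public.orders", "SNAPSHOT", overwrite=False)
second = update_table_state(store, "db.public.orders", "INCREMENTAL", overwrite=False)
third = update_table_state(store, "db.public.orders", "INCREMENTAL", overwrite=True)
```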
Property Description
Collection Name The name of the Milvus collection to use
ID Field Name The name of the field in Milvus to use for storing the IDs of vectors. If a record path is not provided along with the field name, string IDs are generated based on the filename.
ID Record Path The path to the ID field in the record
Max Batch Size If the number of Records in a FlowFile is large, creating a single request to Milvus can consume significant amounts of NiFi heap. In order to avoid this, the Max Batch Size can limit the number of Records to send in a single request.
Metadata Field Name The name of the field to use for storing other metadata associated with the vectors. This data must be valid JSON.
Metadata Record Path The path to the metadata field in the record
Milvus Connection Service Connection Service for accessing Milvus Database
Partition The partition of the vector database in which to perform operations. If the database has only one partition, leave this empty.
Record Reader The Record Reader to use for reading the FlowFile
Sparse Vector Field Name The name of the field to use for storing the sparse vectors.
Sparse Vector Indices Path If sparse vectors are to be provided, this RecordPath points to the indices of the sparse data to use.
Sparse Vector Values Path If sparse vectors are to be provided, this RecordPath points to the values of the sparse data to use.
Text Field Name The name of the field in Milvus to use for storing the text associated with the vectors.
Text Record Path The path to the field in the record that contains the text associated with the vectors. If specified, the text will be inserted under the text field in Milvus. If not specified, the text will not be sent to the Milvus database.
Vector Field Name The name of the field in Milvus to use for storing the vectors.
Vector Record Path The path to the vector field in the record
Name Description
failure FlowFiles that cannot be sent to Milvus, and for which a retry is not expected to be successful, are routed to this relationship
retry FlowFiles that fail to be sent to Milvus, but for which a retry may help, are routed to this relationship
success FlowFiles that are successfully sent to Milvus are routed to this relationship
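The Sparse Vector Indices Path and Sparse Vector Values Path properties point to two parallel lists in each record. A minimal sketch of pairing them is shown below. The function name `sparse_vector` and the dictionary output are illustrative assumptions and do not describe the request format the processor sends to Milvus.

```python
def sparse_vector(indices: list[int], values: list[float]) -> dict[int, float]:
    """Pair each index with its value; both lists must have the same length."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return {int(i): float(v) for i, v in zip(indices, values)}


vector = sparse_vector([3, 17, 42], [0.5, 1.25, 0.75])
```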
Property Description
ID Record Path The path to the ID field in the record
Max Batch Size If the number of Records in a FlowFile is large, creating a single request to Pinecone can consume significant amounts of NiFi heap. In order to avoid this, the Max Batch Size can limit the number of Records to send in a single request. If the number of Records exceeds this value, multiple requests will be sent to Pinecone.
Metadata Record Path The path to the metadata field in the record
Pinecone API Key The API key for the Pinecone service
Pinecone Index The name of the Pinecone index to use
Pinecone Namespace The name of the Pinecone namespace to use
Record Reader The Record Reader to use for reading the FlowFile
Sparse Vector Indices Path If sparse vectors are to be provided, this RecordPath points to the indices of the sparse data to use.
Sparse Vector Values Path If sparse vectors are to be provided, this RecordPath points to the values of the sparse data to use.
Text Field Name The name of the field in the metadata to use for storing the text associated with the vectors.
Text Record Path The path to the field in the record that contains the text associated with the vectors. If specified, the text will be inserted into the metadata when publishing to Pinecone. If not specified, the text will not be sent to Pinecone.
Vector Record Path The path to the vector field in the record
Web Client Service The Web Client Service to use for communicating with Pinecone
Name Description
failure FlowFiles that cannot be sent to Pinecone, and for which a retry is not expected to be successful, are routed to this relationship
retry FlowFiles that fail to be sent to Pinecone, but for which a retry may help, are routed to this relationship
success FlowFiles that are successfully sent to Pinecone are routed to this relationship
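The effect of Max Batch Size can be illustrated with a small generator that splits a stream of records into requests of bounded size. This is a generic sketch of the batching idea; the helper name `batched` is hypothetical and no Pinecone or Milvus client calls are shown.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], max_batch_size: int) -> Iterator[list[T]]:
    """Yield lists of at most max_batch_size records, preserving order."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    iterator = iter(records)
    # Only one batch is held in memory at a time, which bounds heap usage.
    while batch := list(islice(iterator, max_batch_size)):
        yield batch


requests = list(batched(range(5), max_batch_size=2))
```

With five records and a batch size of two, three requests would be sent: two full batches and one with the remaining record.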
Property Description
Object Name The name of the object type for the records included in the FlowFile.
Record Reader Specifies the Controller Service to use for reading incoming data. Each record will be converted into a JSON object and upserted into Salesforce using a dedicated API call.
Salesforce Client Salesforce Client to interact with the APIs
Name Description
comms.failure The FlowFile is routed to this relationship if any record could not be upserted in Salesforce but the operation might be retried
failure The FlowFile is routed to this relationship if any record could not be upserted in Salesforce
success The FlowFile is routed to this relationship after all records have been successfully upserted
Name Description
sObjectId ID of the created object in Salesforce when using this processor with a single record.
Property Description
CSV Source Attribute The name of the attribute containing CSV data to be validated. If this property is blank, the FlowFile content will be validated.
Max Lines Per Row The maximum number of lines that a row can span before an exception is thrown. This option allows the processor to fail fast when encountering CSV with mismatching quotes - the normal behaviour would be to continue reading until the matching quote is found, which could potentially mean reading the whole file (and exhausting all available memory). Zero value will disable this option.
validate-csv-delimiter Character used as 'delimiter' in the incoming data. Example: ,
validate-csv-eol Symbols used as 'end of line' in the incoming data. Example: \n
validate-csv-header True if the incoming flow file contains a header to ignore, false otherwise.
validate-csv-quote Character used as 'quote' in the incoming data. Example: "
validate-csv-schema The schema to be used for validation. The value is expected to be a comma-delimited string representing the cell processors to apply. The following cell processors are allowed in the schema definition: [ParseBigDecimal, ParseBool, ParseChar, ParseDate, ParseDouble, ParseInt, ParseLong, Optional, DMinMax, Equals, ForbidSubStr, LMinMax, NotNull, Null, RequireHashCode, RequireSubStr, Strlen, StrMinMax, StrNotNullOrEmpty, StrRegEx, Unique, UniqueHashCode, IsIncludedIn]. Note: cell processors cannot be nested except with Optional. Schema is required if Header is false.
validate-csv-strategy Strategy to apply when routing input files to output relationships.
validate-csv-violations If true, the validation.error.message attribute includes the list of all violations for the first invalid line. Note that setting this property to true slightly decreases performance because all columns are validated. If false, a line is invalid as soon as a column is found that violates the specified constraint, and only this violation for the first invalid line is included in the validation.error.message attribute.
Name Description
invalid FlowFiles that are not valid according to the specified schema, or no schema or CSV header can be identified, are routed to this relationship
valid FlowFiles that are successfully validated against the schema are routed to this relationship
Name Description
count.valid.lines If validation is performed line by line, the number of valid lines extracted from the source data
count.invalid.lines If validation is performed line by line, the number of invalid lines extracted from the source data
count.total.lines If validation is performed line by line, the total number of lines in the source data
validation.error.message For FlowFiles routed to invalid, the message of the first validation error
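Line-by-line CSV validation and the resulting count attributes can be sketched with Python's standard `csv` module. The two checks below, `parse_int` and `not_null`, are simplified stand-ins for the ParseInt and NotNull cell processors, and the function `validate_csv` is hypothetical. The error message format is also an assumption.

```python
import csv
import io
from typing import Callable


def parse_int(value: str) -> None:
    int(value)  # raises ValueError if the cell is not an integer


def not_null(value: str) -> None:
    if value == "":
        raise ValueError("null value encountered")


def validate_csv(text: str, schema: list[Callable[[str], None]],
                 delimiter: str = ",", quote: str = '"',
                 header: bool = True) -> dict[str, object]:
    """Validate each data row against one check per column."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar=quote)
    if header:
        next(reader, None)  # skip the header row
    valid = invalid = 0
    first_error = None
    for row_number, row in enumerate(reader, start=1):
        try:
            if len(row) != len(schema):
                raise ValueError(f"expected {len(schema)} columns, got {len(row)}")
            for check, cell in zip(schema, row):
                check(cell)
            valid += 1
        except ValueError as exc:
            invalid += 1
            if first_error is None:
                first_error = f"row {row_number}: {exc}"
    attributes: dict[str, object] = {
        "count.valid.lines": valid,
        "count.invalid.lines": invalid,
        "count.total.lines": valid + invalid,
    }
    if first_error is not None:
        attributes["validation.error.message"] = first_error
    return attributes


result = validate_csv("id,name\n1,a\nx,b\n3,\n", [parse_int, not_null])
```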
Property Description
JSON Schema A URL or file path to the JSON schema or the actual JSON schema content
JSON Schema Registry Specifies the Controller Service to use for the JSON Schema Registry
JSON Schema Version The JSON schema specification
Max String Length The maximum allowed length of a string value when parsing the JSON document
Schema Access Strategy Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Name Specifies the name of the schema to lookup in the Schema Registry property
Required Permission Explanation
reference remote resources Schema configuration can reference resources over HTTP
Name Description
failure FlowFiles that cannot be read as JSON are routed to this relationship
invalid FlowFiles that are not valid according to the specified schema are routed to this relationship
valid FlowFiles that are successfully validated against the schema are routed to this relationship
Name Description
json.validation.errors If the FlowFile is routed to the invalid relationship, this attribute will contain the error message resulting from the validation failure.
Property Description
Schema Access Strategy Specifies how to obtain the schema that should be used to validate records
Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Name Specifies the name of the schema to lookup in the Schema Registry property
Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text The text of an Avro-formatted Schema
Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
allow-extra-fields If the incoming data has fields that are not present in the schema, this property determines whether or not the Record is valid. If true, the Record is still valid. If false, the Record will be invalid due to the extra fields.
coerce-types If enabled, the processor will coerce every field to the type specified in the Reader's schema. If the value of a field cannot be coerced to the type, the field will be skipped (will not be read from the input data) and will therefore not appear in the output. If not enabled, every field will appear in the output, but the field types may differ from what is specified in the schema. For details, see the Additional Details page of the processor's Help. This property controls how the data is read by the specified Record Reader.
invalid-record-writer If specified, this Controller Service will be used to write out any records that are invalid. If not specified, the writer specified by the "Record Writer" property will be used with the schema used to read the input records. This is useful, for example, when the configured Record Writer cannot write data that does not adhere to its schema (as is the case with Avro) or when it is desirable to keep invalid records in their original format while converting valid records to another format.
maximum-validation-details-length Specifies the maximum number of characters that validation details value can have. Any characters beyond the max will be truncated. This property is only used if 'Validation Details Attribute Name' is set
record-reader Specifies the Controller Service to use for reading incoming data
record-writer Specifies the Controller Service to use for writing out the records. Regardless of the Controller Service schema access configuration, the schema that is used to validate record is used to write the valid results.
strict-type-checking If the incoming data has a Record where a field is not of the correct type, this property determines how to handle the Record. If true, the Record will be considered invalid. If false, the Record will be considered valid and the field will be coerced into the correct type (if possible, according to the type coercion supported by the Record Writer). This property controls how the data is validated against the validation schema.
validation-details-attribute-name If specified, when a validation error occurs, this attribute name will be used to leave the details. The number of characters will be limited by the property 'Maximum Validation Details Length'.
Name Description
failure If the records cannot be read, validated, or written, for any reason, the original FlowFile will be routed to this relationship
invalid Records that are not valid according to the schema will be routed to this relationship
valid Records that are valid according to the schema will be routed to this relationship
Name Description
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records in the FlowFile routed to a relationship
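The interplay of allow-extra-fields and strict-type-checking can be sketched on plain dictionaries. This is a simplified model, not the processor's logic: the function `validate_record` is hypothetical, the schema is reduced to a mapping of field names to Python types, and missing fields are not checked.

```python
def validate_record(record: dict, schema: dict[str, type],
                    allow_extra_fields: bool = True,
                    strict_type_checking: bool = True) -> list[str]:
    """Return a list of validation problems; an empty list means valid."""
    problems = []
    for name, value in record.items():
        if name not in schema:
            if not allow_extra_fields:
                problems.append(f"unexpected field: {name}")
            continue
        expected = schema[name]
        if isinstance(value, expected):
            continue
        if strict_type_checking:
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        else:
            # Lenient mode: the record stays valid if the value can be coerced.
            try:
                expected(value)
            except (TypeError, ValueError):
                problems.append(
                    f"{name}: cannot coerce {value!r} to {expected.__name__}")
    return problems


schema = {"id": int, "name": str}
record = {"id": "7", "name": "Ann", "extra": 1}
strict_result = validate_record(record, schema)
lenient_result = validate_record(record, schema, strict_type_checking=False)
```

In this model, a record with problems would be routed to 'invalid' and a record without problems to 'valid'.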
Property Description
Schema File The file path or URL to the XSD Schema file that is to be used for validation. If this property is blank, only XML syntax/structure will be validated.
XML Source Attribute The name of the attribute containing XML to be validated. If this property is blank, the FlowFile content will be validated.
Required Permission Explanation
reference remote resources Schema configuration can reference resources over HTTP
Name Description
invalid FlowFiles that are not valid according to the specified schema or contain invalid XML are routed to this relationship
valid FlowFiles that are successfully validated against the schema, if provided, or verified to be well-formed XML are routed to this relationship
Name Description
validatexml.invalid.error If the FlowFile is routed to the invalid relationship, this attribute will contain the error message resulting from the validation failure.
Property Description
Message Authentication Code The MAC to compare with the calculated value
Message Authentication Code Algorithm Hashed Message Authentication Code Function
Message Authentication Code Encoding Encoding of the Message Authentication Code
Secret Key Secret key to calculate the hash
Secret Key Encoding Encoding of the Secret Key
Name Description
failure Signature Verification Failed
success Signature Verification Succeeded
Name Description
mac.calculated Calculated Message Authentication Code encoded by the selected encoding
mac.encoding The Encoding of the Hashed Message Authentication Code
mac.algorithm Hashed Message Authentication Code Algorithm
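Verifying a hashed message authentication code follows a common pattern: recompute the MAC over the content with the secret key, decode the supplied MAC, and compare the two in constant time. The sketch below uses Python's standard `hmac` module. The function name `verify_mac` and the accepted encoding names `hex` and `base64` are assumptions for illustration and may not match the processor's allowable values.

```python
import base64
import hmac


def verify_mac(content: bytes, secret_key: bytes, expected_mac: str,
               algorithm: str = "sha256", encoding: str = "hex") -> bool:
    """Return True if expected_mac matches the HMAC of content."""
    calculated = hmac.new(secret_key, content, algorithm).digest()
    if encoding == "hex":
        expected = bytes.fromhex(expected_mac)
    elif encoding == "base64":
        expected = base64.b64decode(expected_mac)
    else:
        raise ValueError(f"unsupported encoding: {encoding}")
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(calculated, expected)


key = b"example-secret"
body = b"payload"
good_mac = hmac.new(key, body, "sha256").hexdigest()
verified = verify_mac(body, key, good_mac)
```

A result of `True` corresponds to the 'success' relationship and `False` to 'failure'.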
Property Description
public-key-service PGP Public Key Service for verifying signatures with Public Key Encryption
Name Description
failure Signature Verification Failed
success Signature Verification Succeeded
Name Description
pgp.literal.data.filename Filename from Literal Data
pgp.literal.data.modified Modified Date Time from Literal Data in milliseconds
pgp.signature.created Signature Creation Time in milliseconds
pgp.signature.algorithm Signature Algorithm including key and hash algorithm names
pgp.signature.hash.algorithm.id Signature Hash Algorithm Identifier
pgp.signature.key.algorithm.id Signature Key Algorithm Identifier
pgp.signature.key.id Signature Public Key Identifier
pgp.signature.type.id Signature Type Identifier
pgp.signature.version Signature Version Number
Display Name API Name Default Value Allowable Values Description
Maximum Cache Size * max-cache-size 100 The maximum number of Schemas to cache.
Property Description
attribute-copy-mode Specifies how to handle attributes copied from FlowFiles entering the Notify processor
distributed-cache-service The Controller Service that is used to check for release signals from a corresponding Notify processor
expiration-duration Indicates the duration after which waiting FlowFiles will be routed to the 'expired' relationship
releasable-flowfile-count A value, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the releasable FlowFile count. This specifies how many FlowFiles can be released when the signal count reaches the target signal count. Zero (0) has a special meaning: any number of FlowFiles can be released as long as the signal count matches the target.
release-signal-id A value that specifies the key to a specific release signal cache. To decide whether the FlowFile that is being processed by the Wait processor should be sent to the 'success' or the 'wait' relationship, the processor checks the signals in the cache specified by this key.
signal-counter-name Within the cache (specified by the Release Signal Identifier) the signals may belong to different counters. If this property is specified, the processor checks the number of signals in the cache that belong to this particular counter. If not specified, the processor checks the total number of signals in the cache.
target-signal-count The number of signals that need to be in the cache (specified by the Release Signal Identifier) in order for the FlowFile processed by the Wait processor to be sent to the 'success' relationship. If the number of signals in the cache has reached this number, the FlowFile is routed to the 'success' relationship and the number of signals in the cache is decreased by this value. If Signal Counter Name is specified, this processor checks a particular counter; otherwise, it checks against the total number of signals in the cache.
wait-buffer-count Specifies the maximum number of incoming FlowFiles that can be buffered to check whether they can move forward. A larger buffer can provide better performance because it reduces the number of interactions with the cache service by grouping FlowFiles by signal identifier. Only one signal identifier can be processed per processor execution.
wait-mode Specifies how to handle a FlowFile waiting for a notify signal
wait-penalty-duration If configured, after a signal identifier has been processed but did not meet the release criteria, the signal identifier is penalized and FlowFiles with that signal identifier will not be processed again for the specified period of time, so that the signal identifier does not block others from being processed. This can be useful when a Wait processor is expected to process multiple signal identifiers, each signal identifier has multiple FlowFiles, and the order in which FlowFiles are released within a signal identifier is important. The FlowFile order can be configured with Prioritizers. IMPORTANT: There is a limit on the number of queued signals that can be processed, and the Wait processor may not be able to check all queued signal IDs. See the additional details for best practices.
Name Description
expired A FlowFile that has exceeded the configured Expiration Duration will be routed to this relationship
failure When the cache cannot be reached, or if the Release Signal Identifier evaluates to null or empty, FlowFiles will be routed to this relationship
success A FlowFile with a matching release signal in the cache will be routed to this relationship
wait A FlowFile with no matching release signal in the cache will be routed to this relationship
Name Description
wait.start.timestamp All FlowFiles will have an attribute 'wait.start.timestamp', which sets the initial epoch timestamp when the file first entered this processor. This is used to determine the expiration time of the FlowFile. This attribute is not written when the FlowFile is transferred to failure, expired or success
wait.counter.<counterName> The name of each counter for which at least one signal has been present in the cache since the last time the cache was empty gets copied to the current FlowFile as an attribute.
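The routing decision described by target-signal-count, signal-counter-name, and expiration-duration can be sketched as a pure function over the counters stored under one release signal identifier. This is an illustrative model only: the function name `route_waiting_flowfile`, the treatment of "reached" as greater than or equal, and checking expiration before the signal count are assumptions. Decrementing the cached count after a release is omitted.

```python
from typing import Optional


def route_waiting_flowfile(signals: dict[str, int], target_signal_count: int,
                           signal_counter_name: Optional[str] = None,
                           waited_seconds: float = 0.0,
                           expiration_seconds: float = 600.0) -> str:
    """Return the relationship name for one waiting FlowFile."""
    if waited_seconds > expiration_seconds:
        return "expired"
    if signal_counter_name is not None:
        # Check only the named counter.
        count = signals.get(signal_counter_name, 0)
    else:
        # Check the total number of signals across all counters.
        count = sum(signals.values())
    return "success" if count >= target_signal_count else "wait"


signals = {"loaded": 2, "indexed": 1}
by_total = route_waiting_flowfile(signals, target_signal_count=3)
by_counter = route_waiting_flowfile(signals, 3, signal_counter_name="loaded")
```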
Property Description
Accepted State Blocks FlowFiles for a given SourceTableFQN until corresponding state is equal to the Accepted State
Table State Service Manages the state of each replicated table
Name Description
failure FlowFiles for tables in terminal states will be routed to this relationship
success FlowFiles fulfilling a given condition will be routed to this relationship
Display Name API Name Default Value Allowable Values Description
Configuration File * configuration-file A configuration file
Required Permission Explanation
read filesystem Provides operator the ability to read from any file that NiFi has access to.
Display Name API Name Default Value Allowable Values Description
Date Format Date Format Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).
Schema Access Strategy * Schema Access Strategy infer-schema - Use 'Schema Name' Property - Use 'Schema Text' Property - Schema Reference Reader - Infer Schema Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Branch Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Name Schema Name ${schema.name} Specifies the name of the schema to lookup in the Schema Registry property
Schema Reference Reader * Schema Reference Reader Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
Schema Registry Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text Schema Text ${avro.schema} The text of an Avro-formatted Schema
Schema Version Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
Time Format Time Format Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
Timestamp Format Timestamp Format Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).
Attribute Prefix attribute_prefix If this property is set, the name of attributes will be prepended with a prefix when they are added to a record.
Field Name for Content content_field_name If tags with content (e.g. <field>content</field>) are defined as nested records in the schema, the name of the tag will be used as the name for the record and the value of this property will be used as the name for the field. If tags with content are to be parsed together with attributes (e.g. <field attribute="123">content</field>), they have to be defined as records. In such a case, the name of the tag will be used as the name for the record and the value of this property will be used as the name for the field holding the original content. The name of the attribute will be used to create a new record field, the content of which will be the value of the attribute. For more information, see the 'Additional Details...' section of the XMLReader controller service's documentation.
Parse XML Attributes parse_xml_attributes true - true - false When 'Schema Access Strategy' is 'Infer Schema' and this property is 'true' then XML attributes are parsed and added to the record as new fields. When the schema is inferred but this property is 'false', XML attributes and their values are ignored.
Expect Records as Array * record_format false - false - true - Use attribute 'xml.stream.is.array' This property defines whether the reader expects a FlowFile to consist of a single Record or a series of Records with a "wrapper element". Because XML does not provide for a way to read a series of XML documents from a stream directly, it is common to combine many XML documents by concatenating them and then wrapping the entire XML blob with a "wrapper element". This property dictates whether the reader expects a FlowFile to consist of a single Record or a series of Records with a "wrapper element" that will be ignored.
Schema Inference Cache schema-inference-cache Specifies a Schema Cache to use when inferring the schema. If not populated, the schema will be inferred each time. However, if a cache is specified, the cache will first be consulted and if the applicable schema can be found, it will be used instead of inferring the schema.
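The combined effect of Expect Records as Array, Attribute Prefix, and Field Name for Content can be sketched with Python's standard `xml.etree.ElementTree`. This is a simplified model and not the reader's implementation: the function names are hypothetical, all values stay strings, repeated child tags overwrite each other, and no schema is applied.

```python
import xml.etree.ElementTree as ET


def to_record(element: ET.Element, attribute_prefix: str, content_field: str) -> dict:
    """Convert one XML element into a dictionary-based record."""
    # Attributes become fields, optionally prefixed.
    record = {attribute_prefix + k: v for k, v in element.attrib.items()}
    for child in element:
        if child.attrib or len(child):
            # A tag with attributes or children is treated as a nested record;
            # its own text goes into the configured content field.
            nested = to_record(child, attribute_prefix, content_field)
            text = (child.text or "").strip()
            if text:
                nested[content_field] = text
            record[child.tag] = nested
        else:
            record[child.tag] = child.text
    return record


def read_records(xml_text: str, expect_array: bool,
                 attribute_prefix: str = "", content_field: str = "value") -> list[dict]:
    root = ET.fromstring(xml_text)
    # With a wrapper element, each child is a record and the wrapper is ignored.
    elements = list(root) if expect_array else [root]
    return [to_record(e, attribute_prefix, content_field) for e in elements]


wrapped = ('<people><person id="1"><name>Ann</name></person>'
           '<person id="2"><name>Bob</name></person></people>')
records = read_records(wrapped, expect_array=True, attribute_prefix="attr_")
```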
Display Name API Name Default Value Allowable Values Description
Character Set * Character Set UTF-8 The Character set to use when writing the data to the FlowFile
Date Format Date Format Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).
Schema Access Strategy * Schema Access Strategy inherit-record-schema - Inherit Record Schema - Use 'Schema Name' Property - Use 'Schema Text' Property Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Branch Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Cache Schema Cache Specifies a Schema Cache to add the Record Schema to so that Record Readers can quickly lookup the schema.
Schema Name Schema Name ${schema.name} Specifies the name of the schema to lookup in the Schema Registry property
Schema Reference Reader * Schema Reference Reader Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
Schema Reference Writer * Schema Reference Writer Service implementation responsible for writing FlowFile attributes or content header with Schema reference information
Schema Registry Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text Schema Text ${avro.schema} The text of an Avro-formatted Schema
Schema Version Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
Schema Write Strategy * Schema Write Strategy no-schema - Do Not Write Schema - Set 'schema.name' Attribute - Set 'avro.schema' Attribute - Schema Reference Writer Specifies how the schema for a Record should be added to the data.
Time Format Time Format Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
Timestamp Format Timestamp Format Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).
Array Tag Name array_tag_name Name of the tag used by property "Wrap Elements of Arrays" to write arrays
Wrap Elements of Arrays * array_wrapping no-wrapping - Use Property as Wrapper - Use Property for Elements - No Wrapping Specifies how the writer wraps elements of fields of type array
Omit XML Declaration * omit_xml_declaration false - true - false Specifies whether or not to include XML declaration
Pretty Print XML * pretty_print_xml false - true - false Specifies whether or not the XML should be pretty printed
Name of Record Tag record_tag_name Specifies the name of the XML record tag wrapping the record fields. If this is not set, the writer will use the record name in the schema.
Name of Root Tag root_tag_name Specifies the name of the XML root tag wrapping the record set. This property has to be defined if the writer is supposed to write multiple records in a single FlowFile.
Suppress Null Values * suppress_nulls never-suppress - Never Suppress - Always Suppress - Suppress Missing Values Specifies how the writer should handle a null field
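The fallback rule shared by the Date Format, Time Format, and Timestamp Format properties (no format means milliseconds since the epoch) can be sketched as follows. The properties themselves take Java `java.time.format.DateTimeFormatter` patterns; the sketch uses Python's `strptime` directives instead, so `%m/%d/%Y` here stands in for the Java pattern `MM/dd/yyyy`. The function name `read_date` is hypothetical.

```python
from datetime import date, datetime, timezone
from typing import Optional, Union


def read_date(value: Union[str, int], date_format: Optional[str] = None) -> date:
    """Parse a date field, falling back to epoch milliseconds (GMT)."""
    if date_format is None:
        # No format configured: interpret the value as milliseconds since
        # midnight, Jan 1, 1970 GMT.
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc).date()
    return datetime.strptime(str(value), date_format).date()


from_pattern = read_date("01/01/2017", "%m/%d/%Y")
from_epoch = read_date(1483228800000)
```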
Display Name API Name Default Value Allowable Values Description
Allow Comments * Allow Comments false - true - false Whether to allow comments when parsing the JSON document
Date Format Date Format Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).
Max String Length * Max String Length 20 MB The maximum allowed length of a string value when parsing the JSON document
Schema Access Strategy * Schema Access Strategy infer-schema - Infer Schema - Use 'Schema Name' Property - Use 'Schema Text' Property - Schema Reference Reader Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Branch Schema Branch Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Schema Name Schema Name ${schema.name} Specifies the name of the schema to lookup in the Schema Registry property
Schema Reference Reader * Schema Reference Reader Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
Schema Registry Schema Registry Specifies the Controller Service to use for the Schema Registry
Schema Text Schema Text ${avro.schema} The text of an Avro-formatted Schema
Schema Version Schema Version Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
Time Format Time Format Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
Timestamp Format Timestamp Format Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).
Schema Application Strategy * schema-application-strategy SELECTED_PART - Whole JSON - Selected Part Specifies whether the schema is defined for the whole JSON or for the selected part starting from "Starting Field Name".
Schema Inference Cache schema-inference-cache Specifies a Schema Cache to use when inferring the schema. If not populated, the schema will be inferred each time. However, if a cache is specified, the cache will first be consulted and if the applicable schema can be found, it will be used instead of inferring the schema.
Starting Field Name starting-field-name Skips forward to the given nested JSON field (array or object) to begin processing.
Starting Field Strategy * starting-field-strategy ROOT_NODE - Root Node - Nested Field Start processing from the root node or from a specified nested node.
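
The Date, Time, and Timestamp Format properties above have two modes: with no format, a field is read as milliseconds since the epoch; with a format, the value must match a Java `DateTimeFormatter` pattern. The following Python sketch illustrates that behavior only. The `JAVA_TO_STRPTIME` mapping and the `read_timestamp` helper are hypothetical names introduced here, and the `strptime` directives are rough equivalents of the Java patterns, not the reader's actual implementation.

```python
from datetime import datetime, timezone

# Rough Python equivalents of the Java patterns quoted in the table (assumed mapping).
JAVA_TO_STRPTIME = {
    "MM/dd/yyyy": "%m/%d/%Y",
    "HH:mm:ss": "%H:%M:%S",
    "MM/dd/yyyy HH:mm:ss": "%m/%d/%Y %H:%M:%S",
}


def read_timestamp(value, java_pattern=None):
    """Read a field as epoch milliseconds unless a format pattern is given."""
    if java_pattern is None:
        # No format configured: the value is milliseconds since Jan 1, 1970 GMT.
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    # Format configured: the string must match the pattern.
    return datetime.strptime(value, JAVA_TO_STRPTIME[java_pattern])


print(read_timestamp(1483293855000))
print(read_timestamp("01/01/2017 18:04:15", "MM/dd/yyyy HH:mm:ss"))
```

Both calls describe the same instant used in the table's example (01/01/2017 18:04:15); the first result is timezone-aware, the second is naive because the pattern carries no zone.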
Name SERVICE_TYPE Time Zone Units DATES
[CORTEX_AGENT_USAGE_HISTORY](/sql-reference/account-usage/cortex_agent_usage_history) CORTEX_AGENTS UTC Converted to local [1] Tokens, Tools Data begins 11/10/2025
[CORTEX_AI_FUNCTIONS_USAGE_HISTORY](/sql-reference/account-usage/cortex_ai_functions_usage_history) AI_SERVICES UTC Converted to local [1] Tokens Data begins 1/5/2026
[CORTEX_CODE_CLI_USAGE_HISTORY](/sql-reference/account-usage/cortex_code_cli_usage_history) CORTEX_CODE_CLI UTC Tokens, Tools Data begins 2/16/2026
[CORTEX_CODE_SNOWSIGHT_USAGE_HISTORY](/sql-reference/account-usage/cortex_code_snowsight_usage_history) CORTEX_CODE_SNOWSIGHT UTC Tokens, Tools Data begins 3/13/2026, billing begins 4/1/2026
[CORTEX_ANALYST_USAGE_HISTORY](/sql-reference/account-usage/cortex_analyst_usage_history) AI_SERVICES UTC Converted to local [1] Messages 365 days of data
[CORTEX_FINE_TUNING_USAGE_HISTORY](/sql-reference/account-usage/cortex_fine_tuning_usage_history) AI_SERVICES UTC Converted to local [1] Fine-tuning time 365 days of data
[CORTEX_PROVISIONED_THROUGHPUT_USAGE_HISTORY](/sql-reference/account-usage/cortex_provisioned_throughput_usage_history) AI_SERVICES UTC PTU Hours 365 days of data
[CORTEX_SEARCH_DAILY_USAGE_HISTORY](/sql-reference/account-usage/cortex_search_daily_usage_history) [3] AI_SERVICES Local Serving time, Tokens 365 days of data
[SNOWFLAKE_INTELLIGENCE_USAGE_HISTORY](/sql-reference/account-usage/snowflake_intelligence_usage_history_view) SNOWFLAKE_INTELLIGENCE UTC Converted to local [1] Tokens, Tools Data begins 11/10/2025
[CORTEX_REST_API_USAGE_HISTORY](/sql-reference/account-usage/cortex_rest_api_usage_history) [2] AI_INFERENCE UTC Tokens (note: in currency) Data begins 11/1/2025
Name Service Type Time Zone Dates Notes
[CORTEX_SEARCH_SERVING_USAGE_HISTORY](/sql-reference/account-usage/cortex_search_serving_usage_history) AI_SERVICES UTC Converted to local [1] 365 days of data This credit total includes the embedding costs captured in CORTEX_AI_FUNCTIONS_USAGE_HISTORY.
CORTEX_SEARCH_BATCH_QUERY_USAGE_HISTORY AI_SERVICES UTC Converted to local [1] Data begins on 3/26/2026 This credit total includes the embedding costs captured in CORTEX_AI_FUNCTIONS_USAGE_HISTORY.
[CORTEX_AISQL_USAGE_HISTORY](/sql-reference/account-usage/cortex_aisql_usage_history) AI_SERVICES Data starts on 11/21/2025 Slated for deprecation on 1/15/2027 This view includes totals of all functions except AI_EXTRACT.
[CORTEX_DOCUMENT_PROCESSING_USAGE_HISTORY](/sql-reference/account-usage/cortex_document_processing_usage_history) AI_SERVICES 365 days of data Slated for deprecation on This view includes document processing now captured in CORTEX_AI_FUNCTIONS_USAGE_HISTORY.
[CORTEX_FUNCTIONS_QUERY_USAGE_HISTORY](/sql-reference/account-usage/cortex_functions_query_usage_history) AI_SERVICES Data ends on 11/21/2025 Slated for deprecation on 11/22/2026 Please use CORTEX_AI_FUNCTIONS_USAGE_HISTORY.
[CORTEX_FUNCTIONS_USAGE_HISTORY](/sql-reference/account-usage/cortex_functions_usage_history) AI_SERVICES Data ends on 11/21/2025 Slated for deprecation on 11/22/2026 Please use CORTEX_AI_FUNCTIONS_USAGE_HISTORY.
Feature Budget capabilities
Cortex Agents [Resource budgets](/user-guide/snowflake-cortex/cortex-agents-resource-budgets), [shared resource budgets](https://docs.snowflake.com/en/user-guide/budgets/budget-shared-resources)
Cortex AI Functions [Shared resource budgets](https://docs.snowflake.com/en/user-guide/budgets/budget-shared-resources)
Cortex Code CLI (Consumption) [Shared resource budgets](https://docs.snowflake.com/en/user-guide/budgets/budget-shared-resources), [credit usage limits](https://docs.snowflake.com/en/user-guide/cortex-code/credit-usage-limit)
Cortex Code in Snowsight [Shared resource budgets](https://docs.snowflake.com/en/user-guide/budgets/budget-shared-resources), [credit usage limits](https://docs.snowflake.com/en/user-guide/cortex-code/credit-usage-limit)
%sf-intelligence% [Resource budgets](/user-guide/snowflake-cortex/snowflake-cowork/cowork-resource-budgets), [shared resource budgets](https://docs.snowflake.com/en/user-guide/budgets/budget-shared-resources)
\*Cortex Search\* \*Planned resource budgets for the coming year\*
Situation Example message HTTP status code
Request validation failed. The query was cancelled as the model wouldn't be able to generate a valid response. This can be caused by a malformed request. `please provide a type for the response format object`, `please provide a schema for the response format object` 400
Input schema validation failed. The query was cancelled because the model wouldn't be able to generate a valid response. This can be caused by missing required properties in the request payload, by unsupported JSON schema features such as constraints, or by inappropriate use of the $ref mechanism (for example, reaching outside of the schema). `input schema validation error: ` with one of the reasons below: - `/properties/city additional properties are not allowed` - `/properties/arrondissement regexp pattern ^[a-zA-Z0-9_-]{1,64}$ mismatch on string` - `/properties/province/type sting should be one of [\"object\", \"array\", \"string\", \"number\", \"integer\", \"boolean\", \"null\"]` - `Invalid ref #/http://example.com/custom-email-validator.json#. Please define a valid object in #/$defs/ section` 400
Model output validation failed. The model could not generate a response that matched the schema. `json mode output validation error: ` with one of the reasons below: - `An error occurred while unmarshalling the model output. Model returned invalid JSON that cannot be parsed due to: unexpected end of JSON input` 422
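
Two of the 400 errors above can be caught before a request is sent: a response format object without a type or schema, and a `$ref` that points outside the schema's `#/$defs/` section. The following is a minimal pre-flight sketch under those assumptions. The function names are hypothetical, the `"json"` type value in the sample is an assumption, and the check covers only the cases listed in the table, not the service's full validation.

```python
def find_external_refs(node, path=""):
    """Return JSON-pointer-like paths of $ref values that leave #/$defs/."""
    bad = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref":
                if not (isinstance(value, str) and value.startswith("#/$defs/")):
                    bad.append(f"{path}/$ref")
            else:
                bad.extend(find_external_refs(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            bad.extend(find_external_refs(item, f"{path}/{index}"))
    return bad


def check_response_format(response_format):
    """Return a list of problems likely to cause a 400 response."""
    problems = []
    if "type" not in response_format:
        problems.append("please provide a type for the response format object")
    if "schema" not in response_format:
        problems.append("please provide a schema for the response format object")
    else:
        for ref_path in find_external_refs(response_format["schema"]):
            problems.append(f"$ref at {ref_path} must point into #/$defs/")
    return problems


# Hypothetical request fragment with an external $ref.
sample = {
    "type": "json",
    "schema": {
        "type": "object",
        "properties": {
            "email": {"$ref": "http://example.com/custom-email-validator.json#"}
        },
    },
}
print(check_response_format(sample))
```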
Message Explanation
_COMPLETE_WITH_PROMPT_HISTORY_LLM$V1 with remote service error: 400 '"invalid request parameters: unsupported document content type: application/vnd.ms-excel" The selected file is of an unsupported type (in this example, a Microsoft Excel file). Only Claude models support Excel files.
Request failed for external function _COMPLETE_WITH_PROMPT_HISTORY_LLM$V1 with remote service error: 400 '"invalid request parameters: File data exceeds the limit of 10.00 MB for file prefix/file.pdf" File size exceeds limit (10MB in this example).
Remote file '@docs/file.pdf' was not found. There are several potential causes. The file might not exist. The required credentials may be missing or invalid. If you are running a copy command, please make sure files are not deleted when they are being loaded or files are not being loaded into two different tables concurrently with auto purge option. Possibly an error in the filename. Filenames are case-sensitive. Or the file might have been deleted.
Error in secure object May indicate that the stage does not exist. Check the stage name and ensure that the stage exists and is accessible. Be sure to use an at sign (@) at the beginning of the stage name. Ensure that the stage uses server-side encryption.
Request failed for external function COMPLETE$V6 with remote service error: 400 '"model "model_name" does not support given modality" The model provided in the request doesn't support document or text modality.
Request failed for external function _COMPLETE_WITH_PROMPT with remote service error: 500 '"internal error" Issue with processing the request on the server side. It could be the case that the file is corrupted or truncated.
Input data classification Output data classification Designation
%input-data% %output-data% Generally available functions are Covered AI Features. Preview functions are Preview AI Features. [1]
Input data classification Output data classification
%input-data% %output-data%
| Condition | What happens |
| --- | --- |
| You lose agent access | You can still view and refresh the artifact. Follow-up questions with the original agent are not available. |
| You lose data access | The last cached snapshot remains visible but refresh is unavailable. |
| Agent is deleted or modified | The artifact and its saved query are unaffected. Follow-up questions use the current agent definition, if available. |
Column Data type Description
RECORD_ID VARCHAR The unique identifier assigned by Snowflake for this evaluation record.
INPUT_ID VARCHAR The unique identifier assigned by Snowflake for this evaluation input.
REQUEST_ID VARCHAR The unique identifier assigned by Snowflake for this request.
TIMESTAMP TIMESTAMP_TZ The time (in UTC) at which the request was made.
DURATION_MS INT The amount of time, in milliseconds, that it took for the agent to return a response.
INPUT VARCHAR The query string used as input for this evaluation record.
OUTPUT VARCHAR The response returned by the Cortex Agent for this evaluation record.
ERROR VARCHAR Information about any errors that occurred during the request.
GROUND_TRUTH VARCHAR The ground truth information used to evaluate this record's Cortex Agent output. This column holds the JSON from your dataset's ground truth column, serialized as a string. For how `{{ground_truth}}` in custom metrics relates to this value, see the notes under [](#label-cortex-agent-evaluations-results-format).
METRIC_NAME VARCHAR The name of the metric evaluated for this record.
EVAL_AGG_SCORE NUMBER The evaluation score assigned for this record.
METRIC_TYPE VARCHAR The type of metric being evaluated. For built-in metrics, the value is `system`. For custom metrics, the value is `custom`.
METRIC_STATUS VARIANT A map containing information about the agent's HTTP response for this record, with the following keys:
- `status`: The HTTP status code of the response. - `message`: The HTTP message sent in the status response.
METRIC_CALLS ARRAY An array of VARIANT values that contain information about the computed metric. Each array entry contains the metric's criteria, an explanation of the metric score, and metadata. The keys of each entry are:
- `criteria`: The criteria used by an LLM judge to evaluate response correctness. - `explanation`: An explanation of why the score was assigned. - `full_metadata`: A VARIANT value that contains metadata and information about this metric's processing by the LLM judge. The keys of this map include: - `completion_tokens`: The number of output tokens generated by the LLM for this metric evaluation call. - `normalized_score`: The original evaluation score normalized to the range [0.0, 1.0], rounded to two decimal places. - `original_score`: The original score assigned by this metric evaluation for the record. - `prompt_tokens`: The number of tokens taken up by the prompt provided to the LLM judge. - `total_tokens`: The total number of tokens used by the LLM judge for this computation.
TOTAL_INPUT_TOKENS INT The total number of tokens used to process the input query.
TOTAL_OUTPUT_TOKENS INT The total number of output tokens produced by the Cortex Agent.
LLM_CALL_COUNT INT Counts the number of times any LLM was called, either by the agent or an evaluation judge.
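
As an illustration of how the METRIC_CALLS column described above might be consumed, the following Python sketch averages `normalized_score` and sums the judge's `total_tokens` for one evaluation record. The record is assumed to be already fetched and decoded into Python dictionaries; `summarize_metric_calls` and all sample values are hypothetical.

```python
def summarize_metric_calls(record):
    """Summarize the LLM-judge entries stored in a record's METRIC_CALLS array."""
    calls = record.get("METRIC_CALLS") or []
    scores = [call["full_metadata"]["normalized_score"] for call in calls]
    judge_tokens = sum(call["full_metadata"]["total_tokens"] for call in calls)
    return {
        "metric": record["METRIC_NAME"],
        # normalized_score is already in the range [0.0, 1.0].
        "mean_normalized_score": round(sum(scores) / len(scores), 2) if scores else None,
        "judge_tokens": judge_tokens,
    }


# Hypothetical record with two judge calls.
record = {
    "METRIC_NAME": "answer_correctness",
    "METRIC_CALLS": [
        {
            "criteria": "Matches the ground truth.",
            "explanation": "The answer agrees with the expected value.",
            "full_metadata": {"normalized_score": 1.0, "original_score": 3,
                              "prompt_tokens": 100, "completion_tokens": 20,
                              "total_tokens": 120},
        },
        {
            "criteria": "Cites the correct source.",
            "explanation": "The source is only partially cited.",
            "full_metadata": {"normalized_score": 0.5, "original_score": 2,
                              "prompt_tokens": 70, "completion_tokens": 10,
                              "total_tokens": 80},
        },
    ],
}
print(summarize_metric_calls(record))
```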
Model Cross-cloud (Any region) AWS US (Cross-region) AWS EU (Cross-region) AWS APJ (Cross-region) Azure US (Cross-region)
`claude-haiku-4-5` \* \*
`claude-sonnet-4-5` %cm% %cm% %cm%
`claude-4-sonnet` %cm% %cm% %cm% %cm%
`claude-sonnet-4-6` %cm% %cm%
`openai-gpt-4.1` %cm% %cm%
Input data classification Output data classification Designation
%input-data% Customer Data %designation% [1]
Parameter Description
`database` (Required) Identifier for the database to which the resource belongs.
`schema` (Required) Schema identifier.
Parameter Description
`createMode` (Optional) Resource creation mode. Valid values:
- `errorIfExists` - `orReplace` - `ifNotExists`
Header Description
`Authorization` (Required) Authorization token. For more information, see [](#label-chat-api-authenticate-example).
`Content-Type` (Required) application/json
Field Type Description
`name` string Name of the agent.
`comment` string Optional comment about the agent.
`profile` [](#label-snowflake-agent-object-AgentProfile) Agent profile information (display name, avatar, color, etc.).
`models` [](#label-snowflake-agent-object-ModelConfig) Model configuration for the agent. Includes the orchestration model (e.g., claude-4-sonnet). If not provided, a model is automatically selected. Currently only available for the `orchestration` step.
`instructions` [](#label-snowflake-agent-object-AgentInstructions) Instructions for the agent's behavior, including response, orchestration, and sample questions.
`orchestration` [](#label-snowflake-agent-object-OrchestrationConfig) Orchestration configuration, including budget constraints (e.g., seconds, tokens).
`tools` array of [](#label-snowflake-agent-object-Tool) List of tools available for the agent to use. Each tool includes a tool_spec with type, name, description, and input schema. Tools may have a corresponding configuration in tool_resources.
`tool_resources` map of [](#label-snowflake-agent-object-ToolResource) Configuration for each tool referenced in the tools array. Keys must match the name of the respective tool.
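
The request fields above can be assembled as in the following Python sketch. It only builds the `createMode` query string and the JSON body; it does not send anything, and the endpoint path is left out because it is not stated here. The helper name and all sample values (agent name, search service, columns, budget numbers) are hypothetical. The nested shapes follow the object tables in this reference: `models.orchestration`, `orchestration.budget`, and `tool_spec` inside each tool.

```python
import json
from urllib.parse import urlencode

CREATE_MODES = {"errorIfExists", "orReplace", "ifNotExists"}


def build_create_agent_request(name, tools, tool_resources, create_mode=None,
                               orchestration_model=None, budget=None):
    """Return (query_string, json_body) for a create-agent call."""
    if create_mode is not None and create_mode not in CREATE_MODES:
        raise ValueError(f"unsupported createMode: {create_mode}")
    # Keys of tool_resources must match the name of a tool in the tools array.
    tool_names = {tool["tool_spec"]["name"] for tool in tools}
    unknown = sorted(set(tool_resources) - tool_names)
    if unknown:
        raise ValueError(f"tool_resources keys without a matching tool: {unknown}")
    body = {"name": name, "tools": tools, "tool_resources": tool_resources}
    if orchestration_model is not None:
        body["models"] = {"orchestration": orchestration_model}
    if budget is not None:
        body["orchestration"] = {"budget": budget}
    query = urlencode({"createMode": create_mode}) if create_mode else ""
    return query, json.dumps(body)


# Hypothetical example values.
tools = [{"tool_spec": {"type": "cortex_search", "name": "docs_search",
                        "description": "Searches product documentation."}}]
resources = {"docs_search": {"search_service": "MY_DB.MY_SCHEMA.DOCS_SEARCH",
                             "title_column": "TITLE", "id_column": "DOC_ID"}}
query, body = build_create_agent_request(
    "support_agent", tools, resources, create_mode="ifNotExists",
    orchestration_model="claude-4-sonnet",
    budget={"seconds": 30, "tokens": 16000})
print(query)
print(body)
```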
Parameter Description
`database` (Required) Identifier for the database to which the resource belongs. You can use the /api/v2/databases GET request to get a list of available databases.
`schema` (Required) Identifier for the schema to which the resource belongs. You can use the /api/v2/databases/\{database\}/schemas GET request to get a list of available schemas for the specified database.
`name` (Required) Identifier for the agent.
Header Description
`Authorization` (Required) Authorization token. For more information, see [](#label-chat-api-authenticate-example).
`Content-Type` (Required) application/json
Header Description
`X-Snowflake-Request-ID` Unique ID of the API request.
`Link` Links to the page of results (e.g. the first page, the last page, etc.). The header can include multiple url entries with different rel attribute values that specify the page to return (first, next, prev, and last).
Parameter Description
`database` (Required) Identifier for the database to which the resource belongs. You can use the _/api/v2/databases_ GET request to get a list of available databases.
`schema` (Required) Schema identifier. You can use the _/api/v2/databases/\{database\}/schemas_ GET request to get a list of available schemas for the specified database.
`name` (Required) Name of the agent.
Header Description
`Authorization` (Required) Authorization token. For more information, see [](#label-chat-api-authenticate-example).
`Content-Type` (Required) application/json
Field Type Description
`comment` string Optional comment about the agent.
`profile` [](#label-snowflake-agent-object-AgentProfile) Agent profile information (display name, avatar, color, etc.).
`models` [](#label-snowflake-agent-object-ModelConfig) Model configuration for the agent. Includes the orchestration model (e.g., claude-4-sonnet). If not provided, a model is automatically selected. Currently only available for the `orchestration` step.
`instructions` [](#label-snowflake-agent-object-AgentInstructions) Instructions for the agent's behavior, including response, orchestration, and sample questions.
`orchestration` [](#label-snowflake-agent-object-OrchestrationConfig) Orchestration configuration, including budget constraints (e.g., seconds, tokens).
`tools` array of [](#label-snowflake-agent-object-Tool) List of tools available for the agent to use. Each tool includes a tool_spec with type, name, description, and input schema. Tools may have a corresponding configuration in tool_resources.
`tool_resources` map of [](#label-snowflake-agent-object-ToolResource) Configuration for each tool referenced in the tools array. Keys must match the name of the respective tool.
Parameter Description
`database` (Required) Identifier for the database to which the resource belongs. You can use the /api/v2/databases GET request to get a list of available databases.
`schema` (Required) Identifier for the schema to which the resource belongs. You can use the /api/v2/databases/\{database\}/schemas GET request to get a list of available schemas for the specified database.
Parameter Description
`like` (Optional) Filter the output by resource name. Uses case-insensitive pattern matching with support for SQL wildcard characters.
`fromName` (Optional) Enable fetching rows only following the first row whose object name matches the specified string. Case-sensitive and does not have to be the full name.
`showLimit` (Optional) Limit the maximum number of rows returned by the command. Minimum: 1. Maximum: 10000.
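
A small sketch of how the three optional query parameters above could be encoded, including the documented `showLimit` bounds. The function name and the sample pattern are hypothetical; only the parameter names and limits come from the table.

```python
from urllib.parse import urlencode


def list_agents_query(like=None, from_name=None, show_limit=None):
    """Build the query string for a list request from the optional filters."""
    params = {}
    if like is not None:
        params["like"] = like            # case-insensitive, SQL wildcards allowed
    if from_name is not None:
        params["fromName"] = from_name   # case-sensitive, may be a partial name
    if show_limit is not None:
        if not 1 <= show_limit <= 10000:
            raise ValueError("showLimit must be between 1 and 10000")
        params["showLimit"] = show_limit
    return urlencode(params)


print(list_agents_query(like="SALES%", show_limit=50))
```

`urlencode` percent-encodes the SQL wildcard, so `SALES%` is sent as `SALES%25`.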
Header Description
`Authorization` (Required) Authorization token. For more information, see [](#label-chat-api-authenticate-example).
`Content-Type` (Required) application/json
Header Description
`X-Snowflake-Request-ID` Unique ID of the API request.
`Link` Links to the page of results (e.g. the first page, the last page, etc.). The header can include multiple url entries with different rel attribute values that specify the page to return (first, next, prev, and last).
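
The `Link` header described above follows the usual `<url>; rel="name"` layout, so a client can split it into a mapping from `rel` value to URL. The following is a minimal sketch that assumes each entry carries only a `rel` attribute; the header value and URLs in the example are hypothetical.

```python
import re

# Matches one entry of the form: <url>; rel="next"
LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="?([A-Za-z]+)"?')


def parse_link_header(value):
    """Map each rel value (first, next, prev, last) to its URL."""
    return {rel: url for url, rel in LINK_ENTRY.findall(value)}


# Hypothetical header value.
header = ('</api/v2/results/abc?page=0>; rel="first", '
          '</api/v2/results/abc?page=1>; rel="next", '
          '</api/v2/results/abc?page=9>; rel="last"')
links = parse_link_header(header)
print(links.get("next"))
```

A client can keep following `links.get("next")` until the key is absent.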
Parameter Description
`database` (Required) Identifier for the database to which the resource belongs. You can use the /api/v2/databases GET request to get a list of available databases.
`schema` (Required) Identifier for the schema to which the resource belongs. You can use the /api/v2/databases/\{database\}/schemas GET request to get a list of available schemas for the specified database.
`name` (Required) Identifier for the agent.
Parameter Description
`ifExists` (Optional) Specifies how to handle the request if the agent does not exist. - `true`: The endpoint does not throw an error if the agent does not exist. It returns a 200 success response, but does not take any action. - `false`: The endpoint throws an error if the agent does not exist.
Header Description
`Authorization` (Required) Authorization token. For more information, see [](#label-chat-api-authenticate-example).
`Content-Type` (Required) application/json
Field Type Description
`response` string Instructions for response generation.
`orchestration` string These custom instructions are used when the agent is planning which tools to use.
Field Type Description
`display_name` string Display name for the agent.
Field Type Description
`seconds` integer Time budget in seconds.
`tokens` integer Token budget.
Field Type Description
`type` string The type of execution environment, currently only `warehouse` is supported.
`warehouse` string The name of the warehouse. The name is case-sensitive; if it is an unquoted identifier, provide the name in all caps.
`query_timeout` integer The query timeout in seconds
Field Type Description
`orchestration` string Model to use for orchestration. If not provided, a model is automatically selected.
Field Type Description
`budget` [](#label-snowflake-agent-object-BudgetConfig) Budget constraints for the agent. If more than one constraint is specified, whichever is first hit will end the request.
Field Type Description
`tool_spec` [](#label-snowflake-agent-object-ToolSpec) Specification of the tool's type, configuration, and input requirements.
Field Type Description
`type` string The type of the input schema object.
`description` string A description of what the input is.
`properties` map of [](#label-snowflake-agent-object-ToolInputSchema) If type is `object`, definitions of each input parameter.
`items` [](#label-snowflake-agent-object-ToolInputSchema) If type is `array`, the schema for the elements of the array.
`required` array of string If type is `object`, list of required input parameter names.
Field Type Description
`semantic_model_file` string The path to a file stored in a Snowflake Stage holding the semantic model yaml.
`semantic_view` string The name of the Snowflake native semantic model object.
`execution_environment` [](#label-snowflake-agent-object-ExecutionEnvironment) Configuration for how to execute the generated SQL query.
Field Type Description
`search_service` string The fully qualified name of the search service.
`title_column` string The title column of the document.
`id_column` string The ID column of the document.
`filter` object Filter query for search results.
Field Type Description
`type` string If the tool is server-side executed, whether it is a Stored Procedure or a UDF.
`execution_environment` [](#label-snowflake-agent-object-ExecutionEnvironment)
`identifier` string Fully qualified name of the Stored Procedure or UDF.
Field Type Description
`max_results` integer Max web search results returned.
Field Type Description
`type` string The type of tool capability. Can be specialized types like 'cortex_analyst_text_to_sql' or 'generic' for general-purpose tools.
`name` string Unique identifier for referencing this tool instance. Used to match with configuration in tool_resources.
`description` string Description of the tool to be considered for tool use.
`input_schema` [](#label-snowflake-agent-object-ToolInputSchema) JSON Schema definition of the expected input parameters for this tool. This is fed to the agent so it knows the structure to follow when generating the input for ToolUses. Required for generic tools to specify their input parameters.
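
To make the relationship between `tool_spec` and `input_schema` concrete, the sketch below defines a hypothetical generic tool and checks one consistency rule implied by the tables: every name in `required` should have a definition in `properties`, recursively through nested objects and array `items`. The tool name, description, and parameters are invented for illustration, and the check is a local sanity test, not the service's validation.

```python
def check_input_schema(schema, path="input_schema"):
    """Return problems where a required parameter lacks a definition."""
    problems = []
    kind = schema.get("type")
    if kind == "object":
        properties = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in properties:
                problems.append(f"{path}: required parameter '{name}' has no definition")
        for name, sub_schema in properties.items():
            problems.extend(check_input_schema(sub_schema, f"{path}.properties.{name}"))
    elif kind == "array" and "items" in schema:
        problems.extend(check_input_schema(schema["items"], f"{path}.items"))
    return problems


# Hypothetical generic tool.
tool_spec = {
    "type": "generic",
    "name": "get_weather",
    "description": "Returns the weather for a list of cities.",
    "input_schema": {
        "type": "object",
        "properties": {
            "cities": {
                "type": "array",
                "description": "Cities to look up.",
                "items": {"type": "string", "description": "A city name."},
            },
        },
        "required": ["cities", "units"],
    },
}
print(check_input_schema(tool_spec["input_schema"]))
```

The sample deliberately lists `units` as required without defining it, so the check reports one problem.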
Parameter Description
`database` (Required) The database containing the agent. You can use the _/api/v2/databases_ GET request to get a list of available databases.
`schema` (Required) The schema containing the agent. You can use the _/api/v2/databases/\{database\}/schemas_ GET request to get a list of available schemas for the specified database.
`name` (Required) The name of the agent.
Header Description
`Authorization` (Required) Authorization token. See [](#label-chat-api-authenticate-example).
`Content-Type` (Required) application/json
`Accept` (Optional) Response content type. Use `text/event-stream` for streaming responses or `application/json` for a single non-streaming response.
Field Type Description
`thread_id` integer The thread ID for the conversation. If thread_id is used, then parent_message_id must be passed as well.
`parent_message_id` integer The ID of the parent message in the thread. If this is the first message, parent_message_id should be 0.
`messages` array of [](#label-snowflake-agent-run-Message) If thread_id and parent_message_id are passed in the request, messages includes the current user message in the conversation. Otherwise, messages includes the conversation history and the current message. The array contains both user queries and assistant responses in chronological order.
`stream` boolean Whether to return a streaming response (`text/event-stream`) or a non-streaming JSON response (`application/json`). If true, the response will be streamed as Server-Sent Events. If false, the response will be returned as JSON.
`tool_choice` [](#label-snowflake-agent-run-ToolChoice) Configures how the agent should select and use tools during the interaction. Controls whether tool use is automatic, required, or whether specific tools should be used.
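
The rules in the request body table above (a `thread_id` requires a `parent_message_id`, the first message in a thread uses 0, and `stream` selects the response type) can be captured in a small builder. This is a sketch under assumptions: the helper name is hypothetical, and the message item shape (`role` of `user` with a `text` content item) is assumed because the Message object is defined elsewhere.

```python
def build_run_request(user_text, thread_id=None, parent_message_id=None, stream=True):
    """Return (headers, body) for an agent run request."""
    if thread_id is not None and parent_message_id is None:
        raise ValueError("parent_message_id is required when thread_id is set")
    body = {
        # Assumed message shape; see the Message object for the actual schema.
        "messages": [{"role": "user",
                      "content": [{"type": "text", "text": user_text}]}],
        "stream": stream,
    }
    if thread_id is not None:
        body["thread_id"] = thread_id
        body["parent_message_id"] = parent_message_id
    headers = {
        "Content-Type": "application/json",
        # Server-Sent Events when streaming, a single JSON document otherwise.
        "Accept": "text/event-stream" if stream else "application/json",
    }
    return headers, body


# First message in a hypothetical thread 42: parent_message_id is 0.
headers, body = build_run_request("What were sales last week?",
                                  thread_id=42, parent_message_id=0)
print(headers["Accept"], body["parent_message_id"])
```

The `Authorization` header is omitted here because its value depends on the authentication method in use.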
Header Description
`Authorization` (Required) Authorization token. See [](#label-chat-api-authenticate-example).
`Content-Type` (Required) application/json
`Accept` (Optional) Response content type. Use `text/event-stream` for streaming responses or `application/json` for a single non-streaming response.
Field Type Description
`thread_id` integer The thread ID for the conversation. If thread_id is used, then parent_message_id must be passed as well.
`parent_message_id` integer The ID of the parent message in the thread. If this is the first message, parent_message_id should be 0.
`messages` array of [](#label-snowflake-agent-run-Message) If thread_id and parent_message_id are passed in the request, messages includes the current user message in the conversation. Otherwise, messages includes the conversation history and the current message. The array contains both user queries and assistant responses in chronological order.
`stream` boolean Whether to return a streaming response (`text/event-stream`) or a non-streaming JSON response (`application/json`). If true, the response will be streamed as Server-Sent Events. If false, the response will be returned as JSON.
`tool_choice` [](#label-snowflake-agent-run-ToolChoice) Configures how the agent should select and use tools during the interaction. Controls whether tool use is automatic, required, or whether specific tools should be used.
`models` [](#label-snowflake-agent-run-ModelConfig) Model configuration for the agent. Includes the orchestration model (e.g., claude-4-sonnet). If not provided, a model is automatically selected. Currently only available for the `orchestration` step.
`instructions` [](#label-snowflake-agent-run-AgentInstructions) Instructions for the agent's behavior, including response, orchestration, and sample questions.
`orchestration` [](#label-snowflake-agent-run-OrchestrationConfig) Orchestration configuration, including budget constraints (e.g., seconds, tokens).
`tools` array of [](#label-snowflake-agent-run-Tool) List of tools available for the agent to use. Each tool includes a tool_spec with type, name, description, and input schema. Tools may have a corresponding configuration in tool_resources.
`tool_resources` map of [](#label-snowflake-agent-run-ToolResource) Configuration for each tool referenced in the tools array. Keys must match the name of the respective tool.
Field Type Description
`role` string The role for the message. Always `assistant` in the API response.
`content` array of [](#label-snowflake-agent-run-MessageContentItem) The content generated by the agent.
`warnings` array of [](#label-snowflake-agent-run-Warning) Non-fatal warnings that occurred during processing. Present for non-streaming clients or as a summary.
`metadata` [](#label-snowflake-agent-run-ResponseMetadata)
`status` string The completion status of the agent run. Set to "cancelled" when the run was terminated via CancelAgentRun.
Field Type Description
`content_index` integer The index in the response content array this event represents
`text` string A text result from the agent
`annotations` array of [](#label-snowflake-agent-run-Annotation) Any annotations attached to the text result (e.g. citations)
`is_elicitation` boolean Whether this text content is the agent asking for more information from the end user.
Field Type Description
`content_index` integer The index in the response content array this event represents
`text` string The text delta
`is_elicitation` boolean Whether this text content is the agent asking for more information from the end user.
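
When streaming, text arrives as deltas that each carry the `content_index` they belong to, so a client can rebuild the final text by concatenating deltas per index. The sketch below assumes the event payloads have already been decoded into dictionaries with the fields listed above; the function name and sample deltas are hypothetical.

```python
from collections import defaultdict


def assemble_text(deltas):
    """Concatenate text deltas per content_index, preserving arrival order."""
    parts = defaultdict(list)
    for delta in deltas:
        parts[delta["content_index"]].append(delta["text"])
    return {index: "".join(chunks) for index, chunks in sorted(parts.items())}


# Hypothetical decoded delta payloads.
deltas = [
    {"content_index": 0, "text": "Sales rose "},
    {"content_index": 0, "text": "12% last week."},
    {"content_index": 2, "text": "Which region do you mean?", "is_elicitation": True},
]
print(assemble_text(deltas))
```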
Field Type Description
`content_index` integer The index in the response content array this event represents
`annotation_index` integer The index in the annotation array this `annotation` belongs to.
`annotation` [](#label-snowflake-agent-run-Annotation) The annotation object being added.
Field Type Description
`content_index` integer The index in the response content array this event represents
`text` string Thinking tokens from the agent
`signature` string The signature of the thinking token
Field Type Description
`content_index` integer The index in the response content array this event represents
`text` string The thinking token
`signature` string The signature of the thinking token
Field Type Description
`content_index` integer The index in the response content array this event represents
`tool_use_id` string Unique identifier for this tool use. Can be used to associate tool results.
`type` string The type of the tool (e.g. cortex_search, cortex_analyst_text_to_sql)
`name` string The unique identifier for this tool instance
`input` object The structured input for this tool. The schema of this object will vary depending on the tool spec.
`client_side_execute` boolean Whether the tool use is executed on the client side.
`permission` [](#label-snowflake-agent-run-ToolUsePermission)
Field Type Description
`content_index` integer The index in the response content array this event represents
`tool_use_id` string Unique identifier for this tool use. Can be used to associate tool results.
`type` string The type of the tool (e.g. cortex_search, cortex_analyst_text_to_sql)
`name` string The unique identifier for this tool instance
`content` array of [](#label-snowflake-agent-run-ToolResultContent) The content of the tool result
`status` string The status of tool execution
Field Type Description
`tool_use_id` string Unique identifier for this tool use.
`tool_type` string The type of the tool (e.g. cortex_search, cortex_analyst_text_to_sql)
`status` string Enum for the current state.
`message` string A more descriptive message expanding on the current status.
`details` object Tool-specific status details.
Field Type Description
`content_index` integer The index in the response content array this event represents
`tool_use_id` string Unique identifier for this tool use. Can be used to associate tool results.
`tool_type` string The type of the tool (always cortex_analyst_text_to_sql for this event)
`tool_name` string The unique identifier for this tool instance
`delta` [](#label-snowflake-agent-run-CortexAnalystToolResultDelta) The content delta
Field Type Description
`content_index` integer The index in the response content array this event represents
`tool_use_id` string The ID of the tool use that generated this table
`query_id` string The query id of the sql query that generated this data
`result_set` [](#label-snowflake-agent-run-ResultSet) The SQL results to render a table. Matches the schema from Snowflake's SQL API ResultSet ([https://docs.snowflake.com/en/developer-guide/sql-api/reference#resultset](https://docs.snowflake.com/en/developer-guide/sql-api/reference#resultset))
`title` string The title for this table
Field Type Description
`content_index` integer The index in the response content array this event represents
`tool_use_id` string The ID of the tool use that generated this chart
`chart_spec` string The vega-lite chart specification serialized as a string
Field Type Description
`status` string Enum for the current state.
`message` string A more descriptive message expanding on the current status.
Field Type Description
`message` string The warning message to display to the user.
`code` string Optional structured warning code for clients to parse and handle.
Field Type Description
`code` string The Snowflake error code
`error_code` string Error code, same as `code` above. This property has been deprecated and will be removed in a future release, but is temporarily supported for short-term backward compatibility.
`message` string The error message
`request_id` string The unique identifier for this request
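
Because `error_code` is deprecated in favor of `code`, a client that must handle both can read `code` first and fall back to `error_code`. The following is a minimal sketch; the function name and the sample payloads are hypothetical.

```python
def describe_error(error):
    """Format an error payload, preferring `code` over the deprecated `error_code`."""
    code = error.get("code") or error.get("error_code") or "unknown"
    message = error.get("message", "")
    request_id = error.get("request_id", "")
    return f"[{code}] {message} (request {request_id})"


# Hypothetical payloads: one with the current field, one with only the deprecated field.
print(describe_error({"code": "000001", "message": "Example failure", "request_id": "req-1"}))
print(describe_error({"error_code": "000002", "message": "Older payload", "request_id": "req-2"}))
```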
Field Type Description
`metadata` [](#label-snowflake-agent-run-Metadata)
Field Type Description
`response` string Instructions for response generation.
`orchestration` string These custom instructions are used when the agent is planning which tools to use.
Field Type Description
`type` string The citation type (always `cortex_search_citation`)
`index` integer The index of the citation in the search results.
`search_result_id` string The unique identifier for the search result.
`doc_id` string The unique identifier for the document.
`doc_title` string The title of the document.
`text` string The text excerpt from the document used as the citation.
Field Type Description
`seconds` integer Time budget in seconds.
`tokens` integer Token budget.
Field Type Description
`tool_use_id` string The ID of the tool use that generated this chart
`chart_spec` string The vega-lite chart specification serialized as a string
Field Type Description
`index` integer The index of the suggestion array this delta represents
`delta` string The text delta for the suggestion in this index
Field Type Description
`text` string A text delta from Cortex Analyst's final response.
`think` string A text delta from Cortex Analyst's reasoning steps.
`sql` string A delta from Cortex Analyst's SQL output. Currently, the entire SQL query comes in a single event but we may stream the SQL token-by-token in the future.
`sql_explanation` string A delta from Cortex Analyst's explanation of what the SQL query does
`query_id` string The query id once SQL execution begins
`verified_query_used` boolean Whether a verified query was used to generate this response
`result_set` [](#label-snowflake-agent-run-ResultSet) The results from SQL execution. Matches the schema from Snowflake's SQL API ResultSet ([https://docs.snowflake.com/en/developer-guide/sql-api/reference#resultset](https://docs.snowflake.com/en/developer-guide/sql-api/reference#resultset))
`suggestions` [](#label-snowflake-agent-run-CortexAnalystSuggestionDelta) A delta from Cortex Analyst's suggested questions. This is sent when Analyst cannot answer the question due to missing information or other failures.
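The delta fields above arrive incrementally, so a client typically has to merge them before display. The following is a minimal sketch of one way to do that, assuming each delta has already been decoded into a Python dict and that fragments are combined by simple concatenation (an assumption; the function name `accumulate_analyst_deltas` is hypothetical and not part of any Snowflake SDK).

```python
from collections import defaultdict


def accumulate_analyst_deltas(deltas):
    """Merge a sequence of Cortex Analyst delta dicts into one result dict.

    Assumption: string fields are streamed as fragments that can be
    concatenated in arrival order.
    """
    text_parts = {"text": [], "think": [], "sql": [], "sql_explanation": []}
    suggestions = defaultdict(list)  # suggestion index -> text fragments
    result = {}

    for delta in deltas:
        # Streamed string fields: collect fragments in arrival order.
        for key, parts in text_parts.items():
            if delta.get(key):
                parts.append(delta[key])
        # Non-streamed fields: keep the most recent value seen.
        for key in ("query_id", "verified_query_used", "result_set"):
            if key in delta:
                result[key] = delta[key]
        # Suggestion deltas carry the index of the suggestion they extend.
        suggestion = delta.get("suggestions")
        if suggestion is not None:
            suggestions[suggestion["index"]].append(suggestion["delta"])

    for key, parts in text_parts.items():
        result[key] = "".join(parts)
    result["suggestions"] = ["".join(suggestions[i]) for i in sorted(suggestions)]
    return result
```

Keeping fragments in lists and joining once at the end avoids repeated string copies on long streams.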
Field Type Description
`type` string The type of execution environment, currently only `warehouse` is supported.
`warehouse` string The name of the warehouse. The name is case-sensitive. If it is an unquoted identifier, provide the name in all caps.
`query_timeout` integer The query timeout in seconds
Field Type Description
`total` integer Total input tokens processed (including cached tokens).
`cache_read` integer Input tokens read from cache.
`cache_write` integer Input tokens written to cache.
`uncached` integer Input tokens that were not cached.
Field Type Description
`role` string Identifies who sent the message - either the user or the assistant. User messages typically contain queries, while assistant messages contain responses and tool results.
`content` array of [](#label-snowflake-agent-run-MessageContentItem) Array of content elements making up the message. Can include text, tool results, or custom content types.
`status` string The completion status of the message set by the server when saving to a thread. Set to \"error\" when the agent run terminated with an error; in that case the `error` field contains the error details.
`error` [](#label-snowflake-agent-run-MessageError) Details about the error that terminated the agent run. Only set when `status` is \"error\".
Field Type Description
`type` string The content type (always `chart`).
`chart` [](#label-snowflake-agent-run-ChartContent) The chart.
Field Type Description
`type` string Content type identifier.
`permission_decision` [](#label-snowflake-agent-run-PermissionDecision)
Field Type Description
`type` string The content type (always `table`).
`table` [](#label-snowflake-agent-run-TableContent) The table.
Field Type Description
`text` string A text result from the agent
`annotations` array of [](#label-snowflake-agent-run-Annotation) Any annotations attached to the text result (e.g. citations)
`is_elicitation` boolean Whether this text content is the agent asking for more information from the end user.
`type` string The content type (always `text`).
Field Type Description
`type` string The content type (always `thinking`).
`thinking` [](#label-snowflake-agent-run-ThinkingContent) The thinking content.
Field Type Description
`type` string The content type (always `tool_result`).
`tool_result` [](#label-snowflake-agent-run-ToolResult) The tool result.
Field Type Description
`type` string The content type (always `tool_use`).
`tool_use` [](#label-snowflake-agent-run-ToolUse) The tool use.
Field Type Description
`code` string The Snowflake error code.
`message` string The error message.
Field Type Description
`role` string Identifies who sent the message - either the user or the assistant.
`message_id` integer The thread message id. Use this ID (when role is `assistant`) to ask a followup question on the thread.
`run_id` string The unique identifier for this Agent Run. Can be used to reconnect to the output stream.
Field Type Description
`orchestration` string Model to use for orchestration. If not provided, a model is automatically selected.
Field Type Description
`budget` [](#label-snowflake-agent-run-BudgetConfig) Budget constraints for the agent. If more than one constraint is specified, whichever is first hit will end the request.
Field Type Description
`total` integer Total output tokens generated.
Field Type Description
`tool_use_id` string The ID of the tool_use this decision applies to.
`decision` string Must match one of the options from the tool_use permission.options list.
`reason` string Optional reason for denying permission. Only meaningful when decision is \"Deny\". This reason will be shown to the LLM as an error message so it can respond appropriately.
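Because `decision` must match one of the options offered by the tool use, it can help to validate the choice before sending it. The sketch below assumes the tool use is available as a dict with `tool_use_id` and `permission.options` fields as described in this reference; the helper name `build_permission_decision` is hypothetical.

```python
def build_permission_decision(tool_use, decision, reason=None):
    """Build a permission decision dict for a tool use that requested permission."""
    options = tool_use.get("permission", {}).get("options", [])
    if decision not in options:
        raise ValueError(f"decision {decision!r} is not one of {options!r}")

    body = {"tool_use_id": tool_use["tool_use_id"], "decision": decision}
    # The reason is only meaningful when permission is denied.
    if reason is not None and decision == "Deny":
        body["reason"] = reason
    return body
```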
Field Type Description
`usage` [](#label-snowflake-agent-run-UsageMetadata)
`run_id` string The unique identifier for this Agent Run. Can be used to reconnect to the output stream.
`thread_id` integer The Thread ID, if using a thread.
`user_message_id` integer If using a Thread, this is the message ID of the user question sent in this request. The `assistant_message_id` is a child of this message.
`assistant_message_id` integer If using a Thread, this is the message ID of this assistant response. Use this value as the `parent_message_id` in followup requests.
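As a rough illustration of how these IDs chain together, the sketch below builds the body of a follow-up request from a metadata dict. Only `parent_message_id` is named in the field descriptions above; passing `thread_id` in the request body and the overall request shape are assumptions, and `build_followup_request` is a hypothetical helper.

```python
def build_followup_request(metadata, question):
    """Build a follow-up request body that continues an existing thread."""
    return {
        # Assumption: the request identifies the thread with `thread_id`.
        "thread_id": metadata["thread_id"],
        # The previous assistant response becomes the parent of the new message.
        "parent_message_id": metadata["assistant_message_id"],
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ],
    }
```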
Field Type Description
`statementHandle` string The query id.
`resultSetMetaData` [](#label-snowflake-agent-run-ResultSetMetaData) Metadata on the result set.
`data` array of array 2D array representing the data
Field Type Description
`partition` integer The index number of the partition.
`numRows` integer The total number of rows of results.
`format` string Format of the data in the result set.
`rowType` array of [](#label-snowflake-agent-run-RowType) Description of the columns in the result.
Field Type Description
`name` string Name of the column.
`type` string Snowflake data type of the column. ([https://docs.snowflake.com/en/sql-reference/intro-summary-data-types](https://docs.snowflake.com/en/sql-reference/intro-summary-data-types))
`length` integer Length of the column.
`precision` integer Precision of the column.
`scale` integer Scale of the column.
`nullable` boolean Specifies whether or not the column is nullable.
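Since `data` is a 2D array and the column names live in `resultSetMetaData.rowType`, clients often pair the two before rendering. A minimal sketch, assuming the result set has been decoded into a Python dict (`result_set_to_records` is a hypothetical helper name):

```python
def result_set_to_records(result_set):
    """Convert a ResultSet dict into a list of {column_name: value} dicts."""
    columns = [col["name"] for col in result_set["resultSetMetaData"]["rowType"]]
    # Each row is a list of values in the same order as rowType.
    return [dict(zip(columns, row)) for row in result_set["data"]]
```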
Field Type Description
`tool_use_id` string The ID of the tool use that generated this table
`query_id` string The query id of the sql query that generated this data
`result_set` [](#label-snowflake-agent-run-ResultSet) The SQL results to render a table. Matches the schema from Snowflake's SQL API ResultSet ([https://docs.snowflake.com/en/developer-guide/sql-api/reference#resultset](https://docs.snowflake.com/en/developer-guide/sql-api/reference#resultset))
`title` string The title for this table
Field Type Description
`text` string Thinking tokens from the agent
`signature` string The signature of the thinking token
Field Type Description
`model_name` string Name of the model used.
`input_tokens` [](#label-snowflake-agent-run-InputTokens)
`output_tokens` [](#label-snowflake-agent-run-OutputTokens)
`context_window` integer The model's context window size (in tokens).
Field Type Description
`tool_spec` [](#label-snowflake-agent-run-ToolSpec) Specification of the tool's type, configuration, and input requirements.
Field Type Description
`type` string Determines how tools are selected: - auto - Automatic tool selection (default) - required - Must use at least one tool - tool - Use specific named tools
`name` array of string List of specific tool names to use when type is 'tool'.
Field Type Description
`type` string The type of the input schema object.
`description` string A description of what the input is.
`properties` map of [](#label-snowflake-agent-run-ToolInputSchema) If type is `object`, definitions of each input parameter.
`items` [](#label-snowflake-agent-run-ToolInputSchema) If type is `array`, the schema for the elements of the array.
`required` array of string If type is `object`, list of required input parameter names.
Field Type Description
`semantic_model_file` string The path to a file stored in a Snowflake Stage holding the semantic model yaml.
`semantic_view` string The name of the Snowflake native semantic model object.
`execution_environment` [](#label-snowflake-agent-run-ExecutionEnvironment) Configuration for how to execute the generated SQL query.
Field Type Description
`search_service` string The fully qualified name of the search service.
`title_column` string The title column of the document.
`id_column` string The ID column of the document.
`filter` object Filter query for search results.
Field Type Description
`type` string If the tool is server-side executed, whether it is a Stored Procedure or a UDF.
`execution_environment` [](#label-snowflake-agent-run-ExecutionEnvironment)
`identifier` string Fully qualified name of the Stored Procedure or UDF.
Field Type Description
`max_results` integer Max web search results returned.
Field Type Description
`tool_use_id` string Unique identifier for this tool use. Can be used to associate tool results.
`type` string The type of the tool (e.g. cortex_search, cortex_analyst_text_to_sql)
`name` string The unique identifier for this tool instance
`content` array of [](#label-snowflake-agent-run-ToolResultContent) The content of the tool result
`status` string The status of tool execution
Field Type Description
`type` string The type of result (always `json`)
`json` object Structured output from a tool. The schema varies depending on the tool type.
Field Type Description
`type` string The type of result (always `text`)
`text` string The result text
Field Type Description
`type` string The type of tool capability. Can be specialized types like 'cortex_analyst_text_to_sql' or 'generic' for general-purpose tools.
`name` string Unique identifier for referencing this tool instance. Used to match with configuration in tool_resources.
`description` string Description of the tool to be considered for tool use.
`input_schema` [](#label-snowflake-agent-run-ToolInputSchema) JSON Schema definition of the expected input parameters for this tool. This is fed to the agent so it knows the structure to follow when generating the input for tool uses. Required for generic tools to specify their input parameters.
Field Type Description
`tool_use_id` string Unique identifier for this tool use. Can be used to associate tool results.
`type` string The type of the tool (e.g. cortex_search, cortex_analyst_text_to_sql)
`name` string The unique identifier for this tool instance
`input` object The structured input for this tool. The schema of this object varies depending on the tool spec.
`client_side_execute` boolean Whether the tool use is executed on the client side.
`permission` [](#label-snowflake-agent-run-ToolUsePermission)
Field Type Description
`options` array of string The complete set of valid options the user may choose from, including \"Deny\" to deny permission.
Field Type Description
`tokens_consumed` array of [](#label-snowflake-agent-run-TokensConsumed) Token consumption details per model used in this request.
Field Type Description
`message` string The warning message to display to the user.
`code` string Optional structured warning code for clients to parse and handle.

| Budget | Best for | Optimization behavior | Example use cases |
| --- | --- | --- | --- |
| `demo` (2 iterations) | Quick validation and workflow previews | Performs a lightweight sanity check with minimal experimentation. Useful for validating the end-to-end workflow or previewing optimization behavior, but unlikely to uncover major improvements. | Demo environments, prototype validation, smoke testing, initial workflow verification |
| `light` (6 iterations) | Simple, well-defined tasks | Evaluates a focused set of prompt and function-body variations. Best when small refinements are likely to produce meaningful gains. | Sentiment classification, spam detection, language detection, yes/no validation, simple text categorization |
| `medium` (12 iterations) | Multi-step or nuanced tasks | Explores a broader range of optimization strategies, including alternative prompt structures and pre/post-processing approaches. Provides a balanced tradeoff between runtime, cost, and optimization quality. | Theme extraction from documents, named entity extraction, multi-label classification, structured Q&A, formatted summarization workflows |
| `heavy` (18 iterations) | Complex, high-value production workloads | Conducts a deeper search across the optimization space, including advanced prompt restructuring and workflow modifications. Best for maximizing quality in production-critical systems. | Legal contract analysis, medical record extraction, policy-based routing, multi-stage reasoning pipelines, context-aware PII redaction |


| Category | Metric | Use cases |
| --- | --- | --- |
| Rule-based | Exact Match | Uses straightforward, case-insensitive string comparison to check whether an output exactly matches the expected result. This approach works best for strict classification tasks where precision matters. |
| Rule-based | Fuzzy Match | Relies on token-level similarity to compare outputs, making it tolerant of small spelling differences or minor character variations. It's a good fit when approximate matches are acceptable. |
| Rule-based | Contains Match | Looks for the presence of a specific substring within the output, which makes it especially useful for tasks like information extraction or keyword detection. |
| Semantic | LLM-as-a-Judge | Uses a reference language model to evaluate whether two pieces of text are meaningfully equivalent. This allows for more nuanced scoring in complex tasks such as summarization, translation, or other open-ended generation. |
| Customized | Custom Metrics | Custom metrics automatically generated by the AI Function Studio's agentic engine to align with the unique objectives and success criteria of your task. Ideal when standard approaches such as Exact Match or LLM-as-a-Judge are insufficient, enabling sophisticated, task-specific evaluation logic. |

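To make the rule-based metrics more concrete, here is an illustrative sketch of how each comparison could behave. It is not the product's implementation: the fuzzy score uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, the contains check is assumed to be case-sensitive, and all three function names are hypothetical.

```python
from difflib import SequenceMatcher


def exact_match(output, expected):
    """Case-insensitive equality, as described for Exact Match."""
    return output.casefold() == expected.casefold()


def fuzzy_match(output, expected):
    """Similarity score in [0, 1]; a stand-in for the documented fuzzy metric."""
    return SequenceMatcher(None, output.casefold(), expected.casefold()).ratio()


def contains_match(output, expected_substring):
    """True when the expected substring appears anywhere in the output.

    Assumption: the check is case-sensitive.
    """
    return expected_substring in output
```

A fuzzy score is usually compared against a threshold chosen for the task, since what counts as "close enough" differs between, say, product names and free-form labels.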

| Requirement | Value |
| --- | --- |
| Filename extensions | `.jpg`, `.jpeg`, `.png`, `.webp`, `.gif` |
| Stage encryption | Server-side encryption |
| Data type | [FILE](#label-data-types-file) |

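
| Model | MMMU | Mathvista | ChartQA | DocVQA | VQAv2 |
| --- | --- | --- | --- | --- | --- |
| GPT-4o | 68.6 | 64.6 | 85.1 | 88.9 | 77.8 |
| `openai-gpt-4.1` | 75.0 | 72.0 | - | - | - |
| `llama-4-maverick` | 73.4 | 73.7 | 90 | 94.4 | - |
| `llama-4-scout` | 69.4 | 70.7 | 88.8 | 94.4 | - |
| `pixtral-large` | 64.0 | 69.4 | 88.1 | 85.7 | 67 |
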
Input data classification Output data classification Designation
%input-data% %output-data% Generally available functions are Covered AI Features. Preview functions are Preview AI Features. [1]
Input data classification Output data classification
%input-data% %output-data%

| Media type | Primary functions | Common tasks |
| --- | --- | --- |
| Documents | `AI_COMPLETE`, `AI_PARSE_DOCUMENT`, `AI_EXTRACT` | Q&A, summarization, extraction, comparison, chart understanding |
| Images | `AI_COMPLETE`, `AI_CLASSIFY`, `AI_EMBED`, `AI_EXTRACT`, `AI_SIMILARITY`, `AI_FILTER` | Caption, compare, classify, extract entities, image search |
| Audio | `AI_COMPLETE`, `AI_TRANSCRIBE` | Caption, compare, classify, extract entities, transcribe, identify speakers |
| Video | `AI_COMPLETE`, `AI_TRANSCRIBE` | Summarize, classify, extract metadata, search scenes, transcribe video or audio tracks |
| Video | `AI_MULTI_EMBED` (`twelvelabs-marengo-embed-3-0` only) | Video semantic search such as scene search, quotes, visual similarity, sports events, and brand and product search |

Input data classification Output data classification Designation
%input-data% %output-data% Generally available functions are Covered AI Features. Preview functions are Preview AI Features. [1]
Input data classification Output data classification
%input-data% %output-data%
Parameter Type Session
Data Type BOOLEAN
Description Controls whether %cortex-analyst% functionality is enabled in your account.
Values - FALSE: %cortex-analyst% functionality is not available. - TRUE: %cortex-analyst% functionality is available.
Default TRUE
Input data classification Output data classification Designation
%input-data% Output (SQL query suggestion): Usage Data Query result (using SQL query suggestion): Customer Data %designation% [1]
Column Data type Description
RECORD_ID VARCHAR The unique identifier assigned by Snowflake for this evaluation record.
INPUT_ID VARCHAR The unique identifier assigned by Snowflake for this evaluation input.
REQUEST_ID VARCHAR The unique identifier assigned by Snowflake for this request.
TIMESTAMP TIMESTAMP_LTZ The time at which the request was made.
DURATION_MS INT The amount of time, in milliseconds, that it took for %cortex-analyst% to return a response.
INPUT VARCHAR The query string used as input for this evaluation record.
OUTPUT VARCHAR The response returned by %cortex-analyst% for this evaluation record.
ERROR VARCHAR Information about any errors which occurred during the request.
GROUND_TRUTH VARCHAR The ground truth information used to evaluate this record's %cortex-analyst% output.
METRIC_NAME VARCHAR The name of the metric evaluated for this record.
EVAL_AGG_SCORE NUMBER The evaluation score assigned for this record.
METRIC_TYPE VARCHAR The type of metric being evaluated. For built-in metrics, the value is `system`. For custom metrics, the value is `custom`.
METRIC_STATUS VARIANT A map containing information about the evaluation's HTTP response for this record, with the following keys:
- `status`: The HTTP status code of the response. - `message`: The HTTP message sent in the status response.
METRIC_CALLS ARRAY An array of VARIANT values that contain information about the computed metric. Each array entry contains the metric's criteria and an explanation of the metric score. The keys of each entry are:
- `criteria`: The criteria used to compute SQL correctness. - `explanation`: Details of the compared result sets.
Header Description
`Authorization` (Required) Authorization token. For more information, see [](/developer-guide/sql-api/authenticating).
`Content-Type` (Required) application/json
`X-Snowflake-Authorization-Token-Type` (Optional) Authorization token type. Defaults to OAuth. For more information, see [](/developer-guide/sql-api/authenticating).
Field Description
`messages[].role` (Required) The role of the entity that is creating the message. Currently only supports `user`. Type: string:enum Example: `user`
`messages[].content[]` (Required) The content object that is part of a message. Type: object
Example: ```json { "type": "text", "text": "Which company had the most revenue?" } ```
`messages[].content[].type` (Required) The content type. Currently only `text` is supported. Type: string:enum Example: `text`
`messages[].content[].text` (Required) The user's question. Type: string Example: `Which company had the most revenue?`
`semantic_model_file` Path to the semantic model YAML file. Must be a fully qualified stage URL including the database and schema. To specify multiple semantic models, use the `semantic_models` field. If you want to provide the YAML specification directly in the request instead, set the `semantic_model` field to the YAML specification for the semantic model. Type: string Example: `@my_db.my_schema.my_stage/my_semantic_model.yaml`
`semantic_model` A string containing the entire semantic model YAML. To specify multiple semantic models, use the `semantic_models` field instead. If you want to point to a YAML specification in a file instead, upload the file to a stage, and set the `semantic_model_file` field to the path to the file. Type: string
`semantic_models` An array containing JSON objects, each of which contains a `semantic_model_file` or `semantic_view` field. These fields have the same semantics as the top-level `semantic_model_file` and `semantic_view` fields: - `semantic_model_file` specifies a YAML file, stored in a stage, that contains a semantic model definition. (You cannot specify the YAML for the semantic model directly in the request with this form.) - `semantic_view` specifies the fully qualified name of a [semantic view](/user-guide/views-semantic/overview). For example: ```json { /* ... */ "semantic_models": [ {"semantic_view": "my_db.my_sch.my_sem_view_1" }, {"semantic_view": "my_db.my_sch.my_sem_view_2" } ] /* ... */ } ``` For each query, %cortex-analyst% chooses the most appropriate model or view from the list. This capability simplifies user interactions with %cortex-analyst%. You don't need to choose a data source to query, and you don't need to keep track of which semantic model or semantic view to use for each. Just specify all of your models or views with each query and let %cortex-analyst% figure out which one to use. Type: array %cortex-analyst% does not require that you specify more than one model or view. If you specify a single model or view, the request is functionally equivalent to one containing a top-level `semantic_model_file` or `semantic_view` field. The advantage of using `semantic_models` for single-model requests is that you can use the same client code, regardless of the number of models or views.
`semantic_view` Fully qualified name of the [semantic view](/user-guide/views-semantic/overview). For example: ```json { /* ... */ "semantic_view": "MY_DB.MY_SCHEMA.SEMANTIC_VIEW" /* ... */ } ``` If the name is case-sensitive or contains characters that are not allowed in an [unquoted identifier](/sql-reference/identifiers-syntax), you must enclose each part of the name in backslash-escaped double quotes. For example, if the database name, schema name, and view name include hyphens (`my-database.my-schema.my-semantic-view`): ```json { /* ... */ "semantic_view": "\"my-database\".\"my-schema\".\"my-semantic-view\"" /* ... */ } ``` To specify multiple semantic views, use the `semantic_models` field. Type: string
`stream` (Optional) If set to `true`, the response is streamed to the client using [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events) as it is generated (see [](#label-cortex-analyst-rest-api-streaming)). Otherwise the complete response is returned after %cortex-analyst% has fully processed the user's question. Type: boolean
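The request fields above can be assembled programmatically. The following sketch builds a request body that uses `semantic_models` with one or more semantic views, and shows how quoting each identifier part produces the backslash-escaped form once the body is serialized as JSON. The helper names `quote_identifier` and `build_message_request` are hypothetical, and the sketch only constructs the payload; it does not send it.

```python
import json


def quote_identifier(part):
    """Wrap one identifier part in double quotes, doubling embedded quotes."""
    return '"' + part.replace('"', '""') + '"'


def build_message_request(question, semantic_views, stream=False):
    """Build a request body for a single user question."""
    return {
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ],
        # One entry per semantic view; a single entry is also valid.
        "semantic_models": [{"semantic_view": view} for view in semantic_views],
        "stream": stream,
    }


# Names containing hyphens must be quoted part by part.
view = ".".join(
    quote_identifier(p) for p in ["my-database", "my-schema", "my-semantic-view"]
)
body = build_message_request("Which company had the most revenue?", [view])
payload = json.dumps(body)  # json.dumps adds the backslash escapes
```

Letting the JSON serializer add the backslashes avoids hand-escaping mistakes in the identifier string.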
Code Description
200 The statement was executed successfully. The body of the response contains a message object that contains the following fields: - `message`: Messages of the conversation between the user and analyst. - `message` (object): Represents a message within a chat. - `message.role` (string:enum): The entity that produced the message. One of `user` or `analyst`. - `message.content[]` (object): The content object that is part of a message. - `message.content[].type` (string:enum): The content type of the message. One of `text`, `suggestion`, or `sql`. - `message.content[].text` (string): The text of the content. Only returned for content type `text`. - `message.content[].statement` (string): A SQL statement. Only returned for content type `sql`. - `message.content[].confidence` (object): Contains confidence-related information. Only returned for the `sql` content type. - `message.content[].confidence.verified_query_used` (object): Represents the verified query from the Verified Query Repository used in SQL response generation. If no verified query was used, the field value is `null`. - `message.content[].confidence.verified_query_used.name` (string): The name of the verified query used, extracted from the Verified Query Repository. - `message.content[].confidence.verified_query_used.question` (string): The question that is answered by the verified query, extracted from the Verified Query Repository. - `message.content[].confidence.verified_query_used.sql` (string): The SQL statement of the verified query used, extracted from the Verified Query Repository. - `message.content[].confidence.verified_query_used.verified_at` (integer): The numeric representation of the timestamp when the query was verified, extracted from the Verified Query Repository. - `message.content[].confidence.verified_query_used.verified_by` (string): The person who verified the query, extracted from the Verified Query Repository.
- `message.content[].suggestions` (string): If SQL cannot be generated, a list of questions the semantic model can generate SQL for. Only returned for content type `suggestion`. - `warnings`: List of warnings from the analyst about the user's request. - `warnings[].message` (string): Contains a detailed description of one individual warning. - `response_metadata` (object): Metadata containing response generation details. - `response_metadata.model_names`: List of models used to generate response. - `response_metadata.cortex_search_retrieval` (object): Entities resolved with cortex search. - `response_metadata.question_category` (string): How the question in the request is categorized.
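A client usually needs to branch on the content type of each item in the response. The sketch below shows one way to do that for a non-streaming response body that has already been decoded from JSON; `summarize_response` is a hypothetical helper, and treating `suggestions` as either a list or a single string is an assumption made to be tolerant of both shapes.

```python
def summarize_response(body):
    """Collect text, SQL, suggestions, and warnings from a response body."""
    summary = {
        "text": [],
        "sql": None,
        "verified_query": None,
        "suggestions": [],
        "warnings": [],
    }
    for item in body["message"]["content"]:
        kind = item["type"]
        if kind == "text":
            summary["text"].append(item["text"])
        elif kind == "sql":
            summary["sql"] = item["statement"]
            # verified_query_used is null when no verified query was used.
            verified = (item.get("confidence") or {}).get("verified_query_used")
            if verified:
                summary["verified_query"] = verified["name"]
        elif kind == "suggestion":
            suggestions = item["suggestions"]
            if isinstance(suggestions, str):
                suggestions = [suggestions]
            summary["suggestions"].extend(suggestions)
    summary["warnings"] = [w["message"] for w in body.get("warnings") or []]
    return summary
```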
Header Description
`Authorization` (Required) Authorization token. For more information, see [](/developer-guide/sql-api/authenticating).
`Content-Type` (Required) application/json
Field Description
`request_id` (Required) The id of the request that you've made to send a message. Returned in the `request_id` field of `/api/v2/cortex/analyst/message`. For more information, see [](#label-cortex-analyst-rest-api-response). Type: string Example: `75d343ee-699c-483f-83a1-e314609fb563`
`positive` (Required) Whether the feedback is positive or negative. `true` for positive or "thumbs up", `false` for negative or "thumbs down". Type: boolean Example: `true`
`feedback_message` (Optional) The feedback message from the user. Example: `This is the best answer I've ever seen!`
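For illustration, the feedback fields above map onto a small JSON body. This sketch only builds that body from the documented fields; `build_feedback` is a hypothetical helper name, and the example request ID is the one shown in the field description.

```python
def build_feedback(request_id, positive, feedback_message=None):
    """Build a feedback request body for a previous message request."""
    if not isinstance(positive, bool):
        raise TypeError("positive must be a boolean")
    body = {"request_id": request_id, "positive": positive}
    # The message is optional, so include it only when provided.
    if feedback_message:
        body["feedback_message"] = feedback_message
    return body
```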

| | Chat Completions API | Messages API |
| --- | --- | --- |
| Compatibility | [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create) | [Anthropic Messages API](https://docs.anthropic.com/en/api/messages) |
| Endpoint | `/api/v2/cortex/v1/chat/completions` | `/api/v2/cortex/v1/messages` |
| Supported models | All models (OpenAI, Claude, Llama, Mistral, DeepSeek, Snowflake) | Claude models only |
| SDK support | OpenAI Python and JavaScript SDKs | Anthropic Python SDK |
| Best for | Most use cases; multi-model flexibility | Existing Anthropic integrations; Anthropic API parity |

Model Cross Cloud (Any Region) AWS Global (Cross-Region) AWS US (Cross-Region) AWS EU (Cross-Region) AWS APJ (Cross-Region) Azure Global (Cross-Region) Azure US (Cross-Region) Azure EU (Cross-Region)
`claude-opus-4-7` \* \* \* \*
`claude-sonnet-4-6` %cm% %cm% %cm% %cm% %cm%
`claude-opus-4-6` %cm% %cm% %cm% %cm%
`claude-sonnet-4-5` %cm% %cm% %cm% %cm% %cm%
`claude-opus-4-5` %cm% %cm% %cm% %cm%
`claude-haiku-4-5` %cm% %cm% %cm% %cm% %cm%
`claude-4-sonnet` %cm% %cm% %cm% %cm% %cm%
`openai-gpt-5.4` \* \* \*
`openai-gpt-5.2` %cm% %cm% %cm%
`openai-gpt-5.1` %cm% %cm% %cm% %cm%
`openai-gpt-5` \* \* \* \*
`openai-gpt-5-mini` \* \* \*
`openai-gpt-5-nano` \* \* \*
`openai-gpt-4.1` %cm% %cm% %cm%
`llama4-maverick` %cm% %cm% %cm%
`llama3.1-8b` %cm% %cm% %cm% %cm% %cm% %cm% %cm% %cm%
`llama3.1-70b` %cm% %cm% %cm% %cm% %cm% %cm% %cm% %cm%
`llama3.1-405b` %cm% %cm% %cm% %cm% %cm%
`deepseek-r1` %cm% %cm% %cm%
`mistral-7b` %cm% %cm%
`mistral-large` %cm% %cm%
`mistral-large2` %cm% %cm% %cm% %cm% %cm% %cm% %cm% %cm%
`snowflake-llama-3.3-70b` %cm% %cm% %cm%
Model AWS US West 2 (Oregon) AWS US East 1 (N. Virginia) Azure East US 2 (Virginia)
`llama4-maverick` %cm%
`llama3.1-8b` %cm% %cm% %cm%
`llama3.1-70b` %cm% %cm% %cm%
`llama3.1-405b` %cm% %cm% %cm%
`deepseek-r1` %cm%
`mistral-7b` %cm% %cm% %cm%
`mistral-large` %cm% %cm% %cm%
`mistral-large2` %cm% %cm% %cm%
`snowflake-llama-3.3-70b` %cm%
Model AWS Europe Central 1 (Frankfurt) AWS Europe West 1 (Ireland) Azure West Europe (Netherlands)
`llama3.1-8b` %cm% %cm%
`llama3.1-70b` %cm% %cm% %cm%
`mistral-7b` %cm% %cm%
`mistral-large` %cm% %cm%
`mistral-large2` %cm% %cm% %cm%
Model AWS AP Southeast 2 (Sydney) AWS AP Northeast 1 (Tokyo)
`llama3.1-8b` %cm%
`llama3.1-70b` %cm% %cm%
`mistral-7b` %cm%
`mistral-large` %cm%
`mistral-large2` %cm% %cm%

| Effort level | Behavior |
| --- | --- |
| `max` | Always thinks with no constraints on thinking depth. |
| `high` (default) | Always thinks. Provides deep reasoning on complex tasks. |
| `medium` | Moderate thinking. May skip thinking for very simple queries. |
| `low` | Minimizes thinking. Skips thinking for simple tasks where speed matters most. |

Beta header value Feature
`token-efficient-tools-2025-02-19` Token-efficient tools
`interleaved-thinking-2025-05-14` Interleaved thinking
`output-128k-2025-02-19` Enables output tokens up to 128K
`dev-full-thinking-2025-05-14` Developer mode for raw thinking on Claude 4+ models
`context-management-2025-06-27` Context management
`effort-2025-11-24` Effort parameter for thinking
`tool-search-tool-2025-10-19` Tool search tool
`tool-examples-2025-10-29` Tool use examples
Field OpenAI Models Claude Models Other Models
`model` %cm% Supported %cm% Supported %cm% Supported
`messages` See sub-fields See sub-fields See sub-fields
`messages[].audio` %xm% Error %xm% Ignored %xm% Ignored
`messages[].role` %cm% Supported %cm% Only user/assistant/system %cm% Only user/assistant/system
`messages[].content` (string) %cm% Supported %cm% Supported %cm% Supported
`messages[].content[]` (array) See sub-fields See sub-fields See sub-fields
`messages[].content[].text` %cm% Supported %cm% Supported %cm% Supported
`messages[].content[].type` %cm% Supported %cm% Supported %cm% Supported
`messages[].content[].image_url` %cm% Supported %cm% Supported %xm% Error
`messages[].content[].cache_control` %xm% Ignored %cm% Supported (ephemeral only) %xm% Ignored
`messages[].content[].file` %xm% Error %xm% Error %xm% Ignored
`messages[].content[].input_audio` %xm% Error %xm% Ignored %xm% Ignored
`messages[].content[].refusal` %cm% Supported %xm% Ignored %xm% Ignored
`messages[].function_call` %cm% Supported (deprecated) %xm% Ignored %xm% Ignored
`messages[].name` %cm% Supported %xm% Ignored %xm% Ignored
`messages[].refusal` %cm% Supported %xm% Ignored %xm% Ignored
`messages[].tool_call_id` %cm% Supported %cm% Supported %xm% Ignored
`messages[].tool_calls` %cm% Supported %cm% Only `function` tools %xm% Ignored
`messages[].reasoning_details` %xm% Ignored %cm% OpenRouter format `reasoning.text` %xm% Ignored
`audio` %xm% Error %xm% Ignored %xm% Ignored
`frequency_penalty` %cm% Supported %xm% Ignored %xm% Ignored
`logit_bias` %cm% Supported %xm% Ignored %xm% Ignored
`logprobs` %cm% Supported %xm% Ignored %xm% Ignored
`max_tokens` %xm% Error (deprecated) %xm% Error (deprecated) %xm% Error (deprecated)
`max_completion_tokens` %cm% Supported (4096 default, 131072 max) %cm% Supported (4096 default, 131072 max) %cm% Supported (4096 default, 131072 max)
`metadata` %xm% Ignored %xm% Ignored %xm% Ignored
`modalities` %xm% Ignored %xm% Ignored %xm% Ignored
`n` %cm% Supported %xm% Ignored %xm% Ignored
`parallel_tool_calls` %cm% Supported %xm% Ignored %xm% Ignored
`prediction` %cm% Supported %xm% Ignored %xm% Ignored
`presence_penalty` %cm% Supported %xm% Ignored %xm% Ignored
`prompt_cache_key` %cm% Supported %xm% Ignored %xm% Ignored
`reasoning_effort` %cm% Supported %xm% Ignored (use `reasoning` object) %xm% Ignored
`reasoning` See sub-fields See sub-fields See sub-fields
`reasoning.effort` %cm% Supported (overrides `reasoning_effort`) %cm% Converted to `reasoning.max_tokens` %xm% Ignored
`reasoning.max_tokens` %xm% Ignored %cm% Supported %xm% Ignored
`response_format` %cm% Supported %cm% Only `json_schema` %xm% Ignored
`safety_identifier` %xm% Ignored %xm% Ignored %xm% Ignored
`service_tier` %xm% Error %xm% Error %xm% Error
`stop` %cm% Supported %xm% Ignored %xm% Ignored
`store` %xm% Error %xm% Error %xm% Error
`stream` %cm% Supported %cm% Supported %cm% Supported
`stream_options` See sub-fields See sub-fields See sub-fields
`stream_options.include_obfuscation` %xm% Ignored %xm% Ignored %xm% Ignored
`stream_options.include_usage` %cm% Supported %cm% Supported %cm% Supported
`temperature` %cm% Supported %cm% Supported %cm% Supported
`tool_choice` %cm% Supported %cm% Only `function` tools %xm% Ignored
`tools` %cm% Supported %cm% Only `function` tools %xm% Error
`top_logprobs` %cm% Supported %xm% Ignored %xm% Ignored
`top_p` %cm% Supported %cm% Supported %cm% Supported
`verbosity` %cm% Supported %xm% Ignored %xm% Ignored
`web_search_options` %xm% Error %xm% Ignored %xm% Ignored
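
The request-field table above can be turned into a small client-side check. The following is a minimal sketch, not part of any Snowflake SDK: `build_chat_request` and the model name `example-model` are hypothetical, and the sketch only enforces the three fields the table marks as an error for every model family (`max_tokens`, `service_tier`, `store`) plus the `max_completion_tokens` default and maximum listed there.

```python
import json

# Fields the table marks as "Error" for OpenAI, Claude, and other models alike.
REJECTED_FIELDS = {"max_tokens", "service_tier", "store"}
DEFAULT_COMPLETION_TOKENS = 4096   # default listed in the table
MAX_COMPLETION_TOKENS = 131072     # maximum listed in the table


def build_chat_request(model, messages, **options):
    """Build a request body (hypothetical helper), failing early on rejected fields."""
    rejected = REJECTED_FIELDS.intersection(options)
    if rejected:
        raise ValueError(f"fields rejected for all models: {sorted(rejected)}")
    max_out = options.pop("max_completion_tokens", DEFAULT_COMPLETION_TOKENS)
    if not 1 <= max_out <= MAX_COMPLETION_TOKENS:
        raise ValueError("max_completion_tokens is out of range")
    return {
        "model": model,
        "messages": messages,
        "max_completion_tokens": max_out,
        **options,
    }


body = build_chat_request(
    "example-model",  # hypothetical model name
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    stream=False,
)
print(json.dumps(body, indent=2))
```

Fields that a model family ignores are passed through unchanged here, because the table indicates they do not cause the request to fail.
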

| Field | OpenAI Models | Claude Models | Other Models |
| --- | --- | --- | --- |
| `id` | %cm% Supported | %cm% Supported | %cm% Supported |
| `object` | %cm% Supported | %cm% Supported | %cm% Supported |
| `created` | %cm% Supported | %cm% Supported | %cm% Supported |
| `model` | %cm% Supported | %cm% Supported | %cm% Supported |
| `choices` | See sub-fields | See sub-fields | See sub-fields |
| `choices[].index` | %cm% Supported | %cm% Single choice only | %cm% Single choice only |
| `choices[].finish_reason` | %cm% Supported | %xm% Not supported | %cm% Only `stop` |
| `choices[].logprobs` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `choices[].message` (non-streaming) | See sub-fields | See sub-fields | See sub-fields |
| `choices[].message.content` | %cm% Supported | %cm% Supported | %cm% Supported |
| `choices[].message.role` | %cm% Supported | %cm% Supported | %cm% Supported |
| `choices[].message.refusal` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `choices[].message.annotations` | %xm% Not supported | %xm% Not supported | %xm% Not supported |
| `choices[].message.audio` | %xm% Not supported | %xm% Not supported | %xm% Not supported |
| `choices[].message.function_call` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `choices[].message.tool_calls` | %cm% Supported | %cm% Only `function` tools | %xm% Not supported |
| `choices[].message.reasoning` | %xm% Not supported | %cm% OpenRouter format | %xm% Not supported |
| `choices[].delta` (streaming) | See sub-fields | See sub-fields | See sub-fields |
| `choices[].delta.content` | %cm% Supported | %cm% Supported | %cm% Supported |
| `choices[].delta.role` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `choices[].delta.refusal` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `choices[].delta.function_call` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `choices[].delta.tool_calls` | %cm% Supported | %cm% Only `function` tools | %xm% Not supported |
| `choices[].delta.reasoning` | %xm% Not supported | %cm% OpenRouter format | %xm% Not supported |
| `usage` | See sub-fields | See sub-fields | See sub-fields |
| `usage.prompt_tokens` | %cm% Supported | %cm% Supported | %cm% Supported |
| `usage.completion_tokens` | %cm% Supported | %cm% Supported | %cm% Supported |
| `usage.total_tokens` | %cm% Supported | %cm% Supported | %cm% Supported |
| `usage.prompt_tokens_details` | See sub-fields | See sub-fields | See sub-fields |
| `usage.prompt_tokens_details.audio_tokens` | %xm% Not supported | %xm% Not supported | %xm% Not supported |
| `usage.prompt_tokens_details.cached_tokens` | %cm% Only cache reads | %cm% Cache read + write | %xm% Not supported |
| `usage.completion_tokens_details` | See sub-fields | See sub-fields | See sub-fields |
| `usage.completion_tokens_details.accepted_prediction_tokens` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `usage.completion_tokens_details.audio_tokens` | %xm% Not supported | %xm% Not supported | %xm% Not supported |
| `usage.completion_tokens_details.reasoning_tokens` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `usage.completion_tokens_details.rejected_prediction_tokens` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `service_tier` | %cm% Supported | %xm% Not supported | %xm% Not supported |
| `system_fingerprint` | %cm% Supported | %xm% Not supported | %xm% Not supported |

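
Because several response fields are populated only for some model families, client code is safer when it treats them as optional. The following is a minimal sketch under that assumption: `summarize_response` and the sample response are hypothetical, and the sketch reads only fields named in the response-field table above.

```python
def summarize_response(response):
    """Extract commonly used fields, tolerating ones a model family does not return."""
    # Claude and other models return a single choice, so index 0 is always valid.
    choice = response["choices"][0]
    # Non-streaming responses carry `message`; streaming chunks carry `delta`.
    message = choice.get("message") or choice.get("delta") or {}
    usage = response.get("usage") or {}
    prompt_details = usage.get("prompt_tokens_details") or {}
    return {
        "text": message.get("content") or "",
        "finish_reason": choice.get("finish_reason"),  # not supported for Claude models
        "tool_calls": message.get("tool_calls") or [],
        "reasoning": message.get("reasoning"),          # Claude models only
        "total_tokens": usage.get("total_tokens"),
        "cached_tokens": prompt_details.get("cached_tokens"),
    }


# Hypothetical response with only the universally supported fields.
sample = {
    "id": "example-id",
    "object": "chat.completion",
    "created": 0,
    "model": "example-model",
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi"}}],
    "usage": {"prompt_tokens": 3, "completion_tokens": 1, "total_tokens": 4},
}
print(summarize_response(sample))
```
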

| Header | Support |
| --- | --- |
| `Authorization` | %cm% Required |
| `Content-Type` | %cm% Supported (`application/json`) |
| `Accept` | %cm% Supported (`application/json`, `text/event-stream`) |


| Header | Support |
| --- | --- |
| `openai-organization` | %xm% Not supported |
| `openai-version` | %xm% Not supported |
| `openai-processing-ms` | %xm% Not supported |
| `x-ratelimit-limit-requests` | %xm% Not supported |
| `x-ratelimit-limit-tokens` | %xm% Not supported |
| `x-ratelimit-remaining-requests` | %xm% Not supported |
| `x-ratelimit-remaining-tokens` | %xm% Not supported |
| `x-ratelimit-reset-requests` | %xm% Not supported |
| `x-ratelimit-reset-tokens` | %xm% Not supported |
| `retry-after` | %xm% Not supported |


| Input data classification | Output data classification | Designation |
| --- | --- | --- |
| %input-data% | %output-data% | Generally available functions are Covered AI Features. Preview functions are Preview AI Features. [1] |

| Input data classification | Output data classification |
| --- | --- |
| %input-data% | %output-data% |


| Type | Parameter value | Routing behavior | Best for |
| --- | --- | --- | --- |
| Global | `ANY_REGION` | Requests can be processed in any Snowflake-supported region, across any cloud provider. | Broadest model selection, highest throughput, maximum resilience, lowest cost. |
| Cloud-specific | `AWS_GLOBAL`, `AZURE_GLOBAL`, `GCP_GLOBAL`, or comma-separated combinations | Requests stay within a designated cloud provider (for example, AWS or Azure). | Organizations that require data to remain within a specific cloud provider's network while getting the lowest cost, higher throughput, and added resiliency. |
| Regional | `AWS_US`, `AWS_EU`, `AZURE_US`, `AZURE_EU`, or comma-separated combinations | Requests stay within designated cloud provider regions (for example, AWS US or Azure EU). | Organizations that require data to remain within a specific cloud provider's network and geography. |
| Disabled | `DISABLED` | Requests are processed only in your account's home region. | Strict data residency or geographic sovereignty requirements. Use cases not needing frontier AI models. |


| Generic family | Resolves to |
| --- | --- |
| `sans-serif` | Arial (Windows/macOS), DejaVu Sans or Liberation Sans (Linux) |
| `serif` | Times New Roman (Windows/macOS), DejaVu Serif or Liberation Serif (Linux) |
| `monospace` | Courier New (Windows/macOS), DejaVu Sans Mono or Liberation Mono (Linux) |


| Property | Where it applies |
| --- | --- |
| `title.font`, `title.fontSize`, `title.fontWeight`, `title.fontStyle` | Chart title |
| `axis.labelFont`, `axis.labelFontSize` | Axis tick labels |
| `axis.titleFont`, `axis.titleFontSize` | Axis titles (for example, "Revenue") |
| `header.labelFont`, `header.labelFontSize` | Facet / small-multiple headers |
| `legend.labelFont`, `legend.labelFontSize` | Legend value labels |
| `legend.titleFont`, `legend.titleFontSize` | Legend title |
| `mark.font` | Text marks (annotations) |


| Format | Output example | Use for |
| --- | --- | --- |
| `$,.0f` | $1,234,567 | Dollar amounts, no decimals |
| `$,.2f` | $1,234,567.89 | Dollar amounts, 2 decimals |
| `,.0f` | 1,234,567 | Large integers with thousands separator |
| `.1%` | 42.3% | Percentages |
| `.2s` | 1.2M | Large numbers with SI prefix |
| `.2f` | 3.14 | Fixed 2 decimal places |

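
To preview how a value might look under some of these formats before applying them to a chart, the following sketch uses Python's built-in format mini-language, which happens to accept the `,`, `.Nf`, and `.N%` parts of these specifiers. This is an approximation under that assumption, not the chart renderer's own formatter: `preview` is a hypothetical helper, the `$` prefix is handled by hand, and SI-prefix formats such as `.2s` are rejected because Python has no built-in equivalent.

```python
def preview(value, spec):
    """Approximate a chart number format with Python's format mini-language."""
    prefix = ""
    if spec.startswith("$"):
        # Python has no currency flag, so the symbol is prepended manually.
        prefix, spec = "$", spec[1:]
    if spec.endswith("s"):
        raise ValueError("SI-prefix formats are not covered by this sketch")
    return prefix + format(value, spec)


print(preview(1234567, "$,.0f"))     # $1,234,567
print(preview(1234567.89, "$,.2f"))  # $1,234,567.89
print(preview(1234567, ",.0f"))      # 1,234,567
print(preview(0.423, ".1%"))         # 42.3%
print(preview(3.14159, ".2f"))       # 3.14
```
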

| Mode | Behavior |
| --- | --- |
| `override` (default) | Template values overwrite the chart. Use when you need to enforce a specific setting. |
| `extend` | Existing chart values are preserved. New keys and additional scale entries are added. Use when you want to add to the chart without replacing what the LLM chose. |


| Category | Notes |
| --- | --- |
| NAME | Recognizes full name, first name, middle name, and last name |
| EMAIL | |
| PHONE_NUMBER | |
| DATE_OF_BIRTH | |
| GENDER | Recognizes male, female, and nonbinary |
| AGE | |
| ADDRESS | Identifies complete postal address (US, UK, CA); street address (US, UK, CA); postal code (US, UK, CA); city (US, UK, CA); state (US) or province (CA); county, borough, or township (US) |
| NATIONAL_ID | Identifies Social Security numbers (US) |
| PASSPORT | Identifies passport numbers (US, UK, CA) |
| TAX_IDENTIFIER | Identifies Individual Taxpayer Numbers (ITNs) |
| PAYMENT_CARD_DATA | Identifies complete card information, card number, expiration date, and CVV |
| DRIVERS_LICENSE | Identifies US, UK, and CA licenses |
| IP_ADDRESS | |


**MESSAGE**

My old manager, Washington, used to live in Washington. His first name was Mike.

**SPANS**

```json
{
  "spans": [
    {"category": "NAME", "end": 26, "start": 16, "text": "Washington"},
    {"category": "ADDRESS", "end": 54, "start": 44, "text": "Washington"},
    {"category": "NAME", "end": 79, "start": 75, "text": "Mike"}
  ]
}
```

**RESULT**

My old manager, [NAME], used to live in [ADDRESS]. His first name was [NAME].

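
The span offsets in this example are zero-based character positions, with `start` inclusive and `end` exclusive, which can be confirmed by slicing the message. The following sketch checks that and shows one way the spans could be mapped to category placeholders. It illustrates the data format only and is not Snowflake's implementation; `spans_match` and `apply_spans` are hypothetical helpers.

```python
import json

message = ("My old manager, Washington, used to live in Washington. "
           "His first name was Mike.")
spans = json.loads("""
{"spans": [
  {"category": "NAME", "end": 26, "start": 16, "text": "Washington"},
  {"category": "ADDRESS", "end": 54, "start": 44, "text": "Washington"},
  {"category": "NAME", "end": 79, "start": 75, "text": "Mike"}
]}
""")["spans"]


def spans_match(text, spans):
    """Return True if every span's offsets select exactly its `text` value."""
    return all(text[s["start"]:s["end"]] == s["text"] for s in spans)


def apply_spans(text, spans):
    """Replace each span with a [CATEGORY] placeholder."""
    # Work from the last span to the first so earlier offsets stay valid.
    for s in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[:s["start"]] + f"[{s['category']}]" + text[s["end"]:]
    return text


print(spans_match(message, spans))
print(apply_spans(message, spans))
```
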

| Input data classification | Output data classification | Designation |
| --- | --- | --- |
| %input-data% | %output-data% | Generally available functions are Covered AI Features. Preview functions are Preview AI Features. [1] |

| Input data classification | Output data classification |
| --- | --- |
| %input-data% | %output-data% |


| Cloud platform | Cloud region |
| --- | --- |
| Amazon Web Services (AWS) | US East (N. Virginia), US East (Ohio), US West (Oregon), Canada (Central), South America (São Paulo), Europe (London), EU (Stockholm), EU (Ireland), EU (Frankfurt), Asia Pacific (Mumbai), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Jakarta) |
| Microsoft Azure | East US 2 (Virginia), West US 2 (Washington), South Central US (Texas), Canada Central (Toronto), UK South (London), North Europe (Ireland), West Europe (Netherlands), Southeast Asia (Singapore), UAE North (Dubai), Australia East (New South Wales), Central India (Pune), Japan East (Tokyo) |
| Google Cloud | US East4 (N. Virginia), US Central1 (Iowa), Europe West2 (London), Europe West3 (Frankfurt), Europe West4 (Netherlands) |


| Example question | Answer |
| --- | --- |
| What is the date of this agreement? | `'October 6, 2023'` |
| Who is the buyer of the condo? | `'John Davis', 'Jane Davis'` |
| What home appliances are included with the unit? | `'stove/range', 'refrigerator', 'washer', 'dishwasher', 'attached television(s)', 'microwave'` |
| What items are not included with the flat? | `'dryer', 'security system', 'satellite dish', 'wood stove', 'fireplace insert', 'hot tub', 'attached speaker(s)', 'generator'` |
| Is there a dryer in the flat? | `'No'` |
| What addenda are attached to this purchase and sale agreement? | `'22A (Financing)', '2AA (Appraisal)', '22FSBO (Owner Sale)'` |
| What is the seller's fax number? | None |
| Is the buyer's signature present on the form? | `'No'` |
| What is the MLS number? | `'59844680'` |
| What is the property's address? | `'604 Bishop Crossing Land, Fort Lauderdale, Broward County, FL, 33338'` |


| Input data classification | Output data classification | Designation |
| --- | --- | --- |
| %input-data% | %output-data% | Generally available functions are Covered AI Features. Preview functions are Preview AI Features. [1] |

| Input data classification | Output data classification |
| --- | --- |
| %input-data% | %output-data% |


| Parameter | Description |
| --- | --- |
| `database` | (Required) Identifier for the database to which the resource belongs. You can use the `GET /api/v2/databases` request to get a list of available databases. |
| `schema` | (Required) Identifier for the schema to which the resource belongs. You can use the `GET /api/v2/databases/{database}/schemas` request to get a list of available schemas for the specified database. |
| `name` | (Required) Identifier for the agent. |


| Header | Description |
| --- | --- |
| `Authorization` | (Required) Authorization token. For more information, see [the authentication example](#label-chat-api-authenticate-example). |
| `Content-Type` | (Required) `application/json` |


| Field | Type | Description |
| --- | --- | --- |
| `orig_request_id` | string | Request ID for the message associated with the feedback. If this value is not set, then feedback is logged for the agent. |
| `positive` | boolean | Whether the response was good (`true`) or bad (`false`). |
| `feedback_message` | string | The text for the detailed feedback message. |
| `categories` | array of strings | List of categories for the feedback. Each category is a string that represents a specific category of feedback. |
| `thread_id` | integer | The ID of the thread. |

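
A feedback request body built from these fields might look like the following sketch. The table does not state which fields are required, so the sketch assumes `positive` is always sent and every other field is optional. `build_feedback_body` is a hypothetical helper, and all values shown are placeholders.

```python
import json


def build_feedback_body(positive, orig_request_id=None, feedback_message=None,
                        categories=None, thread_id=None):
    """Assemble a feedback payload, leaving out any field that was not provided."""
    body = {"positive": bool(positive)}
    optional = {
        "orig_request_id": orig_request_id,    # omit to log feedback for the agent
        "feedback_message": feedback_message,
        "categories": categories,
        "thread_id": thread_id,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


payload = build_feedback_body(
    positive=False,
    orig_request_id="example-request-id",  # placeholder value
    feedback_message="The answer cited the wrong table.",
    categories=["accuracy"],               # placeholder category
)
print(json.dumps(payload, indent=2))
```
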

| Header | Description |
| --- | --- |
| `X-Snowflake-Request-ID` | Unique ID of the API request. |


| Input data classification | Output data classification | Designation |
| --- | --- | --- |
| %input-data% | %output-data% | Generally available functions are Covered AI Features. Preview functions are Preview AI Features. [1] |

| Input data classification | Output data classification |
| --- | --- |
| %input-data% | %output-data% |

File Prompt Response
`file1.pdf` `{"date": "What is the date?", "total": "What is the total amount?"}` `{"date": "2024-06-30", "total": "82.50"}`
`file2.pdf` `[["invoice_number", "What is the invoice number?"], ["vendor", "What is the vendor name?"]]` `{"invoice_number": "543433434", "vendor": "Example Corp"}`
`file3.pdf` ```text { "schema": { "type": "object", "properties": { "deductions": { "description": "Deductions", "type": "object", "properties": { "deductions_name": { "type": "array" }, "current": { "type": "array" } } } } } } ``` ```text { "deductions": { "deductions_name": [ "Federal Tax", "Wyoming State Tax", "SDI", "Soc Sec / OASDI", "Health Insurance Tax", "None" ], "current": [ "82.50", "64.08", "None", "13.32", "91.74", "21.46" ] } } ```