| Views |
Auto-generated `_VIEW` with all issue fields flattened.
| No views created. Data is directly queryable from the destination tables. |
Any queries that reference the legacy `ISSUE` column (for example, `SELECT issue:fields:summary`) or the
auto-generated `_VIEW` must be rewritten to use the new column names directly (for example, `SELECT SUMMARY`).
### Parameter changes
The following parameters from the legacy connector are not available in the new connector:
| Legacy parameter |
Current equivalent |
| Search Type |
Removed. The new connector always fetches all issues from discovered projects. Use `Project Keys Filter` to limit ingestion to specific projects. |
| JQL Query |
Removed. The new connector doesn't support arbitrary JQL for issue filtering. Use `Project Keys Filter` instead. |
| Project Names |
Replaced by `Project Keys Filter`, which accepts project keys (not names or IDs). |
| Status Category |
Removed. The new connector fetches all issues regardless of status. |
| Updated After |
Removed. The new connector manages incremental state automatically. |
| Created After |
Removed. The new connector manages incremental state automatically. |
| Destination Table |
Removed. The new connector creates fixed table names per entity (`ISSUE`, `PROJECT`, `COMMENT`, and others) in the configured destination schema. |
| Fetch All Worklogs |
Removed. The new connector fetches all worklogs into a separate `WORKLOG` table by default when `WORKLOG` is listed in `Enabled Tables`. |
| Connection Method |
Not exposed as a parameter. The new connector uses the `DIRECT` connection method. |
The following parameters are introduced in the new connector:
| Parameter |
Description |
| Deletes Fetch Strategy |
Enables tracking of deleted issues via the Jira audit log. Not available in the legacy connector. |
| Merge Interval |
Time interval between journal-to-destination merge operations. Available in both the core flow and the agile flow. |
Additionally, agile data (boards, sprints, and board mappings) is now available through a separate
agile flow rather than a parameter toggle. See [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile) for details on installing and configuring
the agile flow.
### API token scopes
If you're using API tokens with scopes, the new connector may require additional scopes
depending on the features you enable. See [](#label-jira-core-api-scopes) for the core flow scopes and [](#label-jira-agile-api-scopes) for the agile flow scopes.
### Snowflake privileges
The new connector requires only `CREATE TABLE` on the destination schema. The legacy
connector additionally required `CREATE VIEW` to create flattened issue views. The new
connector doesn't create views, so the `CREATE VIEW` privilege is no longer needed. If you're
reusing an existing role, you can revoke `CREATE VIEW` after the legacy connector is
decommissioned.
## Migration steps
1. **Set up the new connector.** Install the core flow on the same
or a different Openflow runtime. If you need agile data, also install the agile flow.
Configure both flows to write to a **different destination schema** than the one used by the legacy
connector. This allows the legacy and new connectors to run simultaneously.
2. **Map your legacy configuration to the new parameters.**
- Copy the `Jira Email`, `Jira API Token`, and `Environment URL` values from the legacy connector
to the new core flow. If using the agile flow, configure these values separately for that flow as well.
- If the legacy connector uses `Project Names`, convert them to project keys for the
`Project Keys Filter` parameter.
- If the legacy connector uses a `JQL Query`, evaluate whether `Project Keys Filter` covers your
use case. If your JQL filters by criteria other than project (for example, status or custom fields),
those filters aren't available in the new connector. All matching issues from the configured
projects are ingested.
- Set `Issue Fields` to match your previous configuration. The default changed from `*all` (legacy)
to `*standard`.
- Configure the Snowflake destination parameters (database, schema, warehouse, credentials) for each flow.
3. **Start the new connector.** Run the core flow and allow the initial load to complete.
If using the agile flow, start it as well.
4. **Validate the data.** Compare the data in the new destination tables against the legacy destination
table to check for completeness. Expect some differences: the legacy connector didn't track deletes,
so issues that were deleted in Jira still appear in the legacy table but not in the new `ISSUE`
table (or they appear with `_SNOWFLAKE_DELETED = TRUE` if delete tracking is enabled). Row counts
will not match exactly when any issues have been deleted.
```sql
-- Compare issue counts (expect differences if issues were deleted in Jira)
SELECT COUNT(*) AS legacy_count FROM legacy_schema.JIRA_ISSUES;
SELECT COUNT(*) AS new_count FROM new_schema.ISSUE;
-- Spot-check specific issues
SELECT KEY, SUMMARY, STATUS FROM new_schema.ISSUE WHERE KEY = 'PROJ-123';
```
5. **Update downstream queries.** Rewrite any queries, views, dashboards, or pipelines that reference
the legacy table structure. Key changes:
- Replace references to the legacy `ISSUE` `OBJECT` column or `_VIEW` with direct column references.
- Replace `FLATTEN`-based queries with standard `SELECT` statements.
- Add `JOIN` statements to combine data across the new entity tables (for example, join `ISSUE`
with `COMMENT` on `ISSUE_ID`).
- If you want queries to ignore deleted issues, filter on the new `_SNOWFLAKE_DELETED` column
(`WHERE _SNOWFLAKE_DELETED = FALSE`). The legacy connector didn't track deletes at all, so
legacy queries against `JIRA_ISSUES` returned issues that had since been removed in Jira.
6. **Stop the legacy connector.** Once you've confirmed that the new data is complete and downstream
consumers have been updated, stop the legacy connector process group. Both new flows (core and agile)
can continue running independently.
7. **Clean up.** Optionally, drop the legacy destination table and view after confirming they're no
longer needed.
When the legacy connector and the new connector use the same Jira API token, they share the
same Jira API rate limits. Running both simultaneously roughly doubles the API call volume, which
may cause rate limiting on Jira instances with heavy API usage. Consider reducing the legacy
ingestion frequency during the migration period, or run the new connector with a separate
API token whose rate budget you can manage independently.
---
title: ModifyBytes 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/modifybytes.md
section: Loading & Unloading Data
---
# ModifyBytes 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Discard byte range at the start and end or all content of a binary file.
## Tags
binary, discard, keep
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| End Offset |
Number of bytes removed at the end of the file. |
| Remove All Content |
Remove all content from the FlowFile superseding Start Offset and End Offset properties. |
| Start Offset |
Number of bytes removed at the beginning of the file. |
## Relationships
| Name |
Description |
| success |
Processed flowfiles. |
---
title: ModifyCompression 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/modifycompression.md
section: Loading & Unloading Data
---
# ModifyCompression 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-compress-nar
## Description
Changes the compression algorithm used to compress the contents of a FlowFile by decompressing the contents of FlowFiles using a user-specified compression algorithm and recompressing the contents using the specified compression format properties. This processor operates in a very memory efficient way so very large objects well beyond the heap size are generally fine to process
## Tags
brotli, bzip2, compress, content, deflate, gzip, lz4-framed, lzma, recompress, snappy, snappy framed, snappy-hadoop, xz-lzma2, zstd
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Input Compression Strategy |
The strategy to use for decompressing input FlowFiles |
| Output Compression Level |
The compression level for output FlowFiles for supported formats. A lower value results in faster processing but less compression; a value of 0 indicates no (that is, simple archiving) for gzip or minimal for xz-lzma2 compression. Higher levels can mean much larger memory usage such as the case with levels 7-9 for xz-lzma/2 so be careful relative to heap size. |
| Output Compression Strategy |
The strategy to use for compressing output FlowFiles |
| Output Filename Strategy |
Processing strategy for filename attribute on output FlowFiles |
## Relationships
| Name |
Description |
| failure |
FlowFiles will be transferred to the failure relationship on compression modification errors |
| success |
FlowFiles will be transferred to the success relationship on compression modification success |
## Writes attributes
| Name |
Description |
| mime.type |
The appropriate MIME Type is set based on the value of the Compression Format property. If the Compression Format is 'no compression' this attribute is removed as the MIME Type is no longer known. |
---
title: MongoDBControllerService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/mongodbcontrollerservice.md
section: Loading & Unloading Data
---
# MongoDBControllerService
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Provides a controller service that configures a connection to MongoDB and provides access to that connection to other Mongo-related components.
## Tags
mongo, mongodb, service
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Database User |
Database User |
|
|
Database user name |
| Mongo URI * |
Mongo URI |
|
|
MongoURI, typically of the form: mongodb://host1[:port1][,host2[:port2],...] |
| Password |
Password |
|
|
The password for the database user |
| SSL Context Service |
SSL Context Service |
|
|
The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| Write Concern * |
Write Concern |
ACKNOWLEDGED |
- ACKNOWLEDGED
- UNACKNOWLEDGED
- FSYNCED
- JOURNALED
- REPLICA_ACKNOWLEDGED
- MAJORITY
- W1
- W2
- W3
|
The write concern to use |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: MongoDBLookupService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/mongodblookupservice.md
section: Loading & Unloading Data
---
# MongoDBLookupService
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Provides a lookup service based around MongoDB. Each key that is specified will be added to a query as-is. For example, if you specify the two keys, user and email, the resulting query will be \{ "user": "tester", "email": "[tester@test.com](mailto:tester@test.com)" \}. The query is limited to the first result (findOne in the Mongo documentation). If no "Lookup Value Field" is specified then the entire MongoDB result document minus the _id field will be returned as a record.
## Tags
lookup, mongo, mongodb, record
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Schema Access Strategy * |
Schema Access Strategy |
infer |
- Use 'Schema Name' Property
- Use 'Schema Text' Property
- Infer from Result
|
Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch |
Schema Branch |
|
|
Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name |
Schema Name |
$\{schema.name\} |
|
Specifies the name of the schema to lookup in the Schema Registry property |
| Schema Registry |
Schema Registry |
|
|
Specifies the Controller Service to use for the Schema Registry |
| Schema Text |
Schema Text |
$\{avro.schema\} |
|
The text of an Avro-formatted Schema |
| Schema Version |
Schema Version |
|
|
Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
| Mongo Collection Name * |
mongo-collection-name |
|
|
The name of the collection to use |
| Mongo Database Name * |
mongo-db-name |
|
|
The name of the database to use |
| Client Service * |
mongo-lookup-client-service |
|
|
A MongoDB controller service to use with this lookup service. |
| Projection |
mongo-lookup-projection |
|
|
Specifies a projection for limiting which fields will be returned. |
| Lookup Value Field |
mongo-lookup-value-field |
|
|
The field whose value will be returned when the lookup key(s) match a record. If not specified then the entire MongoDB result document minus the _id field will be returned as a record. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: Monitor connectors using the Openflow Connectors Dashboard
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors-dashboard.md
section: Loading & Unloading Data
---
" />
# Monitor connectors using the Openflow Connectors Dashboard
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
The Openflow Connectors Dashboard provides a high-level view of all installed connectors, health snapshots,
and key performance indicators, such as the aggregated average throughput and total data ingested by all connectors matching
the filter criteria.
## Prerequisites
To use the Openflow Connectors Dashboard, the following prerequisites must be met:
- You need at least read-only permissions on the event table.
- You must have the following minimum Openflow versions:
- BYOC deployment: 1.36.0
- Snowflake deployment: 1.26.0
- Runtime: 2026.3.17.13
- You must have the following minimum connector versions. These versions apply to change data capture (CDC) connectors only.
Other connector types don't have a minimum version requirement for dashboard support.
| Connector |
Minimum version |
| MySQL |
0.33.0 |
| PostgreSQL |
0.39.0 |
| MongoDB |
0.17.0 |
| SQL Server |
0.27.0 |
| Oracle Embedded License |
0.25.0 |
| Oracle Independent License |
0.24.0 |
See [](/user-guide/data-integration/openflow/version-history) for more information.
## Access the Openflow Connectors Dashboard
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow** and navigate to the **Connector Observability** tab.
The Openflow Connectors Dashboard appears.
## The Openflow Connectors Dashboard overview
The Openflow Connectors Dashboard displays the following information:
- **Status**
-
Shows the number of connectors with the following statuses:
- **Healthy**: Didn't encounter any errors during the selected time period.
- **Unhealthy**: Logged errors in the event table during the selected time period or has one or more tables in
**Failed** state (change data capture (CDC) connectors only).
- **Upgrade required**: Openflow deployment, runtime, or connector aren't running the minimum required versions
to display health and performance metrics. Review the version prerequisites and upgrade as needed.
- **Average throughput**
-
Measures the rate at which data is read from source systems and sent to Snowflake across all connectors.
- The **Average throughput** %raa% **Ingested** metric measures how fast data is sent to Snowflake across all connectors that match
the primary filter criteria (time frame and event table).
- The **Average throughput** %raa% **Read** metric measures how fast Openflow reads data from source systems across all connectors that match
the primary filter criteria (time frame and event table).
- **Total data ingested**
-
Shows how much data all connectors that match the primary filter criteria for time frame and event table have sent to Snowflake during the selected time period.
Use this metric to quickly identify ingestion anomalies over a specific time period.
For custom telemetry queries beyond the dashboard, see [](/user-guide/data-integration/openflow/monitor).
- **Total data ingested** and **Average throughput** metrics include both raw payload and structural overhead such as JSON keys, braces,
and delimiters. Because these metrics track the total transmitted volume, these figures might be higher than the uncompressed data reported
by Snowpipe Streaming or the final storage volume in your destination table.
- The connectors appear in the list if they match the selected filter criteria and have recorded telemetry events during the selected time frame.
- If you examine longer time frames, the list might show connectors that were previously deleted.
For example, you deployed a connector six days ago, and then deleted that connector two days ago. If you set the time frame to **Last 7 days**,
the connector appears in the list because it recorded telemetry events in the last 7 days.
### Filtering connectors
The Openflow Connectors Dashboard supports the following filters:
- **Event table**
-
The Openflow connectors event table you want to monitor. This filter displays event tables that are associated with at least one Openflow deployment,
as well as the default event table and the account event table. You can select only one event table at a time. Event table views are also supported.
The event table is set when you set up Openflow.
To view the event table associated with an Openflow deployment, use the [](/sql-reference/sql/desc-oflow-data-plane-integration) command.
See [Set up Openflow - Snowflake Deployment](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) or
[](/user-guide/data-integration/openflow/setup-openflow-byoc) for more information on configuring event tables.
- Time frame
-
Use this filter to identify relevant connectors in a specific time frame.
To get the most up-to-date results about the connector health, select the **Last Hour** time period.
- **Status**
-
Enables filtering for **Healthy**, **Unhealthy**, or **All** connectors.
- **Source**
-
Enables filtering by the source system based on known deployed connectors. The filter only shows sources that are used by your connectors.
- **Deployment**
-
Enables filtering by Snowflake Openflow deployments.
This filter displays data plane integration names, which are composed of the prefix `OPENFLOW_DATAPLANE_` followed by the deployment ID.
To find the deployment ID, navigate to Openflow, select the **Deployments** tab, then select **View Details**.
- **Runtime**
-
Enables filtering by Snowflake Openflow runtimes.
This filter displays the runtime keys. To match runtime keys with Openflow runtime names in the UI, navigate to Openflow, select the **Runtimes** tab, then
select **View Details**, and find the corresponding key.
- **Type**
-
Enables filtering by connector type: Databases, SaaS, Streaming, Unstructured, Other.
- Primary filters (event table and time frame) are applied before secondary filters (status, source, deployment, runtime, or type).
- The secondary filters (status, source, deployment, runtime, type) don't apply to the throughput and data ingested visuals.
## Monitoring Openflow connectors
To monitor the connector details, select %vertical-more-icon% %raa% **View Details**.
### Change data capture connectors
The details page shows the following information for each table that is part of the change data capture configuration:
- **Table replication status**
-
Tables can either be in **Active** or **Failed** replication status. The replication status is based on the most recent telemetry event
that is available for the table. Events that cause replication to fail for a table immediately result in a **Failed** replication
status in the dashboard. Use the **Failure Reason** message to identify the issue.
- **Error distribution**
-
Helps you understand when the connector experienced issues, so that you can identify any potential problems with source systems,
connector configuration, or the Snowflake destination.
- **Table name**
-
Shows the schema and table names for all tables that are configured to be replicated by the connector. The list matches the
**Included Table Names** or **Included Table Regex** configuration parameters of the connector.
- **Replication status**
-
Shows whether each table is in **Active** or **Failed** replication status.
- **Replication phase**
-
Shows the current table replication phase. After configuration in the connector, tables enter the **New** replication
phase, progress to the **Snapshot Load** phase, perform the initial load, and ultimately enter the **Incremental Replication** phase
when individual change data capture events are processed.
- **Last Ingested**
-
Shows the timestamp of the last inserted record into the destination table during the selected time frame. When looking at this
metric, consider a short delay between the records being ingested and events being logged and available to query.
You can use the **Replication status**, **Replication phase**, and time frame filters to narrow down the table list.
### All connectors
- **Connector status**
-
Shows the connector health status: **Healthy** if no error messages were encountered during the selected time frame,
or **Unhealthy** if any error messages were encountered.
- **Error distribution**
-
Shows a count of how many errors this connector encountered during the selected time period.
- **Average throughput**
-
Measures the rate at which data is read from source systems and ingested into Snowflake for the selected connector.
- The **Average throughput** %raa% **Ingested** metric measures how fast the selected connector ingests data into Snowflake.
- The **Average throughput** %raa% **Read** metric measures how fast the selected connector reads data from source systems.
- **Total data ingested**
-
Shows how much data the selected connector has ingested into Snowflake during the selected time period.
Use this metric to quickly identify ingestion anomalies over a specific time period.
### Custom flows
Custom flows built on the Openflow canvas can also be monitored on the dashboard, but only if they are actively
version-controlled in a customer Git repository using the Openflow Git integration. Flows that aren't version-controlled
don't appear in the dashboard.
For more information, see [](/user-guide/data-integration/openflow/version-control-custom-flows).
## Debugging Openflow connectors
The Openflow Connectors Dashboard serves as an entry point for debugging connector-specific issues and makes all connector logs easily accessible to users.
### Viewing the connector errors
To view all errors that a connector encountered in the selected time frame, first navigate to the connector details page by
selecting %vertical-more-icon% %raa% **View Details**, and then select the **Issues** tab.
The error headline tells you what type of error the connector encountered, and the content provides the entire stacktrace of the error.
### Viewing the connector logs
You might also want to look at additional connector logs to understand the context around an error message. To view all logs for the selected connector,
select %vertical-more-icon% %raa% **View logs**.
After you open the log explorer, you can also change the filters to view logs for different connectors or for entire
runtimes or deployments. The log explorer supports Openflow-specific filters like the dataplane ID, the runtime key, and the process group ID.
### Accessing the Openflow canvas
When you identify a connector issue, you probably need to navigate to the Openflow canvas to fix it; for example, adjust some configuration parameters or
upgrade to a newer connector version.
To navigate to the selected connector in the Openflow canvas, select %vertical-more-icon% %raa% **Go to canvas**.
## Optimizing performance
### Select a larger warehouse
Use the warehouse selector in the top right section of the screen to choose a different warehouse to run the queries.
While larger warehouses run queries faster, they take longer to resume, which might increase the initial page load time.
### Set up clustering on the Openflow event table
By using clustering keys, you can avoid unnecessary scanning of micro-partitions during querying, significantly accelerating
the performance of queries that reference these columns. For more information, see
[](#label-data-clustering).
Run the following query, replacing the placeholders with your Openflow event table:
```sqlsyntax
ALTER TABLE ..
CLUSTER BY (
DATE_TRUNC('HOUR', timestamp),
RECORD_TYPE,
CAST(record_attributes:"metricNameHash" AS STRING)
);
```
- Automatic clustering consumes Snowflake credits using serverless compute resources. To learn how many credits
per compute-hour are consumed, refer to the "Serverless Feature Credit Table" in the
[Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).
- After you enable clustering on your event table, a background process starts that takes some time to complete.
After the process is complete, you should see improved performance when using the dashboard.
### Reduce the queried time frame
Selecting a smaller time frame in the filter scans less data and leads to faster query performance.
Use the **Last Hour** filter for the best performance and the most up-to-date view of your connector health and performance.
## Limitations
- The Openflow Connectors Dashboard uses data stored in event tables to provide insight into Openflow connectors. Depending on the selected time period and event table,
information provided on the dashboard might not reflect the current status of a connector.
- Detailed health monitoring is currently only available for Database CDC connectors.
- The **Deployment** and **Runtime** filters use internal names that differ from the display names in the Openflow UI.
For details on matching these names, see [Filtering connectors](#label-openflow-dashboard-filtering).
## Known issues
- After upgrading the deployment, runtime, and connector to the versions mentioned in the prerequisites, the error count metric is only accurate
for errors encountered after the upgrade.
---
title: Monitor Openflow
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/monitor-overview.md
section: Loading & Unloading Data
---
# Monitor Openflow
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
Openflow provides two approaches for monitoring your data integration pipelines:
- [](/user-guide/data-integration/openflow/connectors-dashboard)
-
Use the Openflow Connectors Dashboard in Snowsight to get a high-level view of connector health, throughput,
and data ingestion. The dashboard provides filtering, error distribution, and per-connector detail pages.
- [](/user-guide/data-integration/openflow/monitor)
-
Query the Openflow telemetry data stored in your event table to monitor logs, application metrics, JVM and system
metrics, and build custom queries tailored to your environment.
---
title: Monitor Openflow using telemetry data
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/monitor.md
section: Loading & Unloading Data
---
# Monitor Openflow using telemetry data
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/processors/index)
- [](/user-guide/data-integration/openflow/controllers/index)
This topic describes how to monitor the state of Openflow and troubleshoot problems.
## Accessing Openflow logs
Snowflake sends Openflow logs to the event table you configured when you set up Openflow
([BYOC](#label-openflow-event-table) | [Snowflake deployment](#label-openflow-spcs-event-table)).
Snowflake recommends that you include a timestamp in the WHERE clause of event table queries.
This is particularly important because of the potential volume of data generated by various Snowflake components.
By applying filters, you can retrieve a smaller subset of data, which improves query performance.
To get started quickly with Openflow's telemetry, see [Example Queries](#label-openflow-example-queries) below.
## Openflow Telemetry Schema
For information about the event table columns, see [](/developer-guide/logging-tracing/event-table-columns).
The following sections describe how Openflow structures telemetry in an Event Table.
### Resource Attributes
Describes the event metadata set by Openflow. For general information on other
types of resource attributes see [](#label-event-table-resource-attributes-column) in the Event Table columns documentation.
| Name |
Type |
Description |
| application |
String |
The fixed value _openflow_ |
| cloud.service.provider |
String |
One of _aws_, _snowflake_ |
| container.id |
String |
Unique identifier of the container |
| container.image.name |
String |
Fully qualified name of the container image. All Openflow images are hosted by Snowflake repositories.
For example, *<account>-openflow-<env>.registry-internal.snowflakecomputing.com/openflow/openflow/openflow_repo/runtime-server*
|
| container.image.tag |
String |
Version of the container image |
| k8s.container.name |
String |
The name of the K8s container. Openflow Runtime containers will start with the "Runtime Key" and end with *-gateway* or *-server*.
For example, an Openflow Runtime named "PostgreSQL CDC" with a Runtime Key of postgresql-cdc, so it would have container names of:
- postgresql-cdc-gateway
- postgresql-cdc-server
|
| k8s.container.restart_count |
Numeric String |
The number of times this container has restarted since it was created. |
| k8s.namespace.name |
String |
K8s namespace of the pod or container, starting with _runtime-_ for Openflow Runtimes. Values also include _kube-system_ and _openflow-runtime-infra_. |
| k8s.node.name |
String |
The internal domain name of the EKS node hosting the pod / container, or the EKS node itself.
For example, ip-10-12-13-144.us-west-2.compute.internal
|
| k8s.pod.name |
String |
The name of the K8s pod. Openflow Runtime pods will start with the "Runtime Key" and end with a numeric identifier for each pod replica. This number can grow up to the "Max Nodes" set for the Runtime, indexed at 0.
For example, an Openflow Runtime named "PostgreSQL CDC" with a Runtime Key of postgresql-cdc and 3 nodes would have pod names of:
- postgresql-cdc-0
- postgresql-cdc-1
- postgresql-cdc-2
|
| k8s.pod.start_time |
ISO 8601 Date String |
Timestamp that the pod was started |
| k8s.pod.uid |
UUID String |
Unique identifier of the pod within the cluster |
| deployment.version |
String |
The Openflow deployment version. |
| openflow.dataplane.id |
UUID String |
The unique identifier of the Openflow Deployment, matching the "ID" shown in the Snowflake Openflow UI through Deployment > View Details. |
- Resource Attributes Example:
-
```json
{
"application": "openflow",
"cloud.service.provider": "aws",
"container.id": "a1b2c3d4e5f6",
"container.image.name": "example-openflow-prod.registry-internal.snowflakecomputing.com/openflow/openflow/openflow_repo/runtime-server",
"container.image.tag": "2026.3.17.13",
"deployment.version": "1.35.0",
"k8s.container.name": "pg-dev-server",
"k8s.container.restart_count": "0",
"k8s.namespace.name": "runtime-pg-dev",
"k8s.node.name": "ip-10-10-62-36.us-east-2.compute.internal",
"k8s.pod.name": "pg-dev-0",
"k8s.pod.start_time": "2025-04-25T22:14:29Z",
"k8s.pod.uid": "94610175-1685-4c8f-b0a1-42898d1058e6",
"openflow.dataplane.id": "abeddb4f-95ae-45aa-95b1-b4752f30c64a"
}
```
### Scope
| Name |
Type |
Description |
| name |
String |
Provider of the metric. One of:
- *runtime* for Openflow Connector metrics
- *github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver* for system-level metrics
|
- Scope Example:
-
```json
{
"name": "runtime"
}
```
### Record Type
Depending on the type of Openflow telemetry represented by this row, this will be one of:
- LOG
- METRIC
Openflow does not collect TRACE records, but that is also a valid type for this column in Snowflake Event Tables.
### Record
Optional. This JSON object describes the type of metric represented by this row.
| Name |
Type |
Description |
| metric |
Object |
Contains two fields:
- *name* for the unique metric produced, typically using dot-delimited namespaces
- *unit* for the value represented by the type, such as byte, nanosecond, and thread
The name and unit values vary widely. For the full list, see [Application Metrics](#label-openflow-application-metrics) below.
|
| metric_type |
String |
One of:
- *gauge* for most Openflow metrics, a snapshot value that can increase or decrease
- *sum* for cumulative metrics like pod CPU time and network IO
|
| value_type |
String |
The primitive type of the value produced by this metric. One of:
- INT
- DOUBLE
|
| aggregation_temporality |
String |
Optional. Set to cumulative for metrics that are strictly increasing and dependent on previous values, such as pod CPU time and network IO. |
| is_monotonic |
Boolean |
Optional. For cumulative metrics, this is true to show that it is strictly increasing within the time series. |
- Record Example:
-
```json
{
"metric": {
"name": "connection.queued.duration.max",
"unit": "millisecond"
},
"metric_type": "gauge",
"value_type": "INT"
}
```
### Record Attributes
#### Logs
Record attributes for Logs will typically indicate where this log was sourced. For example, logs from an Openflow Runtime named *testruntime* could have Record Attributes of:
```json
{
"log.file.path": "/var/log/pods/runtime-testruntime_testruntime-0_66d80cdb-9484-40a4-bdba-f92eb0af14c7/testruntime-server/0.log",
"log.iostream": "stdout",
"logtag": "F"
}
```
#### System Metrics
System metrics like CPU usage will typically not set Record Attributes, so this will be *null*.
#### Openflow Application Metrics
Record Attributes for Application or "Flow" metrics provide details about the component in the data pipeline that produced the metric. This will vary based on the type of component. See [Application Metrics](#label-openflow-application-metrics)
```json
{
"component": "PutSnowpipeStreaming",
"execution.node": "ALL",
"group.id": "c052f9d7-7f76-3013-a2c5-d3b064fa7326",
"id": "c69e2913-22a9-36bb-a159-6a5ed1fb9d63",
"name": "PutSnowpipeStreaming",
"type": "processor"
}
```
### Value
This column contains the raw value of the telemetry. For metrics, this will be a numeric value (integer or double). For logs, this will either be a semi-structured string value or a well-formatted JSON string.
#### Openflow Runtime Logs
Openflow Runtimes emit most logs as JSON, so applying Snowflake's [](/sql-reference/functions/try_parse_json) to the *VALUE* column allows you to further break this value into the following structured fields:
| Name |
Type |
Description |
| formattedMessage |
String |
The actual log message emitted from the Runtime logger. |
| level |
String |
One of:
- ERROR
- WARN
- INFO
- DEBUG
- TRACE
|
| loggerName |
String |
The fully qualified classname for the logger. Openflow processors will typically use logger names that start with *com.snowflake.openflow.runtime.processors*.
This is useful to view logs for a specific processor, controller service, or bundled library.
|
| nanoseconds |
Integer |
Nanosecond-level time that this log message was created, starting at milliseconds.
For example, a nanosecond value of 111222333 could correspond to a timestamp value of 1749180210111 with the leftmost 3 digits of nanosecond matching the right-most 3 digits of timestamp.
|
| threadName |
String |
Name of the thread handling this call. For example, _Timer-Driven Process Thread-7_ |
| throwable |
JSON Object |
*null* when there is no exception or stacktrace for this log message. Otherwise, it logs the stacktrace as a JSON string with fields:
- *className* - the exception thrown
- *message* - any message logged with the exception
- *stepArray* - array of method calls for the stack trace, including:
- *className*
- *fileName*
- *lineNumber*
- *methodName*
|
| timestamp |
Integer |
Time that this log message was created, represented as milliseconds since the UNIX epoch.
For example, 1749180210044 indicates that the log was created at 2025-06-05 03:23:30.044 UTC
|
| mdc |
JSON Object |
Mapped Diagnostic Context (MDC) providing additional flow-level context for the log entry. Contains the following fields:
- *processGroupId* - unique identifier of the process group
- *processGroupIdPath* - hierarchical path of process group IDs
- *processGroupName* - name of the process group
- *processGroupNamePath* - hierarchical path of process group names
- *registeredFlowIdentifier* - identifier of the registered flow (present for all versioned flows, including out-of-the-box Openflow connectors)
- *registeredFlowVersion* - version of the registered flow (present for all versioned flows, including out-of-the-box Openflow connectors)
For example:
```json
{
"processGroupId": "6dc1d98f-019d-1000-ffff-ffffa3ba8a09",
"processGroupIdPath": "/58385a8b-019d-1000-2a52-9ef1c34b0e5f/6dc1d98f-019d-1000-ffff-ffffa3ba8a09",
"processGroupName": "latency targets",
"processGroupNamePath": "/Openflow/latency targets",
"registeredFlowIdentifier": "sqlserver-multidatabase",
"registeredFlowVersion": "0.29.0-ebb7a257"
}
```
|
## Application Metrics
The following list covers all application metrics available for Openflow Runtimes. Runtimes only emit a subset of metrics relevant to Openflow Connectors to persist in a Snowflake Event Table.
Snowflake's OpenTelemetry Reporting Task can send some or all metrics to any OTLP destination.
### Connection Metrics
| Metric Name |
Unit |
Description |
| connection.input.bytes |
bytes |
Size of Items Input |
| connection.input.count |
items |
Count of Items Input |
| connection.output.bytes |
bytes |
Size of Items Output |
| connection.output.count |
items |
Count of Items Output |
| connection.queued.bytes |
bytes |
Size of Items Queued |
| connection.queued.bytes.max |
bytes |
Max Size of Items Queued |
| connection.queued.count |
items |
Count of Items Queued |
| connection.queued.count.max |
items |
Max Count of Items Queued |
| connection.queued.duration.total |
milliseconds |
Total Duration of Queued Items |
| connection.queued.duration.max |
milliseconds |
Max Duration of Queued Items |
| connection.backpressure.threshold.bytes |
bytes |
The maximum size of data in bytes that can be queued in this connection before it applies back pressure. |
| connection.backpressure.threshold.objects |
items |
The configured maximum number of FlowFiles that can be queued in this connection before it applies back pressure. |
| connection.loadbalance.status.load_balance_not_configured |
binary, 0 or 1 |
1 if the connection does not have a configured load balance setting. Otherwise, 0. |
| connection.loadbalance.status.load_balance_active |
binary, 0 or 1 |
1 if the connection is load balancing across the cluster. Otherwise, 0. |
| connection.loadbalance.status.load_balance_inactive |
binary, 0 or 1 |
1 if the connection is not load balancing across the cluster. Otherwise, 0. |
### Connection Record Attributes
Each Connection metric includes the following Record Attributes:
| Attribute |
Description |
| id |
The unique identifier of the connection |
| name |
The user-visible name of the connection |
| type |
The fixed value _connection_ |
| source.id |
The unique identifier of the component that is sending FlowFiles to this connection |
| source.name |
The user-visible name of the component that is sending FlowFiles to this connection |
| destination.id |
The unique identifier of the component that is receiving FlowFiles from this connection |
| destination.name |
The user-visible name of the component that is receiving FlowFiles from this connection |
| group.id |
The unique identifier of the Process Group that contains this Connection |
### Input and Output Port Metrics
Input Port and Output Ports are technically two separate types of components. For consistency, metrics and attributes for Input and Output Ports are the same, with the exception of the *type* attribute that indicates whether it is an input port or an output port.
| Metric Name |
Unit |
Description |
| port.thread.count.active |
threads |
Number of Active Threads |
| port.bytes.received |
bytes |
Number of Bytes Received |
| port.bytes.sent |
bytes |
Number of Bytes Sent |
| port.flowfiles.received |
flowfiles |
Number of FlowFiles Received |
| port.flowfiles.sent |
flowfiles |
Number of FlowFiles Sent |
| port.input.bytes |
bytes |
Size of Items Input |
| port.input.count |
items |
Count of Items Input |
| port.output.bytes |
bytes |
Size of Items Output |
| port.output.count |
items |
Count of Items Output |
### Input and Output Port Record Attributes
Each Port metric includes the following Record Attributes:
| Attribute |
Description |
| id |
The unique identifier of the port |
| name |
The user-visible name of the port |
| type |
One of _port-input_ or _port-output_ |
| group.id |
The unique identifier of the Process Group that contains this Port |
### Process Group Metrics
| Metric Name |
Unit |
Description |
| processgroup.thread.count.active |
threads |
Number of Active Threads |
| processgroup.thread.count.stateless |
threads |
Number of Stateless Threads |
| processgroup.thread.count.terminated |
threads |
Number of Terminated Threads |
| processgroup.bytes.read |
bytes |
Number of Bytes Read |
| processgroup.bytes.received |
bytes |
Number of Bytes Received |
| processgroup.bytes.transferred |
bytes |
Number of Bytes Transferred |
| processgroup.bytes.sent |
bytes |
Number of Bytes Sent |
| processgroup.bytes.written |
bytes |
Number of Bytes Written |
| processgroup.flowfiles.received |
flowfiles |
Number of FlowFiles Received |
| processgroup.flowfiles.sent |
flowfiles |
Number of FlowFiles Sent |
| processgroup.flowfiles.transferred |
flowfiles |
Number of FlowFiles Transferred |
| processgroup.input.count |
items |
Number of Items Input |
| processgroup.input.content.size |
bytes |
Size of Items Input |
| processgroup.output.count |
items |
Number of Items Output |
| processgroup.output.content.size |
bytes |
Size of Items Output |
| processgroup.queued.count |
items |
Number of Items Queued |
| processgroup.queued.content.size |
bytes |
Size of Items Queued |
| processgroup.time.processing |
nanoseconds |
Time Spent Processing |
### Process Group Record Attributes
Each Process Group metric includes the following Record Attributes:
| Attribute |
Description |
| id |
The unique identifier of the Process Group |
| name |
The user-visible name of the Process Group |
| type |
The fixed value _process-group_ |
| tree.level |
The depth of the Process Group, relative to the root process group of the flow. Process Groups at the highest level of the flow will have a tree.level of 1 |
### Processor Metrics
| Metric Name |
Unit |
Description |
| processor.thread.count.active |
thread |
Number of Active Threads |
| processor.thread.count.terminated |
thread |
Number of Terminated Threads |
| processor.time.lineage.average |
nanosecond |
Average Lineage Duration |
| processor.invocations |
invocations |
Number of Invocations |
| processor.bytes.read |
byte |
Number of Bytes Read |
| processor.bytes.received |
byte |
Number of Bytes Received |
| processor.bytes.sent |
byte |
Number of Bytes Sent |
| processor.bytes.written |
byte |
Number of Bytes Written |
| processor.flowfiles.received |
flowfiles |
Number of FlowFiles Received |
| processor.flowfiles.removed |
flowfiles |
Number of FlowFiles Removed |
| processor.flowfiles.sent |
flowfiles |
Number of FlowFiles Sent |
| processor.input.count |
item |
Number of Items Input |
| processor.input.content.size |
bytes |
Size of Items Input |
| processor.output.count |
item |
Number of Items Output |
| processor.output.content.size |
byte |
Size of Items Output |
| processor.time.processing |
nanosecond |
Time Spent Processing |
| processor.run.status.running |
binary, 0 or 1 |
1 if running; 0 otherwise |
| processor.run.status.stopped |
binary, 0 or 1 |
1 if stopped; 0 otherwise |
| processor.run.status.validating |
binary, 0 or 1 |
1 if validating; 0 otherwise |
| processor.run.status.invalid |
binary, 0 or 1 |
1 if invalid; 0 otherwise |
| processor.run.status.disabled |
binary, 0 or 1 |
1 if disabled; 0 otherwise |
| processor.counter |
count |
Value of the counter |
### Processor Record Attributes
Each Processor metric includes the following Record Attributes:
| Attribute |
Description |
| id |
The unique identifier of the processor |
| name |
The user-visible and user-editable name of the Processor |
| type |
The fixed value _processor_ |
| component |
The immutable class name of the processor. |
| execution.node |
Either _ALL_ or _PRIMARY_, depending on how this Processor is configured to run |
| group.id |
The unique identifier of the Process Group that contains this Processor |
### Additional Attributes for Counters
In addition to the standard Processor attributes above, *processor.counter* metrics include the following:
| Attribute |
Description |
| type |
The fixed value _counter_ |
| counter |
The user- or system-generated name of the counter |
### Remote Process Group Metrics
| Metric Name |
Unit |
Description |
| remoteprocessgroup.thread.count.active |
threads |
Number of Active Threads |
| remoteprocessgroup.remote.port.count.active |
ports |
Number of Active Remote Ports |
| remoteprocessgroup.remote.port.count.inactive |
ports |
Number of Inactive Remote Ports |
| remoteprocessgroup.duration.lineage.average |
nanoseconds |
Average Lineage Duration |
| remoteprocessgroup.refresh.age |
milliseconds |
Time since last refresh |
| remoteprocessgroup.received.count |
items |
Number of Received Items |
| remoteprocessgroup.received.content.size |
bytes |
Size of Received Items |
| remoteprocessgroup.sent.count |
items |
Number of Sent Items |
| remoteprocessgroup.sent.content.size |
bytes |
Size of Sent Items |
| remoteprocessgroup.transmission.status.transmitting |
binary, 0 or 1 |
1 if the Remote Process Group is transmitting. Otherwise, 0. |
| remoteprocessgroup.transmission.status.nottransmitting |
binary, 0 or 1 |
0 if the Remote Process Group is transmitting. Otherwise, 1. |
### Remote Process Group Record Attributes
Each Remote Process Group metric includes the following Record Attributes:
| Attribute |
Description |
| id |
The unique identifier of the remote process group |
| name |
The user-visible name of the Remote Process Group |
| group.id |
The unique identifier of the Process Group that contains this Remote Process Group |
| authorization.issue |
The Authorization used to access the Remote Process Group |
| target.uri |
The URI of the Remote Process Group |
| type |
The fixed value _remote-process-group_ |
### JVM Metrics
| Metric Name |
Unit |
Description |
| jvm.memory.heap.used |
bytes |
The amount of memory currently occupied by objects on the JVM Heap |
| jvm.memory.heap.committed |
bytes |
The amount of memory guaranteed to be available for use by the JVM Heap |
| jvm.memory.heap.max |
bytes |
Maximum amount of memory allocated for the JVM Heap |
| jvm.memory.heap.init |
bytes |
Initial amount of memory allocated for the JVM Heap |
| jvm.memory.heap.usage |
percentage |
JVM Heap Usage |
| jvm.memory.non-heap.usage |
percentage |
JVM Non-Heap Usage |
| jvm.memory.total.init |
bytes |
Initial amount of memory allocated for the JVM |
| jvm.memory.total.used |
bytes |
Current amount of memory used by the JVM |
| jvm.memory.total.max |
bytes |
Maximum amount of memory that can be used by the JVM |
| jvm.memory.total.committed |
bytes |
The amount of memory guaranteed to be available for use by the JVM |
| jvm.threads.count |
threads |
Number of live threads |
| jvm.threads.deadlocks |
threads |
JVM Thread Deadlocks |
| jvm.threads.daemon.count |
threads |
Number of live daemon threads |
| jvm.uptime |
seconds |
Number of seconds the JVM process has been running |
| jvm.file.descriptor.usage |
percentage |
Percentage of available file descriptors currently in use. |
| jvm.gc.G1-Concurrent-GC.runs |
runs |
Total number of times that the G1 Concurrent Garbage Collection has run |
| jvm.gc.G1-Concurrent-GC.time |
milliseconds |
Total amount of time that the G1 Concurrent Garbage Collection has been running |
| jvm.gc.G1-Young-Generation.runs |
runs |
Total number of times that the G1 Young Generation has run |
| jvm.gc.G1-Young-Generation.time |
milliseconds |
Total amount of time that the G1 Young Generation has been running |
| jvm.gc.G1-Old-Generation.runs |
runs |
Total number of times that the G1 Old Generation has run |
| jvm.gc.G1-Old-Generation.time |
milliseconds |
Total amount of time that the G1 Old Generation has been running |
### JVM Record Attributes
JVM metrics do not provide Record Attributes.
### CPU Metrics
| Metric Name |
Unit |
Description |
| cores.available |
cores |
The number of available cores for the Runtime |
| cores.load |
percentage |
Either the system load average or -1 if it is not available |
### CPU Record Attributes
| Attribute |
Description |
| id |
The fixed value _cpu_ |
| name |
The name of the operating system |
| architecture |
The architecture of the operating system |
| version |
The version of the operating system |
### Storage Metrics
| Metric Name |
Unit |
Description |
| storage.free |
bytes |
The amount of free storage for a given repository |
| storage.used |
bytes |
The amount of used storage for a given repository |
### Storage Record Attributes
| Attribute |
Description |
| id |
The unique identifier of the storage repository |
| name |
Same as id and provided for consistency |
| storage.type |
One of _flowfile_, _content_, or _provenance_ |
## Example Queries
The following queries are examples to get you started with Openflow Telemetry.
All queries assume that Openflow is configured to send telemetry to the default Event Table of *SNOWFLAKE.TELEMETRY.EVENTS*. If your Snowflake Account or Openflow Deployment is configured with a different Event Table, substitute that table name where you see *SNOWFLAKE.TELEMETRY.EVENTS*.
### Find Stuck FlowFiles
This query returns connections with FlowFiles that have been queued for more than some threshold, indicating that they may be stuck and require intervention. Adjust the 30 minute threshold as needed for your use case.
```sql
SELECT * FROM (
SELECT
resource_attributes:"openflow.dataplane.id" as Deployment_ID,
resource_attributes:"k8s.namespace.name" as Runtime_Key,
record_attributes:name as Connection_Name,
record_attributes:id as Connection_ID,
MAX(TO_NUMBER(value / 60 / 1000)) as Max_Queued_File_Minutes
FROM snowflake.telemetry.events
WHERE true
AND record_type = 'METRIC'
AND record:metric:name = 'connection.queued.duration.max'
AND timestamp > dateadd(minutes, -30, sysdate())
GROUP BY 1, 2, 3, 4
ORDER BY Max_Queued_File_Minutes DESC
) WHERE Max_Queued_File_Minutes > 30;
```
### Find Error Logs for Openflow Runtimes
```sql
SELECT
timestamp,
Deployment_ID,
Runtime_Key,
parsed_log:level as log_level,
parsed_log:loggerName as logger,
parsed_log:formattedMessage as message,
parsed_log
FROM (
SELECT
timestamp,
resource_attributes:"openflow.dataplane.id" as Deployment_ID,
resource_attributes:"k8s.namespace.name" as Runtime_Key,
TRY_PARSE_JSON(value) as parsed_log
FROM snowflake.telemetry.events
WHERE true
AND timestamp > dateadd('minutes', -30, sysdate())
AND record_type = 'LOG'
AND resource_attributes:"k8s.namespace.name" like 'runtime-%'
ORDER BY timestamp DESC
) WHERE log_level = 'ERROR';
```
### Find Running and Non-Running Processors
Some flows expect that all processors are in a "running" state, even if they are not actively processing data.
This query helps you find any processors that are running or in another state, such as:
- stopped
- invalid
- disabled
```sql
SELECT
timestamp,
resource_attributes:"openflow.dataplane.id" as Deployment_ID,
resource_attributes:"k8s.namespace.name" as Runtime_Key,
record_attributes:component as Processor,
record_attributes:id as Processor_ID,
TO_NUMBER(value) as Running
FROM snowflake.telemetry.events
WHERE true
AND record:metric:name = 'processor.run.status.running'
AND record_type = 'METRIC'
AND timestamp > dateadd(minutes, -30, sysdate());
```
### Find High CPU Usage for Openflow Runtimes
Slow data flows or reduced throughput may be the result of a bottleneck on the CPU. Openflow Runtimes scale up automatically, based on the number of minimum and maximum nodes you have configured.
If an Openflow Runtime is using its maximum number of nodes and still CPU usage remains high, consider:
1. Increasing the maximum number of nodes allocated to the Runtime
2. Troubleshoot the Connector or flow to identify the bottleneck
Snowsight Charts provide an easy way to visualize query results for CPU usage over time.
```sql
SELECT
timestamp,
resource_attributes:"openflow.dataplane.id" as Deployment_ID,
resource_attributes:"k8s.namespace.name" as Runtime_Key,
resource_attributes:"k8s.pod.name" as Runtime_Pod,
TO_NUMBER(value, 10, 3) * 100 as CPU_Usage_Percentage
FROM snowflake.telemetry.events
WHERE true
AND timestamp > dateadd(minute, -30, sysdate())
AND record_type = 'METRIC'
AND record:metric:name ilike 'container.cpu.usage'
AND resource_attributes:"k8s.namespace.name" ilike 'runtime-%'
AND resource_attributes:"k8s.container.name" ilike '%-server'
ORDER BY timestamp desc, CPU_Usage_Percentage desc;
```
---
title: MonitorActivity 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/monitoractivity.md
section: Loading & Unloading Data
---
# MonitorActivity 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Monitors the flow for activity and sends out an indicator when the flow has not had any data for some specified amount of time and again when the flow's activity is restored
## Tags
active, activity, detection, flow, inactive, monitor
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Activity Restored Message |
The message that will be the content of FlowFiles that are sent to 'activity.restored' relationship |
| Continually Send Messages |
If true, will send inactivity indicator continually every Threshold Duration amount of time until activity is restored; if false, will send an indicator only when the flow first becomes inactive |
| Copy Attributes |
If true, will copy all flow file attributes from the flow file that resumed activity to the newly created indicator flow file |
| Inactivity Message |
The message that will be the content of FlowFiles that are sent to the 'inactive' relationship |
| Monitoring Scope |
Specify how to determine activeness of the flow. 'node' means that activeness is examined at individual node separately. It can be useful if DFM expects each node should receive flow files in a distributed manner. With 'cluster', it defines the flow is active while at least one node receives flow files actively. If NiFi is running as standalone mode, this should be set as 'node', if it 's' cluster ', NiFi logs a warning message and act as' node'scope. |
| Reporting Node |
Specify which node should send notification flow-files to inactive and activity.restored relationships. With 'all', every node in this cluster send notification flow-files. 'primary' means flow-files will be sent only from a primary node. If NiFi is running as standalone mode, this should be set as 'all', even if it 's' primary ', NiFi act as' all'. |
| Reset State on Restart |
When the processor gets started or restarted, if set to true, the initial state will always be active. Otherwise, the last reported flow state will be preserved. |
| Threshold Duration |
Determines how much time must elapse before considering the flow to be inactive |
| Wait for Activity |
When the processor gets started or restarted, if set to true, only send an inactive indicator if there had been activity beforehand. Otherwise send an inactive indicator even if there had not been activity beforehand. |
## State management
| Scopes |
Description |
| LOCAL |
MonitorActivity stores the last timestamp at each node as state, so that it can examine activity at cluster wide. If 'Copy Attribute' is set to true, then flow file attributes are also persisted. In local scope, it stores last known activity timestamp if the flow is inactive. |
| CLUSTER |
MonitorActivity stores the last timestamp at each node as state, so that it can examine activity at cluster wide. If 'Copy Attribute' is set to true, then flow file attributes are also persisted. In local scope, it stores last known activity timestamp if the flow is inactive. |
## Relationships
| Name |
Description |
| activity.restored |
This relationship is used to transfer an Activity Restored indicator when FlowFiles are routing to 'success' following a period of inactivity |
| inactive |
This relationship is used to transfer an Inactivity indicator when no FlowFiles are routed to 'success' for Threshold Duration amount of time |
| success |
All incoming FlowFiles are routed to success |
## Writes attributes
| Name |
Description |
| inactivityStartMillis |
The time at which Inactivity began, in the form of milliseconds since Epoch |
| inactivityDurationMillis |
The number of milliseconds that the inactivity has spanned |
---
title: MoveAzureDataLakeStorage 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/moveazuredatalakestorage.md
section: Loading & Unloading Data
---
# MoveAzureDataLakeStorage 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-azure-nar
## Description
Moves content within an Azure Data Lake Storage Gen 2. After the move, files will be no longer available on source location.
## Tags
adlsgen2, azure, cloud, datalake, microsoft, storage
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| ADLS Credentials |
Controller Service used to obtain Azure Credentials. |
| Conflict Resolution Strategy |
Indicates what should happen when a file with the same name already exists in the output directory |
| Destination Directory |
Name of the Azure Storage Directory where the files will be moved. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. Non-existing directories will be created. If the original directory structure should be kept, the full directory path needs to be provided after the destination directory. e.g.: destdir/$\{azure.directory\} |
| Destination Filesystem |
Name of the Azure Storage File System where the files will be moved. |
| File Name |
The filename |
| Source Directory |
Name of the Azure Storage Directory from where the move should happen. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. |
| Source Filesystem |
Name of the Azure Storage File System from where the move should happen. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
## Relationships
| Name |
Description |
| failure |
Files that could not be written to Azure storage for some reason are transferred to this relationship |
| success |
Files that have been successfully written to Azure storage are transferred to this relationship |
## Writes attributes
| Name |
Description |
| azure.source.filesystem |
The name of the source Azure File System |
| azure.source.directory |
The name of the source Azure Directory |
| azure.filesystem |
The name of the Azure File System |
| azure.directory |
The name of the Azure Directory |
| azure.filename |
The name of the Azure File |
| azure.primaryUri |
Primary location for file content |
| azure.length |
The length of the Azure File |
## See also
- [org.apache.nifi.processors.azure.storage.DeleteAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/deleteazuredatalakestorage)
- [org.apache.nifi.processors.azure.storage.FetchAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/fetchazuredatalakestorage)
- [org.apache.nifi.processors.azure.storage.ListAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/listazuredatalakestorage)
---
title: Notify 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/notify.md
section: Loading & Unloading Data
---
# Notify 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Caches a release signal identifier in the distributed cache, optionally along with the FlowFile's attributes. Any flow files held at a corresponding Wait processor will be released once this signal in the cache is discovered.
## Tags
cache, distributed, map, notify, release, signal
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| attribute-cache-regex |
Any attributes whose names match this regex will be stored in the distributed cache to be copied to any FlowFiles released from a corresponding Wait processor. Note that the uuid attribute will not be cached regardless of this value. If blank, no attributes will be cached. |
| distributed-cache-service |
The Controller Service that is used to cache release signals in order to release files queued at a corresponding Wait processor |
| release-signal-id |
A value, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the release signal cache key |
| signal-buffer-count |
Specify the maximum number of incoming flow files that can be buffered until signals are notified to cache service. The more buffer can provide the better performance, as it reduces the number of interactions with cache service by grouping signals by signal identifier when multiple incoming flow files share the same signal identifier. |
| signal-counter-delta |
A value, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the signal counter delta. Specify how much the counter should increase. For example, if multiple signal events are processed at upstream flow in batch oriented way, the number of events processed can be notified with this property at once. Zero (0) has a special meaning, it clears target count back to 0, which is especially useful when used with Wait Releasable FlowFile Count = Zero (0) mode, to provide 'open-close-gate' type of flow control. One (1) can open a corresponding Wait processor, and Zero (0) can negate it as if closing a gate. |
| signal-counter-name |
A value, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the signal counter name. Signal counter name is useful when a corresponding Wait processor needs to know the number of occurrences of different types of events, such as success or failure, or destination data source names, etc. |
## Relationships
| Name |
Description |
| failure |
When the cache cannot be reached, or if the Release Signal Identifier evaluates to null or empty, FlowFiles will be routed to this relationship |
| success |
All FlowFiles where the release signal has been successfully entered in the cache will be routed to this relationship |
## Writes attributes
| Name |
Description |
| notified |
All FlowFiles will have an attribute 'notified'. The value of this attribute is true, is the FlowFile is notified, otherwise false. |
## See also
- [org.apache.nifi.processors.standard.Wait](/user-guide/data-integration/openflow/processors/wait)
---
title: Object definition overrides for the Openflow Connector for Shopify
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/shopify/object-definitions.md
section: Loading & Unloading Data
---
# Object definition overrides for the %shopifyof%
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/shopify/setup)
This topic describes the **Object Definitions Override** parameter in detail, including the
full schema, promoted column and child field definitions, and a complete example.
The **Object Definitions Override** parameter accepts a JSON array of object definitions.
Each definition can add a new object type or fully replace an existing catalog entry.
## Object definition schema
The **Object Definitions Override** value must be valid JSON. If the JSON is malformed,
the connector fails to start. Validate your JSON before applying the override.
The following fields are supported in each object definition:
| Field |
Description |
| `apiType` |
**(Required)** The query endpoint name in the Shopify Admin GraphQL API. This must match the root query field exactly (for example, `orders` for the [orders query](https://shopify.dev/docs/api/admin-graphql/2026-04/queries/orders), `products` for the [products query](https://shopify.dev/docs/api/admin-graphql/2026-04/queries/products)). Used as the key for lookup and override matching. |
| `tableName` |
**(Required)** The Snowflake destination table name. |
| `gidTypeName` |
The Shopify GID resource type (for example, `Order`, `Product`). Used for delete cascade and child record routing. |
| `additionalGidTypeNames` |
Array of additional GID type names that also map to this object. Use when Shopify returns the same resource under more than one GID type name, so that records are routed to the correct table regardless of which GID type appears in the response. |
| `graphqlFields` |
List of GraphQL selection fields. Each entry is a field name, a nested selection (for example, `"totalPriceSet { shopMoney { amount currencyCode } }"`), or an aliased field with arguments (for example, `"tier: metafield(key: \"custom.tier\") { value }"`). Aliases are useful for querying metafields by key. |
| `requiredQueryArgs` |
Map of fixed GraphQL argument key-value pairs appended to every query for this object. Use for endpoints that require non-standard arguments that aren't covered by the built-in query parameters (for example, `{"type": "SALES_CHANNEL"}`). |
| `supportsIncremental` |
Whether the object supports incremental sync. Default: `true`. |
| `incrementalField` |
The field used for watermark-based incremental queries (for example, `updatedAt`, `createdAt`). |
| `refreshStrategy` |
Controls the sync mode. `INCREMENTAL` (default) uses watermark-based incremental queries. `FULL_PERIODIC` performs a complete re-sync on each run instead. `PARENT_PIGGYBACKED` means this object is extracted from another object's query response and is not queried independently. |
| `supportsDeletes` |
Whether the connector should track deletion events for this object. Default: `false`. |
| `promotedColumns` |
Array of column definitions that extract values from the JSON payload into dedicated Snowflake columns. See [Promoted columns](#label-promoted-columns). |
| `childFields` |
Array of child connection definitions that are extracted into separate tables. See [Child fields](#label-child-fields). |
| `ignoredFields` |
List of field names to exclude from queries. |
| `supportsBulk` |
Whether the object supports bulk queries through the Shopify Bulk Operations API. Default: `true`. |
| `sortKeys` |
List of sort key values (from the object's corresponding `SortKeys` enum) used to order results during bulk and incremental queries. For example, `["UPDATED_AT", "ID"]`. |
| `sortKeyStyle` |
How sort key values are formatted in queries. `ENUM` (default) uses bare enum values (for example, `UPDATED_AT`). `STRING` uses quoted lowercase strings (for example, `"updated_at"`). Use `STRING` for object types that accept string sort keys, such as metaobjects. |
## Promoted columns
Promoted columns extract specific values from the raw JSON payload into dedicated typed
columns in the destination table. This makes frequently queried fields available as
first-class Snowflake columns for efficient filtering and aggregation.
Each promoted column has the following fields:
| Field |
Description |
| `name` |
The Snowflake column name (uppercase recommended). |
| `path` |
A JSONPath expression pointing to the value in the raw record (for example, `$.email`, `$.totalPriceSet.shopMoney.amount`). |
| `type` |
The column type: `string`, `integer`, `boolean`, `float`, `money`, `timestamp`, `date`, `id`, `gid`, or `json`. |
## Child fields
Child field definitions extract nested connections (such as order line items) into separate
Snowflake tables. Each child table includes a `__PARENT_ID` column linking records back
to the parent.
Each child field has the following fields:
| Field |
Description |
| `fieldName` |
The GraphQL connection field name in the parent object (for example, `lineItems`). |
| `tableName` |
The Snowflake table name for the child records. |
| `gidTypeName` |
The Shopify GID type for the child (for example, `LineItem`). |
| `connectionType` |
`edges` (paginated connection) or `array` (inline array). Default: `edges`. |
| `pageSize` |
Number of child records to fetch per page. Default: `250`. |
| `graphqlFields` |
Explicit GraphQL selection set for the child table. If omitted, the connector parses the child's fields from the matching connection entry in the parent's `graphqlFields` list. |
| `promotedColumns` |
Array of promoted column definitions for the child table, using the same schema as top-level `promotedColumns`. |
## Example: Register a custom object type with promoted columns
The following override customizes an existing catalog entry to add scalar fields, nested object
selections, a metafield alias, and promoted columns. One promoted column extracts a value
directly from the aliased metafield.
```json
[
{
"apiType": "draftOrders",
"tableName": "DRAFT_ORDERS",
"gidTypeName": "DraftOrder",
"supportsBulk": true,
"supportsIncremental": true,
"incrementalField": "updatedAt",
"ignoredFields": [],
"sortKeys": ["UPDATED_AT", "ID"],
"supportsDeletes": false,
"graphqlFields": [
"id",
"createdAt",
"updatedAt",
"name",
"status",
"email",
"currencyCode",
"totalQuantityOfLineItems",
"customer { id }",
"totalPriceSet { shopMoney { amount currencyCode } }",
"billingAddress { address1 city countryCode zip }",
"draft_po_number: metafield(key: \"custom.draft_po_number\") { key namespace compareDigest createdAt id jsonValue legacyResourceId updatedAt value definition { id description key pinnedPosition } }"
],
"promotedColumns": [
{ "name": "STATUS", "path": "$.status", "type": "string" },
{ "name": "NAME", "path": "$.name", "type": "string" },
{ "name": "CUSTOMER_ID", "path": "$.customer.id", "type": "gid" },
{ "name": "TOTAL_PRICE_AMOUNT", "path": "$.totalPriceSet.shopMoney.amount", "type": "money" },
{ "name": "DRAFT_PO_NUMBER", "path": "$.draft_po_number.value", "type": "string" }
],
"childFields": []
}
]
```
## Example: Override an object with child fields
The following override customizes the `orders` object to extract line items and fulfillments
into separate tables. Line items use a paginated connection (`edges`); fulfillments are an
inline array in the parent response (`array`).
```json
[
{
"apiType": "orders",
"tableName": "ORDERS",
"gidTypeName": "Order",
"graphqlFields": [
"id",
"createdAt",
"updatedAt",
"name",
"email",
"lineItems(first: 250) { edges { cursor node { id title quantity originalUnitPriceSet { shopMoney { amount currencyCode } } } } }",
"fulfillments { id status createdAt }"
],
"childFields": [
{
"fieldName": "lineItems",
"tableName": "ORDER_LINE_ITEMS",
"gidTypeName": "LineItem",
"connectionType": "edges"
},
{
"fieldName": "fulfillments",
"tableName": "ORDER_FULFILLMENTS",
"gidTypeName": "Fulfillment",
"connectionType": "array"
}
]
}
]
```
---
title: OpenAiTranscribeAudio 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/openaitranscribeaudio.md
section: Loading & Unloading Data
---
# OpenAiTranscribeAudio 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-openai-nar
## Description
Transcribes audio into English text. The audio data must be in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
## Tags
audio, flac, m4a, mp3, mp4, mpeg, mpga, ogg, openai, openflow, speech-to-text, text, transcribe, translate, wav, webm
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Model Name |
The name of the OpenAI Model to use |
| OpenAI API Key |
The API Key for interacting with OpenAI |
| Prompt |
Text that can be used to guide the model's style or continue a previous audio segment. The text must be in English. |
| Response Format |
Specifies which format is desired for the output |
| Temperature |
The sampling temperature to use. The value must be a floating-point number between 0.0 and 1.0. A higher value, such as 0.8 will result in more of an interpreted translation, whereas a value of 0.0 will result in a more literal translation. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that could not be transcribed are routed to this relationship. |
| success |
FlowFiles that have been successfully transcribed will be transferred to this relationship. |
## Use Cases Involving Other Components
| Create embeddings for audio data and insert them into Pinecone so that the audio can be made available to a large language model (LLM) such as OpenAI's GPT models. |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
---
title: Openflow BYOC - Set up custom ingress
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress.md
section: Loading & Unloading Data
---
# Openflow BYOC - Set up custom ingress
This feature is not available in the People's Republic of China.
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-byoc)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/troubleshoot)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic describes the considerations for and steps required to set up an Openflow BYOC deployment with a custom ingress solution managed within your own AWS account.
## Benefits
Custom ingress for Openflow BYOC deployments provides your organization with:
- Stronger security with network-level restrictions that can limit access to only your VPN or private network.
- Full control over the URL and TLS certificate used to access Openflow to meet your security and compliance requirements.
## Considerations
With Snowflake managed ingress, Openflow creates the necessary DNS records, public load balancer, and manages the TLS certificate for the Openflow runtimes in your BYOC deployment.
When you enable custom ingress, Openflow will no longer automatically manage external DNS records, will not create a public load balancer automatically, and will no longer manage certificates for the Openflow runtimes. You must manage these resources within your own AWS account.

## Configure custom ingress in Snowflake Openflow
1. Enable custom ingress during deployment creation.
- During deployment creation, enable **Custom ingress** and specify your preferred fully qualified domain name (FQDN) in the **Hostname** field.
- You must be able to manage this DNS record and create a TLS certificate for this FQDN. Do not use a subdomain of `snowflakecomputing.com`.
- You must not include the protocol **https://** or a trailing slash **/** in the FQDN.
- For example, if you specify `openflow01.your-domain.org`, you will access a runtime named "My Runtime" at `https://openflow01.your-domain.org/my-runtime/nifi/`.
2. Download the CloudFormation template. This file has all of the settings required for Openflow to run as your custom ingress domain.
## Configure custom ingress in AWS
`{deployment-key}` represents the Openflow unique identifier applied to cloud resources created and managed by Openflow for a particular deployment.
This is in the `DataPlaneKey` parameter of the CloudFormation template, also available in Openflow through the **View Details** menu option for the deployment.
1. Add the following tag to the private subnets for your Openflow deployment:
- Key: **kubernetes.io/role/internal-elb**
- Value: `1`
2. If your private subnets are used by other EKS clusters, you must also tag them with the name of the Openflow cluster. This allows Openflow to create a load balancer alongside other load balancers.
- Key: **kubernetes.io/cluster/\{deployment-key\}**
- Value: `1`
3. Upload the CloudFormation template. Wait approximately 30 minutes for Openflow to create the internal network load balancer.
- You can find the internal network load balancer in the AWS Console under **EC2** %ra% **Load Balancers**.
- The load balancer will be named `runtime-ingress-{deployment-key}`.
4. Obtain the internal IP address of the Openflow-managed AWS internal network load balancer.
- Under **EC2** %ra% **Load Balancers**, navigate to the details page and copy the **DNS name** of the Load Balancer.
- Log into your agent EC2 instance (identified as **openflow-agent-\{deployment-key\}**) and run the command `nslookup {openflow-load-balancer-dns-name}`.
- Copy the IP addresses of the Openflow-managed AWS internal network load balancer. These are destinations for the target group of the load balancer you will create in a following step.
5. Provision a TLS certificate.
- Obtain a TLS certificate for the load balancer that will handle traffic to the Openflow runtime UIs. You can generate a certificate using AWS Certificate Manager (ACM) or import an existing certificate.
6. Create a network load balancer that will route traffic to the Openflow-managed AWS internal network load balancer.
1. In your AWS account, create a Network Load Balancer with the following configuration:
- Name: We recommend the naming convention `custom-ingress-external-{deployment-key}`, where `{deployment-key}` is the key of your Openflow deployment.
- Type: **Network Load Balancer**
- Scheme: **Internal** or **Internet-facing**, depending on your requirements.
- VPC: Select the VPC of your deployment
- Availability Zones: Select both Availability Zones where your Openflow deployment is running.
- Subnets: Select the private subnets of your VPC for an **Internal** Load Balancer, or the public subnets of your VPC for an **Internet-facing** Load Balancer.
- Security groups: Select or create a security group that allows traffic on port `443`
- Default SSL/TLS server certificate: Import your SSL/TLS certificate
- Target group: Create a new target group with the following settings:
- Target type: **IP addresses**
- Protocol: **TLS**
- Port: **443**
- VPC: Verify the VPC matches your deployment
- Type the IP address of the internal network load balancer created by Openflow (obtained in the previous step) as the target and select **Include as pending below**.
2. Once the load balancer is created, copy the DNS name for the load balancer to use in the next step.
3. For more information on how to create a network load balancer, see [Create a Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/create-network-load-balancer.html).
7. Create a DNS CNAME record that maps your custom ingress FQDN to the AWS load balancer's DNS name.
- For detailed DNS configuration instructions in Route 53, see [Create records in Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resource-record-sets-creating.html).
## Verification
1. The Openflow deployment shows a status of **Active** in the **Deployments** page.
2. Create a runtime in the Openflow deployment.
3. Once the runtime is **Active**, click on the runtime name or use the **View canvas** menu option to access the runtime's UI.
4. Openflow directs you to the runtime with the hostname specified during deployment creation. For example, `https://openflow01.your-domain.org/my-runtime/nifi/`.
## Troubleshooting
The following sections provide troubleshooting steps for common issues with custom ingress. If you are still experiencing issues after performing these checks, file a [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support) case.
### Load balancer target health check
The target group for your network load balancer should list the IP addresses of the Openflow-managed internal network load balancer as targets. All of these targets should show as **Healthy**. If targets are **Unhealthy**, use the following checks to narrow down where traffic is failing.
1. In the AWS console, open **EC2** %ra% **Load Balancers**.
2. Locate the Openflow-managed load balancer that manages ingress to the Kubernetes cluster. This load balancer is named `runtime-ingress-{deployment-key}`.
3. Review the target health for that load balancer under the **Resource map** tab.
4. If the Openflow-managed load balancer is not active or has **Unhealthy** targets:
- Traffic may be blocked between the Openflow-managed load balancer and the BYOC cluster, or a service inside the cluster may not be ready.
- Generate a diagnostic bundle by running `./diagnostics.sh` from the **openflow-agent-\{deployment-key\}** EC2 instance and attach it to a [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support) case.
5. If the Openflow-managed load balancer is active and has healthy targets, check the target health for your load balancer.
6. If your load balancer's targets are **Unhealthy**, the path from your load balancer to the Openflow-managed load balancer is the most likely problem:
- **Incorrect or stale IP addresses in your target group.** The Openflow-managed load balancer exposes multiple IP addresses that can change over time. To get the latest values, run `nslookup` with the **DNS name** of the Openflow-managed load balancer. Update your load balancer's targets as necessary.
- **Security group rules.** Confirm that inbound rules on the Openflow-managed load balancer's security groups allow TCP `443` from your load balancer. Traffic can fail if your load balancer can't reach the Openflow load balancer on port `443`.
### Browser security blocking
Some problems with custom ingress are caused by corporate browser security, firewalls, or web proxies that block or inspect traffic to your custom hostname. Those policies are separate from AWS load balancer configuration. You may find that users can't open the Openflow UI even when AWS load balancers report healthy targets.
To verify connectivity through the load balancers to the Openflow services:
1. In the AWS console, open **EC2** %ra% **Load Balancers** to get the DNS name of the load balancer that is serving traffic and the TLS certificate for your custom ingress domain name.
- This is **not** the **runtime-ingress-\{deployment-key\}** load balancer.
2. From the **openflow-agent-\{deployment-key\}** EC2 instance, verify connectivity through the load balancers to the Openflow deployment. Run the command:
```bash
curl -kv https://{your-load-balancer-dns-name}
```
- If the command outputs the expected certificate information and a successful 404 status code response, you have successfully verified connectivity to your Openflow deployment.
- If the command times out or returns an error, create a [Snowflake Support](https://docs.snowflake.com/user-guide/contacting-support) case and attach a diagnostic bundle generated by running `./diagnostics.sh` from the Openflow Agent instance.
3. From the Openflow Agent instance, you can also verify the DNS CNAME record for your custom ingress FQDN. Run the command:
```bash
source ~/.env && nslookup $DOMAIN
```
- If the command returns the IP addresses of the load balancer that is performing TLS termination for your custom ingress domain name, you have successfully verified the DNS CNAME record.
- If the command returns no results, the DNS CNAME record is not configured correctly. Check the DNS record for your custom ingress FQDN and ensure it points to your load balancer's DNS name.
If the Openflow Agent connected successfully through your load balancer's DNS and you have verified the DNS CNAME record, a security policy or firewall is likely blocking traffic from your browser to the Openflow BYOC deployment. Work with your security team to allowlist your custom ingress FQDN.
---
title: Openflow BYOC - Set up encrypted EBS volumes
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-byoc-encrypted-volumes.md
section: Loading & Unloading Data
---
# Openflow BYOC - Set up encrypted EBS volumes
This feature is not available in the People's Republic of China.
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-byoc)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic describes the steps to set up an Openflow BYOC deployment with encrypted Elastic Block Storage (EBS) volumes using one of the following methods:
- [](#label-openflow-byoc-encrypted-ebs-kms-key)
- [](#label-openflow-byoc-encrypted-ebs-default-encryption)
Both of these solutions provide encrypted EBS volumes that meet the following storage requirements of Openflow BYOC:
- Root volume for the Openflow Agent EC2 instance
- Root volumes for the EC2 instances in each EKS Cluster Node Group
- Persistent volumes for Openflow's runtimes and supporting components
- `$AWS_ACCOUNT_ID` represents the AWS Account ID of the account where Openflow is deployed.
- `$AWS_REGION` represents the AWS Region of the account, for example `us-west-2`.
- `$AWS_KMS_KEY_ARN` represents the Amazon Resource Name (ARN) of the Amazon Key Management Service (AWS KMS) key that Openflow will use for encrypted EBS volumes.
- `$DEPLOYMENT_KEY` represents the Openflow unique identifier applied to cloud resources created and managed by Openflow for a particular deployment.
This is in the `DataPlaneKey` parameter of the CloudFormation template, also available in Openflow through the **View Details** menu option for the deployment.
## Prerequisites
This topic assumes that you have completed the prerequisites for setting up Openflow BYOC. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-byoc).
You must also have access to an AWS KMS key that Openflow will use for encrypted EBS volumes.
## Provide a specific AWS KMS Key for Encrypted EBS Volumes
When uploading the CloudFormation template for your Openflow BYOC Deployment, you can provide the ARN for the AWS KMS key that Openflow uses for encrypted EBS volumes.
Using this configuration, Openflow makes requests for encrypted EBS volumes, ensuring that all SCP policies are satisfied. Snowflake recommends this approach for most customers.
This allows you to use different KMS keys for different applications, reducing the risk of a single key being compromised.
To ensure that Openflow has the necessary permissions to use this key, perform the following tasks:
1. Ensure that the AWS KMS key grants permissions to the AWS Autoscaling Service Role. The Key Policy must include the following statement:
```json
{
"Sid": "Allow Autoscaling to use the key",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
},
"Action": [
"kms:CreateGrant",
"kms:Decrypt",
"kms:Encrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
],
"Resource": "*"
}
```
2. Enter the ARN of the AWS KMS key in the `EBSKMSKeyArn` parameter of the CloudFormation stack when uploading the template.
For example, `arn:aws:kms:$AWS_REGION:$AWS_ACCOUNT_ID:key/1a1a11aa-aa1a-aaa1a-a1a1-000000000000`.
Approximately 20 minutes after uploading the CloudFormation template, the Openflow BYOC Deployment creates a new IAM Role with the name `$DEPLOYMENT_KEY-eks-role`.
3. Add the following statement to the KMS key policy to grant permissions for Openflow to use the key:
```json
{
"Sid": "Allow Openflow Deployment to encrypt EBS volumes",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/$DEPLOYMENT_KEY-eks-role"
},
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:CreateGrant",
"kms:DescribeKey"
],
"Resource": "*"
}
```
Openflow automatically detects the new permissions for the KMS key and continues the installation process. The Openflow BYOC deployment will become `Active` after approximately 20 minutes.
## Enable Encrypted EBS Volumes by default for your AWS Account
AWS accounts can encrypt new EBS volumes by default by following the [AWS EBS encryption by default documentation](https://docs.aws.amazon.com/ebs/latest/userguide/encryption-by-default.html).
With this configuration, Openflow makes requests for unencrypted EBS volumes, but the AWS API will return an encrypted EBS volume. The following steps ensure that Openflow has permissions to use the KMS key for these encrypted volumes.
Whether you choose to use the AWS managed key `aws/ebs` or your own KMS key, you must attach an IAM Policy to the Openflow IAM Role `$DEPLOYMENT_KEY-eks-role` that grants the necessary permissions to use the key.
1. Create an IAM Policy to allow Openflow to use the KMS key by replacing `$AWS_KMS_KEY_ARN` with the ARN of the KMS key.
```json
{
"Sid": "Allow Openflow EKS Role to encrypt EBS volumes",
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:CreateGrant",
"kms:DescribeKey"
],
"Resource": "$AWS_KMS_KEY_ARN"
}
```
2. Ensure that the AWS KMS key grants permissions to the AWS Autoscaling Service Role. The Key Policy must include the following statement:
```json
{
"Sid": "Allow Autoscaling to use the key",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
},
"Action": [
"kms:CreateGrant",
"kms:Decrypt",
"kms:Encrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
],
"Resource": "*"
}
```
3. When uploading the Openflow BYOC CloudFormation template:
- Leave the optional `EBSKMSKeyArn` parameter blank.
- Set the `AdditionalEksRolePolicyArns` parameter to the ARN of the new IAM Policy created previously. For example, `arn:aws:iam::$AWS_ACCOUNT_ID:policy/openflow-kms-key-access-policy`.
Approximately 20 minutes after uploading the CloudFormation template, the Openflow BYOC Deployment creates a new IAM Role with the name `$DEPLOYMENT_KEY-eks-role`.
4. Add the following statement to the KMS key policy to grant permissions for Openflow to use the key:
```json
{
"Sid": "Allow Openflow Deployment to encrypt EBS volumes",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/$DEPLOYMENT_KEY-eks-role"
},
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:CreateGrant",
"kms:DescribeKey"
],
"Resource": "*"
}
```
Openflow automatically detects the new permissions for the KMS key and continues the installation process. The Openflow BYOC deployment will become `Active` after approximately 20 minutes.
---
title: Openflow BYOC cost and scaling considerations
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/cost-byoc.md
section: Loading & Unloading Data
---
# Openflow BYOC cost and scaling considerations
This feature is not available in the People's Republic of China.
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
Snowflake Openflow BYOC has cost considerations in multiple areas, including infrastructure, compute, data ingestion and others.
Scaling Openflow involves understanding these costs. The following sections describe Openflow BYOC costs in general,
and provide a number of examples of scaling Openflow BYOC runtimes and associated costs.
## Openflow BYOC costs
When using Openflow, you can incur the following types of costs:
| Cost category |
Description |
| Openflow (shown as **Openflow Compute BYOC** on your Snowflake bill) |
Cost based on the number of virtual CPU cores (vCPU) used by connector runtimes
within your "bring your own cloud (BYOC)" environment. You are charged for active runtimes only.
The compute used for Openflow management processes is excluded from this specific charge.
Credits are billed per-second with a 60 second minimum.
For an example of using of VCPU and the impacts of scaling see [](#label-openflow-byoc-scaling-overview).
For information on the rate per vCPU per hour, refer to Table 1(g) in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).
Additionally, the [METERING_DAILY_HISTORY](/sql-reference/account-usage/metering_daily_history)
and [METERING_HISTORY](/sql-reference/account-usage/metering_history) views in the
[](/sql-reference/account-usage) schema can provide additional details on Openflow compute costs
using queries for *SERVICE_TYPE=OPENFLOW_COMPUTE_BYOC*.
See [](/user-guide/cost-exploring-compute) for more information on exploring compute costs in Snowflake.
|
| Infrastructure (only for BYOC configuration) |
Applicable only for BYOC deployments, you directly pay your cloud provider, for example, AWS,
for the underlying infrastructure provisioned in your environment to run Openflow.
This primarily includes compute (for runtimes you provision to run the connectors and for managing the runtimes),
networking, and storage costs and will appear on your CSP bill.
The EC2 compute requirements are illustrated in the following image:

|
| Ingestion |
Cost for loading data into Snowflake using services such as Snowpipe or Snowpipe Streaming, based on data volume.
Appears on your Snowflake bill under respective ingestion services line items.
Certain connectors may require a standard Snowflake warehouse, incurring additional warehouse costs.
For example, database CDC connectors require a Snowflake warehouse for both initial snapshot and
incremental Change Data Capture (CDC).
You can schedule [](/sql-reference/sql/merge) operations to manage the compute cost.
|
| Telemetry Data Ingest |
Standard Snowflake charges for sending logs and metrics to Openflow deployments
and sending runtimes to your event table within Snowflake.
The rate for credits per GB of telemetry data can be found in Table 5 in the [Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).
|
## Openflow BYOC scaling
The runtimes and scaling behavior you choose are crucial for managing costs effectively.
Openflow supports different runtime types, each with its own scaling characteristics.
### Runtime types and the associated costs
The following table illustrates the scaling behavior of various runtimes and their associated costs:
| Runtimes | Activity | Snowflake costs | Cloud costs |
| --------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------- |
| No runtimes | None | No cost | Compute and storage of Dataplane |
| 1 small runtime (1vCPU) (min 1 max 2) | Active for 1 hour Runtime does not scale to 2. | 1 runtime x 1 node x 1 vCPU x 1 hour = 1 Total = 1 vCPU-hour | Compute and storage of Dataplane |
| 2 small runtime (1 vCPU) (min/max=2) 1 large runtime (8 vCPU) (min/max=10) | Small: 2 nodes active for 1 hour Large: 10 nodes active for 1 hour | 2 runtime2 x 2 node x 2 vCPU x 1 hour = 4 vCPU 1 runtime x 10 nodes x 8 vCPU x 1 hour = 80 vCPU Total = 84 vCPU-hours | Compute and storage of Dataplane |
| 1 medium (4vCPU) (min =1 max=2) | First 20 minutes, 1 node is running Scales to 2 nodes for the remaining 40 minutes of the hour Total 1 hour | 20 minutes = 1/3 hour 1 runtime x 1 node x 4 vCPU x 1/3 hour = 4/3 1 runtime x 2 nodes x 4 vCPU x 2/3 hour = 16/3 Total = 6 2/3 vCPU-hours | Compute and storage of Dataplane |
| 1 medium (4vCPU) (min/max=2) | First 30 minutes 2 nodes running Suspends after first 30 minutes. | 30 minutes = 1/2 hour 1 runtime x 2 nodes x 4 vCPU x 1/2 hour = 4 Total = 4 vCPU-hours | Compute and storage of Dataplane |
### Mapping runtimes to EC2 instance types
Choosing a runtime type (t-shirt size) results in the runtime pods being scheduled on the associated EC2
node group \{key\}-sm-group, \{key\}-md-group, or \{key\}-lg-group with resources described in the following table:
| Runtime type | vCPUs | Available memory (GB) | EC2 instance type | EC2 node group | EC2 node - CPUs | EC2 node - memory (GB) |
| ------------ | ----- | --------------------- | ----------------- | ---------------- | --------------- | ---------------------- |
| Small | 1 | 2 | m7i.xlarge | \{key\}-sm-group | 4 | 16 |
| Medium | 4 | 10 | m7i.4xlarge | \{key\}-md-group | 16 | 64 |
| Large | 8 | 20 | m7i.8xlarge | \{key\}-lg-group | 32 | 128 |
The type of runtime that you choose impacts the number of cores (vCPUs) consumed each second. Openflow scales the underlying EC2 node group
when additional pods need to be scheduled, based on CPU consumption, and up to the maximum node setting set during runtime creation.
EKS node groups are configured with a minimum size of 0 nodes and a maximum of 50 nodes.
The desired size is dynamically adjusted depending on the runtime required CPU and memory.
Customers are charged by their cloud service provider for the underlying nodes that host their runtime.
The underlying EC2 instances are created when the first runtime of a respective size is scheduled.
### Examples for calculating Openflow BYOC runtime consumption
- A user requests a BYOC deployment from Openflow and then installs the Openflow agent and deployment
-
- The user has not created any runtimes. 0 vCPUs are allocated, so there is no Openflow software cost.
- The user is charged by their cloud service provider for the provisioned compute and storage of the Openflow BYOC deployment.
- Total Openflow consumption = 0 vCPU-hours
- A user creates one small runtime with Min Nodes = 1 and Max Nodes = 2. Runtime stays at 1 node for 1 hour.
-
- 1 small runtime = 1 vCPU
- Total Openflow consumption = 1 vCPU-hour
- A user creates 2 small runtimes with min/max of 2 nodes each, and one large runtime with min/max of 10 nodes. These Runtimes are active for 1 hour
-
- 2 small runtimes at 2 nodes = 2 Runtimes x 2 nodes x 1 vCPU = 4 vCPUs
- 1 large runtime at 10 nodes = 1 Runtime x 10 nodes x 8 vCPU = 80 vCPUs
- Total Openflow consumption = (4 vCPU + 80 vCPU) x 1 hour = 84 vCPU-hours
- A user creates 1 medium runtime with 1 node. After 20 minutes, it scales to 2 nodes and remains at 2 nodes for the rest of the hour.
-
- 1 medium runtime = 4 vCPUs
- 20 minutes = 1/3 hour; 40 minutes = 2/3 hour
- (1 node x 4 vCPU x 1/3 hour) + (2 nodes x 4 vCPU x 2/3 hour)
- 4/3 vCPU-hours + 16/3 vCPU-hours
- Total Openflow consumption = 20/3 vCPU-hours, so approximately 6.67 vCPU-hours
- A user creates 1 medium runtime with 2 nodes, then suspends it after 30 minutes
-
- 1 medium runtime = 4 vCPU
- 30 minutes = 1/2 hour
- Total Openflow consumption = (2 nodes x 4 vCPU x 1/2 hour) = 4 vCPU-hours
---
title: Openflow Connector for Amazon Kinesis Data Streams
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kinesis/about.md
section: Loading & Unloading Data
---
# %kinesis%
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/kinesis/setup)
- [](/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning)
- [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance)
- [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot)
## About
This topic describes the basic concepts of %kinesis%, including its workflow and limitations.
You can use [Amazon Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/introduction.html) to collect and process large streams of data records in real time. Producers continually push data to Kinesis Data Streams, and consumers process the data in real time.
A Kinesis data stream is a set of [shards](https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html#shard). Each shard has a sequence of data records. A data record is the unit of data stored in a Kinesis data stream. Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes.
%kinesis% reads data from Kinesis streams and writes it into Snowflake tables using the
[Snowpipe Streaming](/user-guide/snowpipe-streaming/snowpipe-streaming-high-performance-overview) architecture.
Use this connector if you're looking to do the following:
- Ingest real-time events from Amazon Kinesis into Snowflake for near real-time analytics
- Ingest real-time events from Amazon Kinesis into Snowflake-managed Iceberg™ tables
- Accelerate your ingestion even more by combining Openflow speed with the Interactive Tables feature
- Use Single Message Transforms to enrich or filter data before it lands in Snowflake.
## Limitations
- One connector supports only ingestion from a single stream.
- The connector does not support schema evolution for Apache Iceberg™ tables.
- Autoscaling is not supported. The number of Openflow runtime min and max nodes should be constant for the runtime where %kinesis% is deployed.
- The connector supports routing Kinesis traffic through Snowflake outbound AWS PrivateLink. DynamoDB traffic must use the public endpoint because Amazon DynamoDB doesn't support Private DNS. For more information, see [](#label-kinesis-configure-aws-privatelink).
### Limitations of fault tolerance with the connector
Kinesis Streams can be configured with a retention time. If for any reason the %kinesis% is not able to ingest data for more than the retention time, then expired records will not be loaded.
### Supported data types and authentication methods
The connector by default is configured to work with the JSON data type and supports authentication using AWS Credentials: Access Key ID and Secret Access Key. Connector can be customized to work with other data types and authentication methods.
## Next steps
- [](/user-guide/data-integration/openflow/connectors/kinesis/setup)
---
title: Openflow Connector for MySQL: Data mapping
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/mysql/data-mapping.md
section: Loading & Unloading Data
---
# %mysql%: Data mapping
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/mysql/about)
- [](/user-guide/data-integration/openflow/connectors/mysql/setup)
This topic describes MySQL data types are mapped
to Snowflake data types.
## MySQL to Snowflake data type mapping
The following table shows how MySQL data types are mapped to Snowflake data types
when replicating data.
| MySQL type |
Snowflake type |
Notes |
| DECIMAL / NUMERIC |
NUMBER |
The maximum number of digits in DECIMAL format for MySQL is 65. For Snowflake, the maximum is 38. Precision is lost when exceeded. |
| INT / INTEGER |
INT |
|
| TINYINT / BOOL |
INT |
|
| SMALLINT |
INT |
|
| MEDIUMINT |
INT |
|
| BIGINT |
INT |
|
| YEAR |
INT |
|
| FLOAT |
FLOAT |
|
| DOUBLE |
FLOAT |
|
| VARCHAR |
TEXT |
|
| CHAR |
TEXT |
Trailing spaces aren't preserved. |
| TINYTEXT |
TEXT |
|
| TEXT |
TEXT |
|
| MEDIUMTEXT |
TEXT |
Supported up to the maximum entry size in Snowflake (16 MB). |
| LONGTEXT |
TEXT |
Supported up to the maximum entry size in Snowflake (16 MB). |
| ENUM |
TEXT |
Stored as a string value. For example, for `ENUM('one', 'two')` the possible values are `'one'` and `'two'`. |
| SET |
TEXT |
Stored as a comma-separated string in column declaration order. For example, for `SET('one', 'two')` the possible values are `''`, `'one'`, `'two'`, and `'one,two'`. |
| BIT |
TEXT |
Represented as a hexadecimal string. For example: `'83060c183060c183'`. |
| DATE |
DATE |
|
| DATETIME |
TIMESTAMP_NTZ |
|
| TIMESTAMP |
TIMESTAMP_TZ |
Values are stored in UTC. |
| TIME |
TIME |
|
| BINARY |
BINARY |
|
| VARBINARY |
BINARY |
|
| TINYBLOB |
BINARY |
|
| BLOB |
BINARY |
|
| MEDIUMBLOB |
BINARY |
Supported up to the maximum entry size in Snowflake (16 MB). |
| LONGBLOB |
BINARY |
Supported up to the maximum entry size in Snowflake (16 MB). |
| JSON |
VARIANT |
Supported up to the maximum entry size in Snowflake (16 MB). |
Any MySQL data types not listed in this table are mapped to TEXT by default.
---
title: Openflow Connector for MySQL: Maintenance
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/mysql/maintenance.md
section: Loading & Unloading Data
---
# %mysql%: Maintenance
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/mysql/setup)
- [](/user-guide/data-integration/openflow/connectors/mysql/data-mapping)
This topic describes important maintenance considerations and best practices for
maintaining the %mysql% such as reinstalling the connector or setting the starting binary log position for loading.
These operations are often used in conjunction with [Incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/mysql/incremental-replication).
## Check the replication status of a table
Interim failures, such as connection errors, do not prevent table replication. However,
permanent failures, such as unsupported data types, prevent table replication.
To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store:
1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. A table listing controller services displays.
2. Locate the row labeled **Table State Store**, click the **More** %sf-vertical-more-button% button on the right side of the row, and then choose **View State**.
A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are:
- **NEW**: The table is scheduled for replication but replication hasn't started.
- **SNAPSHOT_REPLICATION**: The connector is copying existing data. This status displays until all records are stored in the destination table.
- **INCREMENTAL_REPLICATION**: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails.
- **FAILED**: Replication has permanently stopped due to an error.
The Openflow runtime canvas doesn't display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message:
```text
Replication state for table .. changed from to
```
If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see [Restart table replication](#label-of-mysql-restart-table-replication).
## Reinstall the connector
This section provides instructions on how to reinstall the connector, and continue replicating data for
the same tables without having to snapshot them again.
It covers situations where the new connector is installed in the same runtime, as well as moved to a new runtime.
For the connector to continue replicating from the same CDC stream position where it stopped before reinstallation,
the source database must retain the binary log long enough to cover the time since the prior connector was stopped
and the new connector is started.
Make sure the `binlog_expire_logs_seconds` parameter of the MySQL server is high enough, and keep the reinstallation time to a minimum.
The value of `binlog_expire_logs_seconds` needs to be longer than the expected time expected to reinstall the connector.
Typically 86400s, a day is seconds, is sufficient, however longer times might be appropriate to ensure time to reinstall.
### Prerequisites
Review and note connector parameter context values.
If you're reinstalling the connector in the same runtime, you can reuse the existing context.
If the new instance is located in a different runtime, you must re-enter all parameters.
1. Finish processing all in-flight FlowFiles in the existing connector, then stop the connector.
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Select **Launch Openflow**.
4. In the **Openflow** pane select the **Runtimes** tab.
5. Select the runtime containing the connector.
6. Select the connector.
7. Stop the topmost processor **Set Tables for Replication** in the **Snapshot Load** group.
8. Stop the topmost processor **Read MySQL CDC Stream** in the **Incremental Load** group.
9. If you changed the value of the **Merge Task Schedule CRON** parameter, return it to `* * * * * ?`, otherwise queues won't be emptied until the next scheduled run.
Wait until all FlowFiles in the connector have been processed, and all queues are empty.
When all FlowFiles have been processed, the **Queued** value on the connector's processor group becomes zero.
If there are any items left in the original connector's queues, there may be data gaps when the new connector starts.
10. Stop all processors and controller services in the connector.
The existing connector can remain in the runtime and doesn't interfere with the new instance, as long as it remains stopped.
2. Create a new instance of the connector. If you're using the same runtime as the original connector, you can choose to keep the existing parameter contexts and reuse the settings.
3. If you're installing into a different runtime or you deleted the previous parameter contexts, enter the configuration settings into the new parameter contexts,
including the table names and patterns as described in [](/user-guide/data-integration/openflow/connectors/mysql/setup).
4. Navigate to the `MySQL Ingestion Parameters` context, and set the following parameters:
- Set the `Ingestion Type` parameter to `incremental`. For more information on the concerns see [](#label-mysql-incremental-replication).
- Set the `Starting Binlog Position` parameter to `Earliest`.
For more information and potential concerns see [](#label-mysql-connector-start-restart-incremental-load-from-earliest-available-binary-log-position).
5. Start the new connector.
### Usage notes
The new connector uses the existing destination tables that were created by the original connector, but the connector creates new journal tables.
## Specify load from binary log position
The %mysql% connector allows you to select the starting position where MySQL binary logs are read.
By default the connector reads from the latest available position. Alternatively, you can choose the earliest position available on the source instance.
Choosing to start from the earliest position is common when reinstalling the connector.
This allows the new instance to catch up and continue replicating existing tables without having to snapshot each again.
Note that switching a running connector from latest to earliest position cause the entire available binary log
to be re-read, re-processed, and re-applied to the destination table.
While the binary log is being re-read, the columns and data in affected destination tables
can become out of sync with their sources until all events have been re-processed and merged.
The following parameters control snapshot loads are available in the `Ingestion Parameters` context:
| Parameter |
Description |
| Starting Binlog Position |
- *Latest* (default): CDC stream reading starts at the latest available position and continues from there.
- `Earliest`: Switches the incremental load to start, or restart reading from the earliest available
binary log position.
|
| Re-read Tables in State |
- `New` (default):
While re-reading the binary log, only those events will be processed
from new tables added to replication after the re-reading started.
Other events are discarded until the connector reaches the position just before re-reading started.
- `Any active`: Re-read and re-process events from any table currently in replication.
|
To determine whether the connector finished re-reading the binary log:
1. Navigate to the Openflow canvas.
2. Open the **Incremental Load** process group.
3. Right-click the topmost processor named **Read MySQL CDC Stream**, then select **View state**.
4. Compare the state entries:
- **binlog.position.rewind**: the latest position the processor read before re-reading of the binary log started.
- **binlog.position.dml**: the current latest position read by the processor. As long as this value is lower than the rewind value above, the processor is still re-reading the binary log.
### Usage notes
- After a running connector is switched to read from the earliest position, and starts running,
the process can't be reconfigured or cancelled, and will continue until the currently-read position reaches the position from before it started.
- Switching to the earliest position on a running connector will, for any tables being re-processed,
finish their existing journals, and create new journal tables.
- If the binary log contains events from a previous table that was dropped
and re-created in the source database, the re-reading the stream re-processes all events in the current destination.
The connector can't distinguish between a previous and current source table if they share the same name.
---
title: Openflow Connector for MySQL: Set up incremental replication without snapshots
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/mysql/incremental-replication.md
section: Loading & Unloading Data
---
# %mysql%: Set up incremental replication without snapshots
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/mysql/setup)
- [](/user-guide/data-integration/openflow/connectors/mysql/data-mapping)
You can configure the %mysql% connector to immediately replicate incremental changes for newly added tables, bypassing snapshots. Use incremental load to continue replication without snapshotting every table again when you reinstall the connector over previously replicated data.
To enable incremental replication in a new connector instance:
1. Set up the connector as described in [](/user-guide/data-integration/openflow/connectors/mysql/setup).
2. In the `MySQL Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`.
## Enable incremental replication without snapshots
To enable incremental replication on an existing connector:
1. sign in to %sf-web-interface-link%.
2. in the navigation menu, select **Ingestion** %raa% **Openflow**.
3. In the **Openflow** pane select the **Runtimes** tab.
4. Select the runtime containing the connector.
5. Select the connector.
6. In the `Ingestion Parameters` context, specify `Ingestion Type` = `incremental`.
7. Add new replication tables. These tables immediately switch to their incremental load.
To return to replicating tables with the snapshot load, change **Ingestion Type** from `incremental` to `full`.
# Usage notes
- Changing the value of **Ingestion Type** does not impact any tables that have begun replicating data.
Tables currently in the snapshot phase continue until the snapshot load is complete.
- While **Ingestion Type** is set to `incremental`, new tables added to the list of replicated tables bypass the snapshot phase.
This includes new tables added to the source database that match the `Included Table Regex` parameter.
Ensure that the ingestion type is set to `incremental` to bypass the snapshot phase.
Connectors should only remain in `incremental` mode as long as required as it bypasses snapshots.
Once customer needs for incremental updates have been satisfied the connector should be returned to `full` mode.
- For tables that bypass snapshot load, the connector creates a destination table in Snowflake,
by executing `CREATE TABLE IF NOT EXISTS`, only if no destination table already exists.
Tables going through the snapshot require that no destination table exist.
---
title: Openflow Connector for Oracle: Configure the Oracle database
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb.md
section: Loading & Unloading Data
---
# %oracleofc%: Configure the Oracle database
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
The %oracleofc% is also subject to additional terms of service beyond the standard
connector terms of service. For more information, see the
[Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/oracle/about)
- [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-snowflake)
This topic describes how to set up the Oracle database for %oracleofc%.
Your Oracle database setup depends on your organization's security policies
and database architecture. For example, if tables reside in a Container
Database (CDB), a Pluggable Database (PDB), multiple PDBs, or a combination.
The steps provided in this topic are examples only. Modify them
as required for your environment.
As an Oracle database administrator, perform the following procedures on your source database:
1. [](#label-set-up-archived-redo-logs-retention-period)
2. [](#label-enable-xstream-and-supplemental-logging)
3. [](#label-create-xstream-administrator-user)
4. [](#label-granting-xstream-administrator-privileges)
5. [](#label-configure-xstream-server-connect-user)
6. [](#label-create-xstream-outbound-server)
7. [](#label-set-xstream-outbound-server-connect-user)
8. [](#label-set-xstream-outbound-server-capture-user)
9. (Optional) [](#label-configure-ssl-connections)
The steps in this topic are written for a multi-tenant architecture with a Container
Database (CDB) and one or more Pluggable Databases (PDB). If your Oracle database uses a single-tenant
architecture, see [](#label-setup-xstream-single-tenant).
## Configure the retention period for archived redo logs
You must enable the `ARCHIVELOG` mode to ensure that change data is available for replication.
If you use AWS RDS for Oracle, you must also configure the retention period for archived redo logs.
Determine this period based on the volume of changes in the source database and your storage capacity.
To set the retention period, for example to 24 hours, follow the procedures in the following table:
| Database version |
Procedure |
| AWS RDS (Standard) |
Run the following:
```sql
begin
rdsadmin.rdsadmin_util.set_configuration(
name => 'archivelog retention hours',
value => '24');
end;
/
commit;
```
For more information see
[Retaining archived redo logs](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.CommonDBATasks.RetainRedoLogs.html).
|
| AWS RDS Custom |
1. Create a text file named `/opt/aws/rdscustomagent/config/redo_logs_custom_configuration.json`.
2. Add a JSON object to this file in the following format: `{"archivedLogRetentionHours" : "24"}`.
For more information see
[Restoring an RDS Custom for Oracle instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/custom-backup.pitr.html).
|
## Enable XStream and supplemental logging
XStream is included with Oracle Database and doesn't require any additional software.
To enable and configure XStream replication to capture and stream change data, run the following commands:
1. Enable XStream replication:
```sql
ALTER SYSTEM SET enable_goldengate_replication=TRUE SCOPE=BOTH;
ALTER SYSTEM SET STREAMS_POOL_SIZE = 2560M;
```
Snowflake recommends setting the streams pool size to 2.5 GB. This allocation covers the following:
- 1 GB for Capture
- 1 GB for Apply
- An additional 25% buffer
To enable supplemental logging to ensure that the redo logs capture the information required for logical
replication, run the following commands:
1. Confirm that the database is in ARCHIVELOG mode as shown in the following example:
```sql
SELECT LOG_MODE, FORCE_LOGGING FROM V$DATABASE;
```
Snowflake recommends forcing logging on database or table space level.
2. Set the container to the root container and add supplemental logging to the database:
```sql
ALTER SESSION SET CONTAINER = CDB$ROOT;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
Alternatively, you can enable logging only on specific tables as shown in the following example:
```sql
ALTER TABLE schema_name.table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
## Create the XStream administrator user
An XStream administrator user is required to manage XStream components, including the
creation and alteration of outbound servers.
You can either create a dedicated user for this purpose or use an existing user,
provided that the necessary XStream administration privileges are granted (see the next section).
The following example details the setup of a dedicated XStream administrator user in the root container of a CDB.
The following example assumes that the database also has a PDB containing tables to be replicated.
Connect as SYSDBA or a user with appropriate privileges and run the following commands:
```sql
-- Switch to the root container.
ALTER SESSION SET CONTAINER = CDB$ROOT;
-- Create a tablespace for the XStream administrator user.
CREATE TABLESPACE xstream_adm_tbs DATAFILE '/path/to/your/cdb/xstream_adm_tbs.dbf'
SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
-- Switch to the Pluggable Database (PDB) and create a tablespace there.
ALTER SESSION SET CONTAINER = YOUR_PDB_NAME;
CREATE TABLESPACE xstream_adm_tbs DATAFILE '/path/to/your/pdb/xstream_adm_tbs.dbf'
SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
-- Switch back to the root container to create the common user.
ALTER SESSION SET CONTAINER = CDB$ROOT;
-- Create the XStream administrator user.
-- Note 'c##' prefix indicates a common user in a CDB environment, and CONTAINER=ALL grants privileges across all containers.
-- Replace "YOUR_XSTREAM_ADMIN_PASSWORD" with a strong, secure password.
CREATE USER c##xstreamadmin IDENTIFIED BY "YOUR_XSTREAM_ADMIN_PASSWORD"
DEFAULT TABLESPACE xstream_adm_tbs
QUOTA UNLIMITED ON xstream_adm_tbs
CONTAINER=ALL;
```
## Grant XStream administrator privileges
Connect as SYSDBA or a user with appropriate privileges and grant the required privileges
to the XStream administrator user.
1. Grant the CREATE SESSION privilege to the XStream administrator:
```sql
GRANT CREATE SESSION TO c##xstreamadmin CONTAINER=ALL;
```
2. Grant XStream capture privileges using one of the following commands, depending on your Oracle Database version:
| Database version |
Command |
| Oracle Database 21c and earlier |
Run the following:
```sql
BEGIN
DBMS_XSTREAM_AUTH.GRANT_ADMIN_PRIVILEGE(
grantee => 'c##xstreamadmin',
privilege_type => 'CAPTURE',
grant_select_privileges => TRUE,
container => 'ALL');
END;
/
```
|
| Oracle Database 23c and later |
Oracle Database 23c introduced a dedicated `XSTREAM_CAPTURE` system privilege. Run the following:
```sql
GRANT XSTREAM_CAPTURE TO c##xstreamadmin CONTAINER=ALL;
```
|
## Configure XStream server connect user
The Snowflake Openflow Connector uses a dedicated connect user to establish a connection to the XStream Outbound Server and receive change data.
This user requires specific privileges to facilitate replication:
- **Read from XStream Outbound Server**: The user must be able to access the change data stream from the configured XStream Outbound Server.
- **Select from Data Dictionary Views**: The connect user needs SELECT access to various data dictionary views.
This can be achieved by granting SELECT_CATALOG_ROLE or SELECT ANY DICTIONARY.
If granting SELECT ANY DICTIONARY isn't desired due to company policy, the user specifically needs SELECT access to the following views:
- ALL_USERS
- ALL_TABLES
- ALL_TAB_COLS
- ALL_CONS_COLUMNS
- ALL_CONSTRAINTS
- ALL_INDEXES
- ALL_IND_COLUMNS
- V$DATABASE
`ALL_INDEXES` and `ALL_IND_COLUMNS` are required so the connector can detect
unique constraints and unique indexes as replication keys when a table has no
primary key. For more information on the selection algorithm, see
[](#label-oracle-replication-key-selection).
- **Select from Source Tables**: The user must have SELECT privileges on all tables that are intended for replication.
The following is an example of how to set up such a user in the root container of the CDB.
The example assumes that the database also has a PDB containing tables to be replicated.
```sql
-- Connect as SYSDBA or a user with appropriate privileges
-- Switch to the root container.
ALTER SESSION SET CONTAINER = CDB$ROOT;
-- Create the connect user.
-- Replace "YOUR_CAPTURE_USER_PASSWORD" with a strong, secure password.
CREATE USER c##connectuser IDENTIFIED BY "YOUR_CAPTURE_USER_PASSWORD"
CONTAINER=ALL;
-- Grant necessary privileges to the connect user.
-- You can choose to grant access to specific tables
-- instead of SELECT ANY TABLE for more granular control,
-- for example, GRANT SELECT ON schema.table TO c##connectuser;
GRANT CREATE SESSION, SELECT_CATALOG_ROLE, SELECT ANY TABLE TO c##connectuser CONTAINER=ALL;
```
If your database is multi-tenant and the connector is connected to a CDB to replicate
data from multiple PDBs, grant the connect user the additional privileges needed to
switch between containers and read data dictionary information across all of them:
```sql
ALTER USER c##connectuser SET CONTAINER_DATA = ALL CONTAINER = CURRENT;
GRANT SET CONTAINER TO c##connectuser CONTAINER=ALL;
```
## Create XStream Outbound Server
The XStream Outbound Server captures changes from redo logs for consumption by the Openflow Connector. Define which schemas or tables to replicate.
For more information see [DBMS_XSTREAM_ADM.CREATE_OUTBOUND Documentation](https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_XSTREAM_ADM.html#GUID-A602ED86-0F5A-4A27-92A0-55D5ADC0AF0D).
Important considerations for replication scope:
- If a table is included in the XStream Outbound filtering rules command, it won't be replicated.
- A table or schema included here must also be defined in the connector parameters for it to be replicated.
You can include an entire schema in the server filtering rules and later, in the connector parameters,
specify only certain tables within that schema for replication.
The XStream Outbound Server can only be created from root container. However,
starting with Oracle Database version 23ai, it can also be created on the PDB level.
To avoid a significant hit to your CPU and network, and to prevent your queues from being filled with irrelevant data, it's essential to use a granular approach. The best way to do this is with the DBMS_XSTREAM_ADM.ADD_TABLE_RULES procedure, which lets you choose only the specific tables
you need.
The following examples show how to set up the XStream Outbound Server based on different replication needs. In practice, when setting up your XStream Outbound Server on your production environment, you should be selective about what changes you capture. Capturing everything can have serious consequences for your database's performance and resource usage.
For information on how to configure XStream Outbound Server, see
[Configuring XStream Out](https://docs.oracle.com/en/database/oracle/oracle-database/19/xstrm/configuring-xstream-out.html#GUID-A1C8430E-565B-4F66-8E00-495F283AAAFB).
**Example 1:** Capture all tables from all schemas in the root container and all PDBs
```sql
-- Connect as a user with XStream admin privileges to the root container.
-- Ensure serveroutput is enabled to see messages from the PL/SQL block.
SET SERVEROUTPUT ON;
DECLARE
tables DBMS_UTILITY.UNCL_ARRAY;
schemas DBMS_UTILITY.UNCL_ARRAY;
BEGIN
-- To replicate all tables in all schemas across all containers, set both to NULL.
tables(1) := NULL;
schemas(1) := NULL;
DBMS_XSTREAM_ADM.CREATE_OUTBOUND(
server_name => 'XOUT1',
table_names => tables,
schema_names => schemas,
include_ddl => TRUE
);
DBMS_OUTPUT.PUT_LINE('XStream Outbound Server created.');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error creating XStream Outbound Server: ' || SQLERRM);
RAISE;
END;
/
```
**Example 2:** Capture all tables from a single schema in a Pluggable Database (PDB)
```sql
-- Connect as a user with XStream admin privileges to the root container.
-- Ensure serveroutput is enabled to see messages from the PL/SQL block.
SET SERVEROUTPUT ON;
DECLARE
tables DBMS_UTILITY.UNCL_ARRAY;
schemas DBMS_UTILITY.UNCL_ARRAY;
BEGIN
-- To replicate all tables in a schemas in the single PDB, set source_container_name.
tables(1) := NULL;
schemas(1) := 'schema_name';
DBMS_XSTREAM_ADM.CREATE_OUTBOUND(
server_name => 'XOUT1',
table_names => tables,
schema_names => schemas,
include_ddl => TRUE,
source_container_name => 'YOUR_PDB_NAME'
);
DBMS_OUTPUT.PUT_LINE('XStream Outbound Server created.');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error creating XStream Outbound Server: ' || SQLERRM);
RAISE;
END;
/
```
## Set up the XStream Outbound Server Connect User
Set the connect user on the XStream Outbound Server. This ensures that the previously created connect user is associated with the XStream Outbound Server (XOUT1), allowing it to receive change data.
The following example assumes that the connect user is c##connectuser.
```sql
BEGIN
DBMS_XSTREAM_ADM.ALTER_OUTBOUND(
server_name => 'XOUT1',
connect_user => 'c##connectuser');
END;
/
```
## Set up the XStream Outbound Server Capture User
If you want the data to be captured by the same user that created the server (the administrator), skip this section.
If you configured a separate capture user, configure the XStream Outbound Server to run
as this user. This ensures that the dedicated capture user is associated with the XStream Outbound Server (XOUT1), allowing that user to capture change data.
```sql
BEGIN
DBMS_XSTREAM_ADM.ALTER_OUTBOUND(
server_name => 'XOUT1',
capture_user => 'yourcaptureuser');
END;
/
```
## Set up XStream for single-tenant databases
The default architecture for Oracle 12c and later is a multi-tenant architecture with
a Container Database (CDB) and one or more Pluggable Databases (PDB).
If your Oracle database uses a single-tenant architecture, note the following
differences in setting up XStream:
- Do not use `ALTER SESSION SET CONTAINER` commands. In a single-tenant
database, there is only one instance, so container switching doesn't apply.
- Create only one `xstream_adm_tbs` tablespace. Do not create a second
tablespace in a PDB.
- Do not use the `C##` prefix on user names. For example, create
`xstreamadmin` instead of `c##xstreamadmin` and `connectuser` instead
of `c##connectuser`. The `C##` prefix is required only in multi-tenant
environments.
- Do not include `CONTAINER=ALL` or `container => 'ALL'` in any commands.
These clauses grant privileges across multiple containers and don't apply
in a single-tenant database.
## Configure SSL connections (optional)
The %oracleofc% supports encrypted SSL connections to the Oracle database using the TCPS
(TCP with SSL) protocol. When SSL is enabled, both the database connection and the XStream connection use encrypted communication.
To use SSL, you must:
1. [](#label-enable-tcps-on-oracle-database)
2. [](#label-create-client-wallet)
### Enable TCPS on the Oracle database
You must configure the Oracle database to accept connections using the TCPS protocol.
Follow the procedure for your database environment.
#### On-premises / OCI
1. Create an SSL server wallet with the server certificate.
2. Configure the `listener.ora` to include a TCPS endpoint (default port 2484).
3. Configure the `sqlnet.ora` to reference the server wallet.
4. Restart the listener.
For more information, see
[Configuring Transport Layer Security Encryption](https://docs.oracle.com/en/database/oracle/oracle-database/23/dbseg/configuring-transport-layer-security-encryption.html).
#### AWS RDS (Standard)
1. Add the Oracle SSL option to the option group associated with the DB instance.
2. Specify the SSL port (for example, 2484).
For more information, see
[Oracle Secure Sockets Layer](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.Options.SSL.html).
### Create a client wallet
After TCPS is enabled on the database, create an Oracle auto-login wallet (`cwallet.sso`)
containing the server's trusted certificate. This wallet is provided to the connector so
that it can verify the server during the SSL handshake.
1. Export the server certificate from the Oracle database server as a PEM file.
2. Use the Oracle `orapki` utility to create a client wallet and import the server certificate:
```bash
orapki wallet create -wallet /path/to/client/wallet -pwd -auto_login
orapki wallet add -wallet /path/to/client/wallet -pwd \
-trusted_cert -cert /path/to/server-cert.pem
```
3. Copy the generated `cwallet.sso` file to a location accessible by the Openflow runtime.
For AWS RDS, download the root certificate from AWS instead of exporting it from the
database server. For more information, see
[Connecting to an RDS for Oracle DB instance using SSL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.Options.SSL.Connecting.html).
For more information, see
[Using the orapki Utility to Manage PKI Elements](https://docs.oracle.com/en/database/oracle/oracle-database/23/dbseg/using-the-orapki-utility-to-manage-pki-elements.html).
## Next steps
[Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector).
---
title: Openflow Connector for Oracle: Data mapping
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/data-mapping.md
section: Loading & Unloading Data
---
# %oracleofc%: Data mapping
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
The %oracleofc% is also subject to additional terms of service beyond the standard
connector terms of service. For more information, see the
[Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/oracle/about)
- [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks)
This topic describes how Oracle data types are mapped to Snowflake data types when replicating data.
## Oracle to Snowflake data type mapping
The following table shows how Oracle data types are mapped to Snowflake data types
when replicating data.
| Oracle type |
Snowflake type |
Notes |
| NUMBER |
NUMBER |
If precision is undefined, mapped to NUMBER(38, 19). If precision or scale exceeds Snowflake limitations (precision > 38 or scale > 37), the value is stored as TEXT. |
| FLOAT |
FLOAT |
|
| BINARY_FLOAT |
FLOAT |
|
| BINARY_DOUBLE |
FLOAT |
|
| CHAR |
TEXT |
|
| VARCHAR2 |
TEXT |
|
| NCHAR |
TEXT |
|
| NVARCHAR2 |
TEXT |
|
| CLOB |
TEXT |
Supported up to the maximum entry size in Snowflake (16 MB). |
| NCLOB |
TEXT |
Supported up to the maximum entry size in Snowflake (16 MB). |
| LONG |
TEXT |
|
| DATE |
TIMESTAMP_NTZ |
|
| TIMESTAMP |
TIMESTAMP_NTZ |
|
| TIMESTAMP WITH TIME ZONE |
TIMESTAMP_TZ |
|
| TIMESTAMP WITH LOCAL TIME ZONE |
TIMESTAMP_LTZ |
|
| INTERVAL |
TEXT |
|
| INTERVAL YEAR TO MONTH |
TEXT |
|
| INTERVAL DAY TO SECOND |
TEXT |
|
| RAW |
BINARY |
|
| LONG RAW |
BINARY |
|
| BLOB |
BINARY |
Supported up to the maximum entry size in Snowflake (16 MB). |
| BOOLEAN |
BOOLEAN |
|
| JSON |
VARIANT |
Supported up to the maximum entry size in Snowflake (16 MB). |
| XMLTYPE |
TEXT |
|
Any Oracle data types not listed in this table are mapped to TEXT by default.
## Next steps
Review [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks) to set up the connector.
---
title: Openflow Connector for Oracle: Enable and manage commercial terms
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms.md
section: Loading & Unloading Data
---
# %oracleofc%: Enable and manage commercial terms
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
The %oracleofc% is also subject to additional terms of service beyond the standard
connector terms of service. For more information, see the
[Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/oracle/about)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-tasks)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector)
- [](/user-guide/data-integration/openflow/connectors/oracle/maintenance)
This topic describes how to enable the %oracleofc% in the list of available connectors and manage
the licensing lifecycle.
This task must be performed by the organization administrator (ORGADMIN).
Setting up the %oracleofc% is a two-stage process. First, enable Oracle XStream services to make
the connector available for installation. Then, finalize the license configuration after
the connector detects your source database inventory.
## Part 1: Enable service (pre-installation)
By default, the %oracleofc% isn't displayed in the list of available connectors. You must accept the
[Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/)
terms to make it available for installation. This is required for all license models.
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Admin** %raa% **Terms**.
3. Locate the item **Oracle Connector Terms** in the list.
4. Select **Review & Enable**.
After you complete these steps, the following changes take effect:
- The %oracleofc% listing becomes visible in the list of available connectors.
- A new tab titled **Openflow for Oracle** appears in the **Admin** %raa% **Terms** tab.
## Part 2: License setup and lifecycle
Complete the steps for the license model you selected during configuration:
- [Option A: Embedded license (Snowflake-provided)](#label-oracle-embedded-license-setup)
- [Option B: Independent license / BYOL](#label-oracle-byol-license-setup)
### Option A: Embedded license (Snowflake-provided)
For this licensing model, you must activate the trial to enable the connector.
Even if you install the connector, data replication doesn't start until this step is complete.
#### Step 1: Start the trial (prerequisite)
To start the trial:
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Admin** %raa% **Terms**.
3. Select **Openflow for Oracle** tab.
4. Locate the **Trial Status** card (status: "Ready to Activate").
5. Select **Start Trial**.
6. Accept the terms to start the 60-day trial period.
This action enables the captureChangeOracle processor, allowing it to connect to
your database.
#### Step 2: Configure connector
After starting the trial, install and configure the connector. For more information,
see [Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector).
After the connector successfully connects to the source database, a subscription is
automatically created and displayed in the **Openflow for Oracle** dashboard.
#### Step 3: Verify inventory
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Admin** %raa% **Terms**.
3. Select the **Openflow for Oracle** tab.
4. Review the **Subscription Inventory** section.
5. Verify that the CPU core count matches your physical source database hardware.
6. If the core count is incorrect, update the runtime configuration.
#### Step 4: Lifecycle management
For more information about the licensing models and terms, see
[Licensing models and critical constraints](#label-oracle-licensing-models).
The following table describes the actions available at each stage of the embedded
license lifecycle.
| Stage |
Action |
Result |
| Trial period (Day 1 to 60) |
Select **Cancel Trial** in the **Openflow for Oracle** dashboard before Day 60. |
Oracle XStream services stop. No charges are incurred. |
| 36-month commitment (Day 61+) |
No action required. If the trial isn't canceled, the non-cancelable 36-month term begins automatically on Day 61. |
The license can't be canceled during this period. If your Snowflake agreement is terminated, the full remaining balance is due immediately. |
| Post-term S&M renewal (after month 36) |
The license fee drops to $0. The annual Support & Maintenance (S&M) fee continues. You may opt out of S&M renewal in the **Openflow for Oracle** dashboard. |
If you opt out and S&M coverage expires, the connector is permanently locked. To resume, you must purchase a new embedded license, which resets the 36-month commitment. |
### Option B: Independent license / BYOL
If you are using the independent license (Bring Your Own License), no prior trial activation
is required.
#### Step 1: Configure the connector
To set up the connector with the independent/BYOL license, follow the steps in
[Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector).
#### Step 2: Verify inventory (recommended)
Verify that Snowflake has correctly identified your database inventory.
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Admin** %raa% **Terms**.
3. Select the **Openflow for Oracle** tab.
4. Review the database inventory details.
The **Start Trial** button doesn't appear for this license model, and the
36-month lifecycle rules don't apply. You are responsible for maintaining a valid
Oracle license that includes XStream entitlements.
---
title: Openflow Connector for Oracle: Maintenance
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/maintenance.md
section: Loading & Unloading Data
---
# %oracleofc%: Maintenance
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
The %oracleofc% is also subject to additional terms of service beyond the standard
connector terms of service. For more information, see the
[Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/oracle/about)
- [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms)
- [](/user-guide/data-integration/openflow/connectors/oracle/incremental-replication)
This topic describes maintenance tasks for the %oracleofc%, such as reinstalling the
connector or setting the starting redo log position.
These operations are often used in conjunction with [Incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/oracle/incremental-replication).
## Check the replication status of a table
Interim failures, such as connection errors, do not prevent table replication. However,
permanent failures, such as unsupported data types, prevent table replication.
To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store:
1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. A table listing controller services displays.
2. Locate the row labeled **Table State Store**, click the **More** %sf-vertical-more-button% button on the right side of the row, and then choose **View State**.
A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are:
- **NEW**: The table is scheduled for replication but replication hasn't started.
- **SNAPSHOT_REPLICATION**: The connector is copying existing data. This status displays until all records are stored in the destination table.
- **INCREMENTAL_REPLICATION**: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails.
- **FAILED**: Replication has permanently stopped due to an error.
The Openflow runtime canvas doesn't display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message:
```text
Replication state for table .. changed from to
```
If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see [Restart table replication](#label-of-oracle-restart-table-replication).
## Reinstall the connector
This section provides instructions on how to reinstall the connector, and continue replicating data for
the same tables without having to snapshot them again.
It covers situations where the new connector is installed in the same runtime, as well as moved to a new runtime.
For the connector to continue replicating from the same CDC stream position where it stopped before reinstallation,
the source database must retain the archived redo logs long enough to cover the time after the prior connector was stopped
and before the new connector is started.
Ensure the archived redo log retention period of the Oracle database is high enough, and keep the reinstallation time to a minimum.
Typically a retention period of 24 hours is sufficient, however longer times might be appropriate to ensure time to reinstall.
For more information on configuring archived redo log retention, see [](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb).
### Prerequisites
Review and note connector parameter context values.
If you're reinstalling the connector in the same runtime, you can reuse the existing context.
If the new instance is located in a different runtime, you must re-enter all parameters.
1. Finish processing all in-flight FlowFiles in the existing connector, then stop the connector.
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Select **Launch Openflow**.
4. In the **Openflow** pane select the **Runtimes** tab.
5. Select the runtime containing the connector.
6. Select the connector.
7. Stop the topmost processor **Set Tables for Replication** in the **Snapshot Load** group.
8. Stop the topmost processor **Read Oracle CDC Stream** in the **Incremental Load** group.
9. If you changed the value of the **Merge Task Schedule CRON** parameter, return it to `* * * * * ?`, otherwise queues won't be emptied until the next scheduled run.
Wait until all FlowFiles in the connector have been processed, and all queues are empty.
When all FlowFiles have been processed, the **Queued** value on the connector's processor group becomes zero.
If any items remain in the original connector's queues, data gaps might occur when the new connector starts.
10. Stop all processors and controller services in the connector.
The existing connector can remain in the runtime and doesn't interfere with the new instance, as long as it remains stopped.
2. Create a new instance of the connector. If you're using the same runtime as the original connector, you can choose to keep the existing parameter contexts and reuse the settings.
3. If you're installing into a different runtime or you deleted the previous parameter contexts, enter the configuration settings into the new parameter contexts,
including the table names and patterns as described in [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector).
4. Navigate to the `Oracle Ingestion Parameters` context, and set the following parameters:
- Set the `Ingestion Type` parameter to `incremental`. For more information on the concerns see [](#label-oracle-incremental-replication).
- Set the `Starting Redo Log Position` parameter to `Earliest`.
For more information and potential concerns see [](#label-oracle-alter-xstream-outbound-server).
5. Start the new connector.
### Usage notes
The new connector uses the existing destination tables that were created by the original connector, but the connector creates new journal tables.
## Alter XStream outbound server
The connector regularly updates the XStream server with the latest SCN position it processed. If the connector
is reinstalled and connects to the same XStream outbound server, it will resume reading from the SCN position where it left off.
This SCN number can be checked with:
```sql
SELECT PROCESSED_LOW_SCN
FROM DBA_XSTREAM_OUTBOUND_PROGRESS
WHERE SERVER_NAME = 'XOUT1';
```
If you want to re-read data from an earlier position, you must first change the start SCN of the XStream server:
```sql
BEGIN
DBMS_XSTREAM_ADM.ALTER_OUTBOUND(
server_name => 'XOUT1',
start_scn =>
);
END;
/
```
The value of `` must be a valid SCN within the range of available redo logs. The lowest SCN that the start position can be reset to can be checked with:
```sql
SELECT REQUIRED_CHECKPOINT_SCN
FROM DBA_CAPTURE
WHERE CLIENT_NAME = 'XOUT1';
```
This is the lowest SCN for which the capture process requires redo information.
## Specify load from XStream position
The %oracleofc% connector allows you to select the starting position where Oracle redo logs are read.
By default the connector reads from the latest available position. Alternatively, you can choose the earliest position available on the source instance.
Choosing to start from the earliest position is common when reinstalling the connector.
This allows the new instance to catch up and continue replicating existing tables without having to snapshot each again.
Switching a running connector from latest to earliest position causes the entire available redo logs
to be re-read, re-processed, and re-applied to the destination table.
While the redo logs are being re-read, the columns and data in affected destination tables
can become out of sync with their sources until all events have been re-processed and merged.
The following parameters are available in the `Ingestion Parameters` context:
| Parameter |
Description |
| Starting XStream Position |
- `Latest` (default): CDC stream reading starts at the latest available position and continues from there.
- `Earliest`: Switches the incremental load to start, or restart reading from the earliest available
XStream position.
|
| Re-read Tables in State |
- `New` (default):
While re-reading the redo logs, only those LCRs (Logical Change Records) will be processed
from new tables added to replication after the re-reading started.
Other LCRs are discarded until the connector reaches the position just before re-reading started.
- `Any active`: Re-read and re-process events from any table currently in replication.
|
To determine whether the connector finished re-reading the redo logs:
1. Navigate to the Openflow canvas.
2. Open the **Incremental Load** process group.
3. Right-click the topmost processor named **Read Oracle CDC Stream**, then select **View state**.
4. Compare the state entries:
- **lcr.position.rewind**: the latest position the processor read before re-reading of the redo logs started.
- **lcr.position.last**: the current latest position read by the processor. As long as this value is lower than the rewind value above, the processor is still re-reading the redo logs.
### Usage notes
- After a running connector is switched to read from the earliest position, and starts running,
the process can't be reconfigured or cancelled, and continues until the currently-read position reaches the position from before it started.
- Switching to the earliest position on a running connector will, for any tables being re-processed,
finish their existing journals, and create new journal tables.
- If the redo log contains events from a previous table that was dropped
and re-created in the source database, the re-reading the stream re-processes all events in the current destination.
The connector can't distinguish between a previous and current source table if they share the same name.
Schema changes (such as ALTER TABLE statements that add or drop columns) aren't supported
while re-reading the redo logs from the earliest position. If any table's schema was
altered between the earliest available SCN and the current position, that table should
be removed from replication and re-added with a fresh snapshot instead.
---
title: Openflow Connector for Oracle: Set up incremental replication without snapshots
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/incremental-replication.md
section: Loading & Unloading Data
---
# %oracleofc%: Set up incremental replication without snapshots
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
The %oracleofc% is also subject to additional terms of service beyond the standard
connector terms of service. For more information, see the
[Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/oracle/about)
- [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector)
- [](/user-guide/data-integration/openflow/connectors/oracle/maintenance)
This topic describes how to configure the %oracleofc% connector to start replicating incremental changes for newly added tables immediately, bypassing snapshots. This configuration is useful when you reinstall the connector over previously replicated data and want to continue replication without snapshotting every table again.
You can enable incremental replication on either a new or an existing connector instance.
## Enable incremental replication without snapshots on a new connector
To enable incremental replication on a new connector instance:
1. Set up the connector as described in [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector).
2. In the `Oracle Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`.
## Enable incremental replication without snapshots on an existing connector
To enable incremental replication on an existing connector:
1. sign in to %sf-web-interface-link%.
2. in the navigation menu, select **Ingestion** %raa% **Openflow**.
3. In the **Openflow** pane select the **Runtimes** tab.
4. Select the runtime containing the connector.
5. Select the connector.
6. In the `Ingestion Parameters` context, specify `Ingestion Type` = `incremental`.
7. Add new replication tables. These tables immediately switch to their incremental load.
To return to replicating tables with the snapshot load, change **Ingestion Type** from `incremental` to `full`.
# Usage notes
- Changing the value of **Ingestion Type** does not impact any tables that have begun replicating data.
Tables currently in the snapshot phase continue until the snapshot load is complete.
- While **Ingestion Type** is set to `incremental`, new tables added to the list of replicated tables bypass the snapshot phase.
This includes new tables added to the source database that match the `Included Table Regex` parameter.
Ensure that the ingestion type is set to `incremental` to bypass the snapshot phase.
Connectors should only remain in `incremental` mode as long as required as it bypasses snapshots.
Once customer needs for incremental updates have been satisfied the connector should be returned to `full` mode.
- For tables that bypass snapshot load, the connector creates a destination table in Snowflake,
by executing `CREATE TABLE IF NOT EXISTS`, only if no destination table already exists.
Tables going through the snapshot require that no destination table exist.
---
title: Openflow Connector for Oracle: Set up Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/setup-snowflake.md
section: Loading & Unloading Data
---
# %oracleofc%: Set up Snowflake
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
The %oracleofc% is also subject to additional terms of service beyond the standard
connector terms of service. For more information, see the
[Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/oracle/about)
- [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-connector)
This topic describes how to set up your Snowflake environment for the
%oracleofc%.
As a Snowflake administrator, perform the following tasks:
1. Create a destination database in Snowflake to store the replicated data:
```sql
CREATE DATABASE ;
```
2. Create a Snowflake [service user](#label-user-type-property):
```sql
CREATE USER
TYPE = SERVICE
COMMENT='Service user for automated access of Openflow';
```
3. Create a Snowflake role for the connector and grant the required
privileges:
```sql
CREATE ROLE ;
GRANT ROLE TO USER ;
GRANT USAGE ON DATABASE TO ROLE ;
GRANT CREATE SCHEMA ON DATABASE
TO ROLE ;
```
Use this role to manage the connector's access to the Snowflake database.
To create objects in the destination database, you must grant the
[USAGE and CREATE SCHEMA privileges](#label-database-privileges)
on the database to the role used to manage access.
4. Create a Snowflake warehouse for the connector and grant the required
privileges:
```sql
CREATE WAREHOUSE WITH
WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE
TO ROLE ;
```
Snowflake recommends starting with a XSMALL warehouse size, then
experimenting with size depending on the number of tables being
replicated and the amount of data transferred. Large numbers of tables
typically scale better with multi-cluster warehouses, rather than a
larger warehouse size. For more information, see
[multi-cluster warehouses](/user-guide/warehouses-multicluster).
5. Set up the public and private keys for key pair authentication:
1. Create a pair of secure keys (public and private).
2. Store the private key for the user in a file to supply to the
connector's configuration.
3. Assign the public key to the Snowflake service user:
```sql
ALTER USER SET RSA_PUBLIC_KEY = 'thekey';
```
For more information, see [](/user-guide/key-pair-auth).
## Next steps
[Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector).
---
title: Openflow Connector for PostgreSQL Maintenance
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/postgres/maintenance.md
section: Loading & Unloading Data
---
# %postgresql% Maintenance
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/postgres/setup)
- [](/user-guide/data-integration/openflow/connectors/postgres/data-mapping)
This topic describes important maintenance considerations and best practices for
maintaining the %postgresql% when making changes to the source PostgreSQL database.
In addition, this topic describes how to restart table replication and reinstall the connector.
## Check the replication status of a table
Interim failures, such as connection errors, do not prevent table replication. However,
permanent failures, such as unsupported data types, prevent table replication.
To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store:
1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. A table listing controller services displays.
2. Locate the row labeled **Table State Store**, click the **More** %sf-vertical-more-button% button on the right side of the row, and then choose **View State**.
A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are:
- **NEW**: The table is scheduled for replication but replication hasn't started.
- **SNAPSHOT_REPLICATION**: The connector is copying existing data. This status displays until all records are stored in the destination table.
- **INCREMENTAL_REPLICATION**: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails.
- **FAILED**: Replication has permanently stopped due to an error.
The Openflow runtime canvas doesn't display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message:
```text
Replication state for table .. changed from to
```
If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see [Restart table replication](#label-of-postgres-restart-table-replication).
## Restart table replication
A table in FAILED state — for example, due to a missing primary key or unsupported schema change — does not restart automatically. If a table enters a FAILED state or you need to restart replication from scratch, use the following procedure to remove and re-add the table to replication.
If the failure was caused by an issue in the source table such as a missing primary key, resolve that issue in the source database before continuing.
1. Remove the table from replication, using one of the following methods:
- Add the table to the **Re-snapshot Table Exclusions** parameter to temporarily exclude it from replication. This is convenient when the table is matched by an **Included Table Regex** that you don't want to change.
- In the Ingestion Parameters context, either remove the table from **Included Table Names** or modify the **Included Table Regex** so the table is no longer matched.
2. Verify the table has been removed:
1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**.
2. In the table listing controller services, locate the **Table State Store** row, click the three vertical dots on the right side of the row, then choose **View State**.
You must wait until the table's state is fully removed from this list before proceeding. Do not continue until this configuration change has completed.
3. Clean up the destination: Once the table's state shows as fully removed, manually [DROP](/sql-reference/sql/drop-table) the destination table in Snowflake. Note that the connector will not overwrite an existing destination table during the snapshot phase; if the table still exists, replication will fail again. Optionally, the journal table and stream can also be removed if they are no longer needed.
4. Re-add the table by reversing the change you made in the first step: either remove the table from **Re-snapshot Table Exclusions**, or add it back to **Included Table Names** or **Included Table Regex**. The connector then re-snapshots the table.
5. Verify the restart: Check the **Table State Store** using the instructions given previously. The state of the table should appear with the status NEW, then transition to SNAPSHOT_REPLICATION, and finally INCREMENTAL_REPLICATION.
## Upgrading PostgreSQL
Upgrading the connector requires a different approach depending on whether PostgreSQL is being upgraded to the next minor or major version.
Minor version upgrades
- Are data safe.
- Require no special treatment.
- Require stopping the connector for the duration of the upgrade to avoid reporting connectivity issues.
- Continue replicating, after the upgrade, with no data loss.
Major version upgrades
- Require the PostgreSQL server to drop replication slots, including any used by the connector.
- Cannot preserve, or migrate replication slots to the new version. See also [](#label-postgres-upgrade-note).
- Restart replicating all tables from the prior snapshot phase.
To perform a minor version upgrade, do the following:
1. Stop the connector, including all Processors and Controller Services.
2. Upgrade PostgreSQL.
3. Restart the connector.
To perform a major version upgrade, do the following:
1. Remove all tables from replication in the connector by clearing the **Included Table Names** and **Included Table Regex** parameters.
2. Wait until all queues in the connector are empty.
3. Remove the destination tables, by dropping them or renaming.
4. Stop the connector, including all Processors and Controller Services.
5. Open the **Incremental Load** group in the connector.
6. Right-click the top Processor in the group, **Read PostgreSQL CDC Stream**, and select **View state**.
7. Click **Clear state**.
8. Click **Close**.
9. Upgrade PostgreSQL.
10. Restart the connector. A new replication slot will be created.
11. Re-add all tables to begin replication.
### PostgresSQL 17 and later versions upgrades
PostgreSQL 17 improved upgrading such that it no longer requires dropping replication slots when upgrading to later versions such as 17.1 %ra% 18.0.
Upgrading to PostgreSQL 17.0 or later from prior versions (16 and earlier) drops replications slots and should be treated as a major upgrade.
Future versions of PostgreSQL may also improve the upgrade process further.
## Reinstall the connector
This section describes how to reinstall the connector.
It covers situations where the new connector is installed in the same runtime, or when it is moved to a new runtime.
Reinstall is often used in conjunction with [Incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/postgres/incremental-replication).
For the connector to be able to continue replicating from the same CDC stream position where it stopped before reinstallation,
the source database must retain the WAL long enough to cover the time since the old connector is stopped and the new connector is started.
Ensure the `max_wal_size` parameter of the PostgreSQL server is high enough, depending on your traffic, and keep the reinstallation time to a minimum.
### Prerequisites
Review and note connector parameter context values.
If you're reinstalling the connector in the same runtime, you can reuse the existing context.
If the new instance will be located in a different runtime, you will have to re-enter all parameters.
To reinstall the connector:
1. Finish processing all in-flight FlowFiles in the existing connector, and then stop the connector.
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Select **Launch Openflow**.
4. In the **Openflow** pane select the **Runtimes** tab.
5. Select the runtime containing the connector.
6. Select the connector.
7. Stop the topmost processor **Set Tables for Replication** in the **Snapshot Load** group.
8. Stop the topmost processor **Read PostgreSQL CDC Stream** in the **Incremental Load** group.
9. If you changed the value of the **Merge Task Schedule CRON** parameter, return it to `* * * * * ?`, otherwise queues won't be emptied until the next scheduled run.
Wait until all FlowFiles in the connector have been processed, and all queues are empty.
When all FlowFiles have been processed, the **Queued** value on the connector's processor group becomes zero.
If there are any items left in the original connector's queues, there may be data gaps when the new connector starts.
10. Stop all processors and controller services in the connector.
2. Find and copy the name of the replication slot used by the original connector,
by viewing the state of the topmost processor in the `Incremental Load` group with name `Read PostgreSQL CDC Stream`.
The replication slot name is stored under the key `replication.slot.name`.
Copy the value of the key to a text editor.
3. Create a new instance of the connector. If you're using the same runtime as the original connector, you can choose to keep the existing parameter contexts, and reuse the settings.
The existing connector can remain in the runtime and doesn't interfere with the new instance, as long as it remains stopped.
4. If you're installing into a different runtime, or you deleted the previous parameter contexts, enter all the configuration settings into the new parameter contexts,
including the table names and patterns as described in [](/user-guide/data-integration/openflow/connectors/postgres/setup).
5. Open the `PostgreSQL Ingestion Parameters` context, and set `Ingestion Type` parameter to `incremental`.
For more information on the concerns see [](#label-postgres-incremental-replication).
6. Open the `PostgreSQL Source Parameters` context, and set the `Replication Slot Name` parameter to the value you copied earlier.
7. Start the new connector.
### Usage notes
The new connector will use the same, existing destination tables that created by the original connector, but will create new journal tables.
---
title: Openflow Connector for PostgreSQL: Data mapping
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/postgres/data-mapping.md
section: Loading & Unloading Data
---
# %postgresql%: Data mapping
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/postgres/about)
- [](/user-guide/data-integration/openflow/connectors/postgres/setup)
This topic describes how PostgreSQL data types are mapped
to Snowflake data types.
## PostgreSQL to Snowflake data type mapping
The following table shows how PostgreSQL data types are mapped to Snowflake data types
when replicating data.
| PostgreSQL type |
Snowflake type |
Notes |
| SMALLINT / INT2 |
INT |
|
| INTEGER / INT / INT4 |
INT |
|
| BIGINT / INT8 |
INT |
|
| SMALLSERIAL / SERIAL2 |
INT |
|
| SERIAL / SERIAL4 |
INT |
|
| BIGSERIAL / SERIAL8 |
INT |
|
| NUMERIC / DECIMAL |
NUMBER |
Scale and precision are preserved within Snowflake limitations. Negative scale is converted to scale 0 with adjusted precision. |
| REAL / FLOAT4 |
FLOAT |
|
| DOUBLE PRECISION / FLOAT8 |
FLOAT |
|
| MONEY |
FLOAT |
|
| BOOLEAN / BOOL |
BOOLEAN |
|
| CHARACTER / CHAR / BPCHAR |
TEXT |
|
| CHARACTER VARYING / VARCHAR |
TEXT |
|
| TEXT |
TEXT |
|
| BYTEA |
BINARY |
Supported up to the maximum entry size in Snowflake (16 MB). |
| DATE |
DATE |
|
| TIME / TIME WITHOUT TIME ZONE |
TIME |
|
| TIME WITH TIME ZONE / TIMETZ |
TIMESTAMP_TZ |
|
| TIMESTAMP / TIMESTAMP WITHOUT TIME ZONE |
TIMESTAMP_NTZ |
|
| TIMESTAMP WITH TIME ZONE / TIMESTAMPTZ |
TIMESTAMP_LTZ |
|
| INTERVAL |
TEXT |
|
| JSON |
VARIANT |
Supported up to the maximum entry size in Snowflake (16 MB). |
| JSONB |
VARIANT |
Supported up to the maximum entry size in Snowflake (16 MB). |
| UUID |
TEXT |
|
| XML |
TEXT |
|
| BIT |
TEXT |
|
| BIT VARYING / VARBIT |
TEXT |
|
| POINT |
TEXT |
|
| LINE |
TEXT |
|
| LSEG |
TEXT |
|
| BOX |
TEXT |
|
| PATH |
TEXT |
|
| POLYGON |
TEXT |
|
| CIRCLE |
TEXT |
|
| CIDR |
TEXT |
|
| INET |
TEXT |
|
| MACADDR |
TEXT |
|
| MACADDR8 |
TEXT |
|
| TSVECTOR |
TEXT |
|
| TSQUERY |
TEXT |
|
| PG_LSN |
TEXT |
|
Any PostgreSQL data types not listed in this table are mapped to TEXT by default.
---
title: Openflow Connector for PostgreSQL: Set up incremental replication without snapshots
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/postgres/incremental-replication.md
section: Loading & Unloading Data
---
# %postgresql%: Set up incremental replication without snapshots
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/postgres/setup)
- [](/user-guide/data-integration/openflow/connectors/postgres/data-mapping)
You can configure the %postgresql% connector to immediately replicate incremental changes for newly added tables, bypassing snapshots. Use incremental load to continue replication without snapshotting every table again when you reinstall the connector over previously replicated data.
To enable incremental replication in a new connector instance:
1. Set up the connector as described in [](/user-guide/data-integration/openflow/connectors/postgres/setup).
2. In the `PostgreSQL Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`.
## Enable incremental replication without snapshots
To enable incremental replication on an existing connector:
1. sign in to %sf-web-interface-link%.
2. in the navigation menu, select **Ingestion** %raa% **Openflow**.
3. In the **Openflow** pane select the **Runtimes** tab.
4. Select the runtime containing the connector.
5. Select the connector.
6. In the `Ingestion Parameters` context, specify `Ingestion Type` = `incremental`.
7. Add new replication tables. These tables immediately switch to their incremental load.
To return to replicating tables with the snapshot load, change **Ingestion Type** from `incremental` to `full`.
# Usage notes
- Changing the value of **Ingestion Type** does not impact any tables that have begun replicating data.
Tables currently in the snapshot phase continue until the snapshot load is complete.
- While **Ingestion Type** is set to `incremental`, new tables added to the list of replicated tables bypass the snapshot phase.
This includes new tables added to the source database that match the `Included Table Regex` parameter.
Ensure that the ingestion type is set to `incremental` to bypass the snapshot phase.
Connectors should only remain in `incremental` mode as long as required as it bypasses snapshots.
Once customer needs for incremental updates have been satisfied the connector should be returned to `full` mode.
- For tables that bypass snapshot load, the connector creates a destination table in Snowflake,
by executing `CREATE TABLE IF NOT EXISTS`, only if no destination table already exists.
Tables going through the snapshot require that no destination table exist.
---
title: Openflow Connector for Salesforce Bulk API: Configure the connector
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector.md
section: Loading & Unloading Data
---
# %salesforcebulkapiof%: Configure the connector
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/formula-fields)
This topic describes the steps to configure the %salesforcebulkapiof%.
## Install the connector
Follow these steps to install the %salesforcebulkapiof% in an Openflow runtime:
1. Navigate to the Openflow **Overview** page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find **Openflow connector for Salesforce Bulk API** and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down.
The Openflow canvas appears with the connector process group added to it.
## Configure the connector
To configure the connector, perform the following steps:
1. Right-click on the imported process group and select **Parameters**.
2. Populate the required parameter values as described in the table below.
| Parameter |
Description |
| Column Removal Strategy |
Defines the strategy to adopt when a column should be removed in the destination table based on the latest received schema. Three possible values: `Drop Column`, `Rename Column`, `Ignore Column`.
- `Drop Column`: Drop the column from the Snowflake table.
- `Rename Column`: Rename the column in the Snowflake table.
- `Ignore Column`: Ignore the column, leaving it as is in the Snowflake table.
|
| Connected App Key |
The private key used for JWT Bearer Flow authentication with Salesforce. Copy-paste the content of the `private.key` file generated during the [Salesforce setup](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce). This private key must correspond to the public certificate (`public.crt`) uploaded to the external client app in Salesforce. You can also use the next parameter to upload the private key file instead. |
| Connected App Key File |
Upload the `private.key` file by selecting the **Reference asset** checkbox, then upload the file as an asset and select the asset as the value for the parameter. This is an alternative to pasting the key content in the **Connected App Key** parameter. |
| Connected App Key Password |
Password set on the private key file during the [Salesforce setup](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce) steps. |
| Destination Database |
Name of the database in Snowflake where the Salesforce data will be replicated. The database must exist before starting the connector. |
| Destination Schema |
Name of the schema, in the database above, into which the connector will create tables for the Salesforce data to be added. The schema must exist before starting the connector. |
| Enable Journal Tables |
If set to `true`, a `JOURNAL_` table is created for each synced object that has a `SystemModstamp` or `LastModifiedDate` field. All changes are appended to the journal table, providing a full history of modifications. This is in addition to the main table that contains the merged data for the object. If a full reload occurs for a given object type, its journal table is also recreated. Default: `false`. |
| Enable Views Creation |
If set to `true`, a view named `_FORMULA_VW` is created for each synced object that contains formula fields. The view translates supported Salesforce formula expressions into Snowflake SQL, allowing you to query formula results directly without replicating formula field values from Salesforce. See [](#salesforce-formula-fields) for details. Default: `false`. |
| Filter |
Comma-separated list of objects to replicate from Salesforce, or regular expression to apply against all existing objects. The filter is case-insensitive, meaning that a filter set to `account` would match the object type `Account`. Example: `Account, Opportunity, Contact`.
If left empty, all objects will be replicated. This is not recommended as there are usually thousands of objects in a Salesforce instance.
|
| Incremental Offload |
Whether the processor should perform incremental offload. If `true`, the processor will only fetch the records that have been modified since the last query job submission by using a `WHERE` clause on the appropriate timestamp field. If `false`, all records will be fetched at every execution of the connector. |
| Initial Load Chunking |
If set to a value other than `NONE`, the initial data load will be split into multiple jobs based on this interval. On the first run for an object, the connector will query Salesforce to find the oldest record and use that as the starting point. Each subsequent job will query the next time chunk until caught up to the current time. Should be set with one of: `NONE`, `MONTHLY`, `QUARTERLY`, `YEARLY`.
This is useful for large datasets where loading all historical data in a single query may time out, exceed API limits, or exceed the storage size of the content repository of the runtime. After catching up, the processor continues with normal incremental offload behavior.
|
| OAuth2 Audience |
Audience to set in the JWT token. Set to `https://login.salesforce.com` for production environments or `https://test.salesforce.com` for sandboxes and test environments. |
| OAuth2 Client ID |
Should be set to the **Consumer Key** value retrieved during the Salesforce Setup steps. |
| OAuth2 Subject |
Should be set to the username of an admin-approved user for the application to interact with Salesforce APIs on behalf of this user. |
| OAuth2 Token Endpoint URL |
Endpoint to negotiate tokens via the JWT Bearer Flow. Example: `https://myCompany.my.salesforce.com/services/oauth2/token`. |
| Object Fields Filter JSON |
A JSON specifying which fields and field patterns should be included or excluded, per Salesforce object. Takes the form of an array with one item per object.
Example 1: This will include all fields that end with 'name' in the 'Account' Salesforce object:
`[ {"objectType":"Account", "includedPattern":".*name"} ]`
Example 2: This will include the fields Id, Name, and Revenue in the 'Account' Salesforce object:
`[ {"objectType":"Account", "included": ["Id", "Name", "Revenue"]} ]`
`excluded` and `excludedPattern` are also available for configuring the filters.
|
| Object Identifier Resolution |
Determines if schema / table / column names are treated as case-sensitive or case-insensitive. One of: `CASE_INSENSITIVE` / `CASE_SENSITIVE`.
Changing this parameter value will require clearing the state and doing a full reload of all objects.
|
| Removed Column Name Suffix |
Suffix added to the column name when the parameter **Column Removal Strategy** is set to `Rename Column`. Default: `__deleted`. |
| Run Schedule |
Frequency at which the connector will check for updates in Salesforce for configured objects via the **Filter** parameter. Default: `15 minutes`. |
| Salesforce Instance |
Hostname of the Salesforce instance including the domain name. Do not include the protocol prefix (`https://`). For example, use `myCompany.my.salesforce.com`. |
| Snowflake Account Identifier |
Snowflake account name formatted as `[organization-name]-[account-name]` where data will be persisted. Example: `PM-CONNECTORS`. |
| Snowflake Username |
The name of the service user that the connector uses to connect to Snowflake. The service user is required only when using the `KEY_PAIR` authentication strategy (Openflow BYOC only). |
| Snowflake Private Key |
The RSA Private Key that the connector uses for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line starts with `-----BEGIN PRIVATE`. This is required only when using the `KEY_PAIR` authentication strategy (Openflow BYOC only).
You may also use the next parameter to upload the private key to the Openflow runtime instead.
|
| Snowflake Private Key File |
The file containing the RSA Private Key that the connector uses for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line starts with `-----BEGIN PRIVATE`. Required only when using the `KEY_PAIR` authentication strategy (Openflow BYOC only).
Select the **Reference asset** checkbox to upload the private key file and store it securely in the Openflow runtime.
|
| Snowflake Private Key Password |
The password associated with the Snowflake Private Key File (if encrypted). This is required only when using the `KEY_PAIR` authentication strategy (Openflow BYOC only). |
| Snowflake Role |
Name of the Snowflake role used during query execution. When using `SNOWFLAKE_MANAGED`, this is the Snowflake Role for Openflow Runtimes. When using `KEY_PAIR` (Openflow BYOC only), this is the role assigned to the specified Snowflake username. |
| Snowflake Authentication Strategy |
Authentication strategy for the connector to connect to Snowflake.
Using `SNOWFLAKE_MANAGED` (default) uses the Snowflake managed token associated with the specified Snowflake Runtime Role. If using Openflow BYOC, you can also use `KEY_PAIR` to specify a specific user and role via a custom Key Pair.
|
| Snowflake Warehouse |
The Snowflake warehouse used to run queries. |
| Special Objects Filter |
Comma-separated list of objects to offload from Salesforce (using direct API access), or regular expression to apply against all existing objects. The filter is case-insensitive, meaning that a filter set to `account` would match the object type `Account`.
This filter should only be used for objects that are **not** supported by the Salesforce Bulk API such as knowledge data, for example. This parameter should not overlap with the parameter **Filter**.
Example: `Knowledge.*`
|
## Verify the Salesforce connection
Before enabling and starting the connector, Snowflake recommends verifying that the Salesforce authentication is properly configured. The **Verification** feature on controller services lets you test the connection without starting the full connector flow.
The **JWT Bearer OAuth2 Access Token Provider** controller service depends on two other controller services that must be enabled first: the **Salesforce Private Key Service** and the **Web Client Service Provider**.
1. Double-click the connector process group to open it.
2. Right-click on an empty area of the canvas and select **Controller Services**.
3. Enable the **Salesforce Private Key Service** and the **Web Client Service Provider** services.
4. Locate the **JWT Bearer OAuth2 Access Token Provider** service in the list.
5. Click the **Verification** button for the service. A dialog opens where you can provide property overrides. You can ignore this and click **Verify** directly.
6. If everything is configured properly, the **Acquire token** step shows a green checkmark indicating success. This confirms the connector can authenticate with Salesforce and obtain an access token. You can proceed to the next step to run the connector.
7. If verification fails, review the error message and check the following:
- The **OAuth2 Client ID** parameter matches the **Consumer Key** from the external client app in Salesforce.
- The private key corresponds to the certificate uploaded to the external client app.
- The **OAuth2 Subject** user is authorized for the external client app (see [](#salesforce-approve-client-app)).
- The **OAuth2 Token Endpoint URL** uses the correct Salesforce instance hostname.
- The **OAuth2 Audience** is set to the correct value: `https://login.salesforce.com` for production or `https://test.salesforce.com` for sandboxes.
For detailed troubleshooting, see [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/troubleshoot).
## Run the connector
Follow these steps to start the connector and begin replicating data from Salesforce to Snowflake:
1. Right-click on an empty area in the canvas and select **Enable all Controller Services**.
2. Right-click on the connector process group and select **Start**.
## Manage object replication
After the connector has been started and objects have been replicated, you can add new objects or remove existing objects from replication.
### Add new objects to replication
To add a new object to replication, update the **Filter** parameter (or **Special Objects Filter** parameter, if applicable) with the new object names. You do not need to stop the connector. The new object is replicated at the next scheduled execution.
For example, if the current **Filter** value is `Account, Opportunity` and you want to add the `Contact` object, change the value to `Account, Opportunity, Contact`.
### Remove objects from replication
Removing an object from replication requires stopping the connector and cleaning up both the connector state and the destination table in Snowflake:
1. Stop all processors in the flow by right-clicking on the connector process group and selecting **Stop**.
2. Ensure that no in-flight FlowFiles are being processed.
3. Right-click on the canvas and select **Parameters**, then remove the object name from the **Filter** parameter (or the **Special Objects Filter** parameter, if applicable).
4. Right-click on the canvas and select **Disable all controller services**.
5. Go to **Controller services** and open the state of the controller service named **Salesforce Bulk Jobs State**.
6. Select the trash icon next to the object type you removed to delete its state entry.
7. Right-click on the canvas and select **Enable all controller services**, then start all processors to resume the connector.
8. If applicable, drop the corresponding table from the Snowflake destination database to clean up the previously replicated data. For example:
```sql
DROP TABLE ..;
```
## Next steps
- To monitor and troubleshoot the connector, see [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/troubleshoot).
---
title: Openflow Connector for Salesforce Bulk API: Salesforce formula fields
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/formula-fields.md
section: Loading & Unloading Data
---
# %salesforcebulkapiof%: Salesforce formula fields
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector)
This topic describes how the %salesforcebulkapiof% translates Salesforce formula
fields into Snowflake SQL views, including supported functions and limitations.
## How formula views work
When **Enable Views Creation** is set to `true`, the connector performs the following for each object that has formula fields:
1. Retrieves the formula expressions from the Salesforce object metadata via the Describe API.
2. Parses each formula expression and translates it into equivalent Snowflake SQL.
3. Generates a `CREATE OR REPLACE VIEW` statement that combines non-formula columns from the base table with the translated formula expressions as computed columns.
4. Runs the DDL against Snowflake to create or update the view.
The resulting view is named `_FORMULA_VW`. For example, the `Account` object produces a view named `ACCOUNT_FORMULA_VW`. You can query this view to obtain formula field values alongside the replicated data.
The view is automatically updated whenever the connector detects schema changes in the source object, ensuring that formula definitions stay in sync with Salesforce.
## Cross-object formula fields
Salesforce formulas can reference fields from related objects using relationship traversal (for example, `Account.Owner.Name`). The connector supports these cross-object references by generating `LEFT JOIN` clauses in the view definition. Each relationship traversal produces a join to the corresponding related table in Snowflake.
For cross-object formulas to work correctly, the related objects must also be replicated by the connector. The connector does not check whether the referenced tables exist in Snowflake at translation time. If a related object is not being synced, the generated `CREATE OR REPLACE VIEW` statement references a table that does not exist in Snowflake, and the view creation fails. To resolve this, ensure that all related objects referenced by formula fields are included in the **Filter** parameter. The view is automatically recreated on the next connector run after the referenced tables exist.
## Formula view column comments
Each formula column in the generated view includes a SQL `COMMENT` annotation:
- For successfully translated formulas, the comment contains the original Salesforce formula expression.
- For formulas that could not be translated, the comment contains the failure reason code.
You can inspect these comments by running `DESCRIBE VIEW ` in Snowflake.
## Supported formula functions
The following Salesforce formula functions are translated into equivalent Snowflake SQL:
| Category |
Salesforce function |
Snowflake equivalent |
| Logical |
`IF` |
`CASE WHEN ... THEN ... ELSE ... END` |
| Logical |
`CASE` |
`CASE ... WHEN ... THEN ... ELSE ... END` |
| Logical |
`AND` / `OR` / `NOT` |
`AND` / `OR` / `NOT` |
| Null handling |
`ISBLANK` |
`LENGTH(COALESCE(expr, '')) = 0` |
| Null handling |
`ISNULL` |
`expr IS NULL` |
| Null handling |
`NULLVALUE` |
`COALESCE` |
| Null handling |
`BLANKVALUE` |
`CASE WHEN ... IS NULL OR LENGTH(...) = 0 THEN ... END` |
| Text |
`LEFT` |
`LEFT` |
| Text |
`RIGHT` |
`RIGHT` |
| Text |
`MID` |
`SUBSTR` |
| Text |
`LEN` |
`LENGTH` |
| Text |
`SUBSTITUTE` |
`REPLACE` |
| Text |
`TRIM` |
`TRIM` |
| Text |
`UPPER` |
`UPPER` |
| Text |
`LOWER` |
`LOWER` |
| Text |
`CONTAINS` |
`CONTAINS` |
| Text |
`BEGINS` |
`STARTSWITH` |
| Text |
`FIND` |
`CHARINDEX` |
| Text |
`LPAD` |
`LPAD` |
| Text |
`RPAD` |
`RPAD` |
| Text |
`BR` |
Newline character literal |
| Conversion |
`TEXT` |
`CAST(... AS STRING)` |
| Conversion |
`VALUE` |
`TRY_CAST(... AS NUMBER)` |
| Math |
`ABS` |
`ABS` |
| Math |
`ROUND` |
`ROUND` |
| Math |
`CEILING` |
`CEIL` |
| Math |
`FLOOR` |
`FLOOR` |
| Math |
`MOD` |
`MOD` |
| Math |
`SQRT` |
`SQRT` |
| Math |
`MAX` |
`GREATEST` |
| Math |
`MIN` |
`LEAST` |
| Math |
`LOG` |
`LOG(10, ...)` |
| Math |
`EXP` |
`EXP` |
| Math |
`LN` |
`LN` |
| Date and time |
`NOW` |
`CURRENT_TIMESTAMP()` |
| Date and time |
`TODAY` |
`CURRENT_DATE()` |
| Date and time |
`YEAR` |
`YEAR` |
| Date and time |
`MONTH` |
`MONTH` |
| Date and time |
`DAY` |
`DAY` |
| Date and time |
`DATEVALUE` |
`TO_DATE` |
| Date and time |
`DATETIMEVALUE` |
`TO_TIMESTAMP` |
| Date and time |
`ADDMONTHS` |
`DATEADD(MONTH, ...)` |
| Picklist |
`ISPICKVAL` |
`COALESCE(field, '') = COALESCE(value, '')` |
In addition to functions, the following operators are supported:
- Arithmetic: `+`, `-`, `*`, `/`, `^` (exponentiation, translated to `POWER`)
- Comparison: `=`, `==`, `!=`, `<>`, `<`, `<=`, `>`, `>=`
- Logical: `AND`, `OR`, `&&`, `||`
- String concatenation: `&` (translated to `||` with `COALESCE` null handling)
- Unary: `-` (negation), `NOT`
## Unsupported formula constructs
The following formula constructs are not yet supported. Support for additional functions and constructs will be added in future releases. When a formula uses any of these, the corresponding column in the view returns `NULL` and the column comment indicates the failure reason.
| Failure reason |
Description |
| `FUNCTION_NOT_SUPPORTED` |
The formula uses a function that has no Snowflake equivalent or that is specific to the Salesforce UI. This includes: `IMAGE`, `HYPERLINK`, `URLFOR`, `HTMLENCODE`, `JSENCODE`, `LINKTO`, `GEOLOCATION`, `DISTANCE`, `VLOOKUP`, `REGEX`, `PREDICT`, `GETSESSIONID`, `GETRECORDIDS`, `REQUIRESCRIPT`, `ISCHANGED`, `ISNEW`, `ISCLONE`, `PRIORVALUE`. |
| `GLOBAL_VARIABLE_NOT_SUPPORTED` |
The formula references a Salesforce global variable such as `$User.Name`, `$Organization.Name`, or `$Profile.Name`. These variables have no equivalent in Snowflake. |
| `FORMULA_CHAIN_NOT_SUPPORTED` |
The formula references another formula field. Chained formula references (a formula field that depends on another formula field) are not supported. |
| `ROLLUP_NOT_SUPPORTED` |
The field is a rollup summary field rather than a formula field. Rollup summaries aggregate data from child records and cannot be expressed as a simple SQL view. |
| `LOOKUP_NOT_SYNCED` |
The formula references a relationship that cannot be resolved from the Salesforce object metadata. This typically occurs when the relationship name in the formula does not match any known relationship on the object. |
| `ID_FORMAT_MISMATCH` |
The formula contains a hardcoded 15-character Salesforce ID. Salesforce uses 15-character IDs internally, but the Bulk API returns 18-character IDs. Formulas with hardcoded 15-character IDs cannot be reliably translated. |
| `COMPOUND_FIELD_REFERENCE` |
The formula references a compound field (such as `MailingAddress`) that is not stored as a single column in Snowflake. |
| `PARSE_ERROR` |
The formula expression could not be parsed. This might indicate a syntax that the connector does not yet recognize. |
| `UNSUPPORTED_SYNTAX` |
The formula uses a syntax construct that is recognized but cannot be translated (for example, an `IF` function with fewer than three arguments). |
---
title: Openflow Connector for Salesforce Bulk API: Set up Salesforce
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce.md
section: Loading & Unloading Data
---
# %salesforcebulkapiof%: Set up Salesforce
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector)
This topic describes the steps to set up Salesforce for the %salesforcebulkapiof%.
The connector authenticates with Salesforce using the OAuth 2.0 JWT Bearer Flow. This requires creating a certificate key pair, configuring an external client app in Salesforce, and authorizing a user to use the app.
Salesforce has deprecated Connected Apps in favor of External Client Apps. If you have an existing Connected App, Snowflake recommends creating a new External Client App instead.
## Create certificates
You need a private key and public certificate to configure the external client app in Salesforce. The private key is used by the connector to sign JWT tokens, and the public certificate is uploaded to the external client app in Salesforce so that Salesforce can verify the signature.
1. Generate the private key. You are asked for a password to secure the private key.
```bash
openssl genpkey -algorithm RSA -out private.key -aes256
```
Record the password. You need it when configuring the connector parameters in Snowflake.
2. Create a self-signed certificate from the private key.
```bash
openssl req -new -x509 -key private.key -out public.crt -days 365
```
You can also generate a Certificate Signing Request (CSR) to have a certificate signed by your company CA.
You are responsible for safeguarding and rotating the public key and private key files used for key-pair authentication according to the security policies of your organization.
## Create an external client app in Salesforce
Create an external client app in Salesforce with JWT Bearer Flow. The connector requires this specific OAuth flow to authenticate. Using a different OAuth flow (such as Authorization Code Flow) causes `invalid_grant` errors.
1. Log in to Salesforce as an administrator.
2. Go to **Setup** %ra% **Apps** %ra% **App Manager**, and then select **New External Client App**.
3. Fill in the required fields:
- **External Client App Name**: For example, `Openflow connector for Salesforce Bulk API`.
- **Contact Email**: For example, `salesforceadmin@mycompany.com`.
4. In the **API (Enable OAuth Settings)** section, select the **Enable OAuth** checkbox.
5. Provide a valid **Callback URL** (for example, `https://www.google.com/`).
The callback URL is required by Salesforce, but it is not used by the JWT Bearer Flow. You can provide any valid URL.
6. Provide the desired **OAuth Scopes** for the application. The following scopes are required for the connector to operate properly:
- Manage user data via APIs (`api`)
- Perform requests at any time (`refresh_token`, `offline_access`)
7. In **Flow Enablement**, select the **Enable JWT Bearer Flow** checkbox and upload the `public.crt` file created in the previous step.
You must select **Enable JWT Bearer Flow** specifically. Do not enable other flows unless you have a specific reason to do so. The certificate you upload here must correspond to the private key (`private.key`) that you configure in the connector parameters.
8. Click **Create** to complete the application creation process.
9. Go to the **Settings** tab, expand the **OAuth Settings** section, and click **Consumer Key and Secret** to retrieve the credentials of your application.
10. Record the values for the **Consumer Key** and the **Consumer Secret** for use when configuring the connector in Snowflake. The **Consumer Key** is used as the **OAuth2 Client ID** parameter in the connector configuration.
## Approve the client app for a user
The connector interacts with Salesforce APIs on behalf of a specific user (the OAuth2 Subject configured in the connector parameters). You must authorize this user to use the external client app by assigning the appropriate profiles or permission sets.
If this step is not completed, the connector receives a permission error when attempting to authenticate, even if the JWT Bearer Flow is configured correctly.
1. Go to the **Policies** tab of the client application.
2. Click **Edit**.
3. Expand the **OAuth Policies** section and change **Permitted Users** to **Admin approved users are pre-authorized**.
4. Expand the **App Policies** section and select the profiles or permission sets that are assigned to the Salesforce user you want the connector to use. For example, if the user has the `System Administrator` profile, select that profile.
The user specified as the **OAuth2 Subject** in the connector configuration must belong to at least one of the profiles or permission sets selected here. If the user is not authorized, you receive a permission error when verifying or running the connector.
5. Click **Save**.
## Verify credentials match
Before proceeding to the Snowflake setup, confirm that the following credentials all belong to the same external client app and key pair:
- The **Consumer Key** (Client ID) was retrieved from the external client app you just created.
- The **private key** (`private.key`) corresponds to the **certificate** (`public.crt`) uploaded to the same external client app.
- The **OAuth2 Subject** (user) is authorized for this external client app through the profile or permission set assignment.
If you have created multiple external client apps or experimented with different configurations, mixing credentials from different apps or key pairs is a common source of `invalid_grant` errors. When in doubt, create a new external client app with a fresh certificate and key pair.
## Next steps
Perform the Snowflake setup tasks:
[](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake)
---
title: Openflow Connector for Salesforce Bulk API: Set up Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-snowflake.md
section: Loading & Unloading Data
---
# %salesforcebulkapiof%: Set up Snowflake
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/setup-salesforce)
- [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector)
This topic describes the steps to set up Snowflake for the %salesforcebulkapiof%.
## Prerequisites
Before you begin, ensure you have completed the following:
- Install Openflow (either BYOC or SPCS). For more information, see [](/user-guide/data-integration/openflow/about).
- Create an Openflow deployment. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) or [](/user-guide/data-integration/openflow/setup-openflow-byoc).
- Create an Openflow runtime. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-spcs-create-runtime) or [](/user-guide/data-integration/openflow/setup-openflow-byoc).
- Review the known limitations of the connector in [](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about).
## Create a key pair
Create a key pair that will be used by the service account user in the connector to interact with the database.
This step is only required if you are deploying the connector in Openflow BYOC. It is NOT needed when deploying the connector in Openflow SPCS.
1. Generate a private key. The example below shows how to generate an unencrypted private key.
```bash
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
```
The content of the `rsa_key.p8` file will look like this:
```text
-----BEGIN PRIVATE KEY-----
MIIE6T...
-----END PRIVATE KEY-----
```
2. Generate the public key by referencing the private key.
```bash
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```
The content of the `rsa_key.pub` file will look like this:
```text
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqh...
-----END PUBLIC KEY-----
```
Copy the contents of this file (without the `-----BEGIN PUBLIC KEY-----` and `-----END PUBLIC KEY-----` headers) to use when creating the user in the next section.
## Create objects and grant privileges
Create a service account, role, database, schema, and warehouse for the connector, and grant the appropriate permissions.
1. Use a role with `ACCOUNTADMIN` privileges to set the role:
```sql
USE ROLE ACCOUNTADMIN;
```
2. Create the destination Snowflake database, if it does not
exist:
```sql
CREATE DATABASE IF NOT EXISTS ;
```
3. Create the destination schema in the database, if it does
not exist:
```sql
CREATE SCHEMA IF NOT EXISTS .;
```
4. Create the role used by the Openflow connector:
```sql
CREATE ROLE IF NOT EXISTS ;
```
5. Grant the privileges to the role to use the database:
```sql
GRANT USAGE ON DATABASE TO ROLE ;
GRANT USAGE ON SCHEMA . TO ROLE ;
GRANT CREATE TABLE ON SCHEMA . TO ROLE ;
```
6. Create a warehouse for the connector (or use an existing one) and grant usage privileges to the connector role:
```sql
-- Create a warehouse (skip if you wish to use an existing warehouse)
CREATE OR REPLACE WAREHOUSE MY_WAREHOUSE WITH
WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE MY_WAREHOUSE TO ROLE ;
```
7. Create the service user and assign the role and public key:
```sql
-- Create a service user that the connector will use to interact with Snowflake
-- Set default role to
-- Assign the public key generated with openssl in the previous step (only for BYOC)
CREATE OR REPLACE USER
TYPE = SERVICE
DEFAULT_ROLE =
RSA_PUBLIC_KEY = '';
-- Grant the role to the user
GRANT ROLE TO USER ;
```
## Create a network rule (Openflow Snowflake Deployment only)
If you are deploying the connector in a runtime that is in an Openflow Snowflake Deployment, you must create a network rule and external access integration and set them on the runtime.
```sql
USE ROLE SECURITYADMIN;
CREATE NETWORK RULE MY_OPENFLOW_SALESFORCE_NETWORK_RULE
TYPE = HOST_PORT
MODE = EGRESS
VALUE_LIST = (':443');
CREATE EXTERNAL ACCESS INTEGRATION MY_OPENFLOW_SALESFORCE_EAI
ALLOWED_NETWORK_RULES = (MY_OPENFLOW_SALESFORCE_NETWORK_RULE)
ENABLED = TRUE
COMMENT = 'External Access Integration to connect to Salesforce';
GRANT USAGE ON INTEGRATION MY_OPENFLOW_SALESFORCE_EAI TO ROLE ;
```
## Next steps
Configure the connector in Openflow:
[](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/configure-connector)
---
title: Openflow Connector for SQL Server: Data mapping
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sql-server/data-mapping.md
section: Loading & Unloading Data
---
# %sqlserver%: Data mapping
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/sql-server/about)
- [](/user-guide/data-integration/openflow/connectors/sql-server/setup)
This topic describes how the SQL Server data types are mapped
to Snowflake data types.
## SQL Server to Snowflake data type mapping
The following table shows how SQL Server data types are mapped to Snowflake data types
when replicating data.
| SQL Server type |
Snowflake type |
Notes |
| TINYINT |
INT |
|
| SMALLINT |
INT |
|
| INT |
INT |
|
| BIGINT |
INT |
|
| DECIMAL |
NUMBER |
If precision exceeds Snowflake limitations (precision > 38), the value is stored as TEXT. |
| NUMERIC |
NUMBER |
If precision exceeds Snowflake limitations (precision > 38), the value is stored as TEXT. |
| SMALLMONEY |
NUMBER |
|
| MONEY |
NUMBER |
|
| REAL |
FLOAT |
|
| FLOAT |
FLOAT |
|
| BIT |
BOOLEAN |
|
| CHAR |
TEXT |
|
| VARCHAR |
TEXT |
|
| NCHAR |
TEXT |
|
| NVARCHAR |
TEXT |
|
| TEXT |
TEXT |
|
| NTEXT |
TEXT |
|
| DATE |
DATE |
|
| TIME |
TIME |
|
| SMALLDATETIME |
TIMESTAMP_NTZ |
|
| DATETIME |
TIMESTAMP_NTZ |
|
| DATETIME2 |
TIMESTAMP_NTZ |
|
| DATETIMEOFFSET |
TIMESTAMP_TZ |
|
| BINARY |
BINARY |
|
| VARBINARY |
BINARY |
|
| IMAGE |
BINARY |
Supported up to the maximum entry size in Snowflake (16 MB). |
| JSON |
VARIANT |
Supported up to the maximum entry size in Snowflake (16 MB). |
| VECTOR |
VARIANT |
|
| XML |
TEXT |
|
| UNIQUEIDENTIFIER |
TEXT |
|
| ROWVERSION / TIMESTAMP |
TEXT |
|
| SQL_VARIANT |
TEXT |
|
| GEOGRAPHY |
TEXT |
Values of this type are inserted as NULL. |
| GEOMETRY |
TEXT |
Values of this type are inserted as NULL. |
Any SQL Server data types not listed in this table are mapped to TEXT by default.
---
title: Openflow Connector for SQL Server: Maintenance
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sql-server/maintenance.md
section: Loading & Unloading Data
---
# %sqlserver%: Maintenance
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/sql-server/setup)
- [](/user-guide/data-integration/openflow/connectors/sql-server/data-mapping)
This topic describes maintenance considerations and best practices for the %sqlserver%, such as reinstalling the connector or setting the change tracking starting position.
These operations are often used in conjunction with [Incremental replication without snapshots](/user-guide/data-integration/openflow/connectors/sql-server/incremental-replication).
## Check the replication status of a table
Interim failures, such as connection errors, do not prevent table replication. However,
permanent failures, such as unsupported data types, prevent table replication.
To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store:
1. In the Openflow runtime canvas, right-click a processor group and choose **Controller Services**. A table listing controller services displays.
2. Locate the row labeled **Table State Store**, click the **More** %sf-vertical-more-button% button on the right side of the row, and then choose **View State**.
A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are:
- **NEW**: The table is scheduled for replication but replication hasn't started.
- **SNAPSHOT_REPLICATION**: The connector is copying existing data. This status displays until all records are stored in the destination table.
- **INCREMENTAL_REPLICATION**: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails.
- **FAILED**: Replication has permanently stopped due to an error.
The Openflow runtime canvas doesn't display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message:
```text
Replication state for table .. changed from to
```
If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see [Restart table replication](#label-of-sql-server-restart-table-replication).
## Reinstall the connector
This section provides instructions on how to reinstall the connector, and continue replicating data for
the same tables without having to snapshot them again.
It covers situations where the new connector is installed in the same runtime, as well as moved to a new runtime.
### Prerequisites
Review and note connector parameter context values.
If you reinstall the connector in the same runtime, you can reuse the existing context.
If the new instance is located in a different runtime, you must re-enter all parameters.
1. Finish processing all in-flight FlowFiles in the existing connector, then stop the connector.
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Select **Launch Openflow**.
4. In the **Openflow** pane select the **Runtimes** tab.
5. Select the runtime containing the connector.
6. Select the connector.
7. Stop the topmost processor **Set Tables for Replication** in the **Snapshot Load** group.
8. Stop the topmost processor **Read SQLServer Change Tracking tables** in the **Incremental Load** group.
9. If you changed the value of the **Merge Task Schedule CRON** parameter, return it to `* * * * * ?`, otherwise queues won't be emptied until the next scheduled run.
Wait until all FlowFiles in the connector have been processed, and all queues are empty.
When all FlowFiles have been processed, the **Queued** value on the connector's processor group becomes zero.
If there are any items left in the original connector's queues, there may be data gaps when the new connector starts.
10. Stop all processors and controller services in the connector.
The existing connector can remain in the runtime and doesn't interfere with the new instance, as long as it remains stopped.
2. Create a new instance of the connector. If you use the same runtime as the original connector, you can choose to keep the existing parameter contexts and reuse the settings.
3. If you install into a different runtime or you deleted the previous parameter contexts, enter the configuration settings into the new parameter contexts,
including the table names and patterns as described in [](/user-guide/data-integration/openflow/connectors/sql-server/setup).
4. Navigate to the `SQLServer Ingestion Parameters` context, and set the following parameters:
- Set the `Ingestion Type` parameter to `incremental`. For information, see [](#label-sql-server-incremental-replication).
- Set the `Starting Change Tracking Position` parameter to `Earliest`.
For information, see [](#label-sql-server-connector-start-restart-incremental-load-from-earliest-available-change-tracking-position).
5. Start the new connector.
### Usage notes
The new connector uses the existing destination tables created by the original connector, but creates new journal tables.
## Specify load from change tracking table position
The %sqlserver% connector lets you select the starting position where change tracking tables are read.
By default, the connector reads from the latest available position. Alternatively, you can choose the earliest position available on the source instance.
Choosing to start from the earliest position is common when reinstalling the connector.
This allows the new instance to catch up and continue replicating existing tables without having to snapshot each again.
Switching a running connector from latest to earliest position causes the contents of change tracking tables to be re-read, re-processed, and re-applied to the destination table.
While the change tracking tables are being re-read, the data in affected destination tables
can become out of sync with their sources until all events have been re-processed and merged.
The following parameters are available in the `Ingestion Parameters` context:
| Parameter |
Description |
| Starting Change Tracking Position |
- `Latest` (default): change tracking table reading starts at the latest available position and continues from there.
- `Earliest`: Switches the incremental load to start, or restart reading from the earliest available
change tracking table positions.
|
| Re-read Tables in State |
- `New` (default):
Only new tables, added after the starting position was switched to `Earliest`, will have their change
tracking tables read from the earliest available positions. Tables that started replication before the
configuration change will continue reading from their last positions.
- `Any active`: Re-read and re-process changes from any table currently in replication.
|
To determine whether the connector finished re-reading the change tracking tables:
1. Navigate to the Openflow canvas.
2. Open the **Incremental Load** process group.
3. Right-click the topmost processor named **Read SQLServer Change Tracking tables**, then select **View state**.
4. Check the state entries for every table with keys starting with `position.`. If a value is `0/0` then the connector has not yet finished re-reading the changes for this table.
### Usage notes
- After you switch a running connector to read from the earliest positions and start it,
you can't reconfigure or cancel the process, and it will continue until the currently-read positions reach the latest values.
- Switching to the earliest position on a running connector will, for any tables being re-processed,
finish their existing journals, and create new journal tables.
---
title: Openflow Connector for SQL Server: Set up incremental replication without snapshots
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/sql-server/incremental-replication.md
section: Loading & Unloading Data
---
# %sqlserver%: Set up incremental replication without snapshots
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/sql-server/setup)
- [](/user-guide/data-integration/openflow/connectors/sql-server/data-mapping)
You can configure the %sqlserver% connector to immediately replicate incremental changes for newly added tables, bypassing snapshots. Use incremental load to continue replication without snapshotting every table again when you reinstall the connector over previously replicated data.
To enable incremental replication in a new connector instance:
1. Set up the connector as described in [](/user-guide/data-integration/openflow/connectors/sql-server/setup).
2. In the `SQLServer Ingestion Parameters` context, set the `Ingestion Type` parameter to `incremental`.
## Enable incremental replication without snapshots
To enable incremental replication on an existing connector:
1. sign in to %sf-web-interface-link%.
2. in the navigation menu, select **Ingestion** %raa% **Openflow**.
3. In the **Openflow** pane select the **Runtimes** tab.
4. Select the runtime containing the connector.
5. Select the connector.
6. In the `Ingestion Parameters` context, specify `Ingestion Type` = `incremental`.
7. Add new replication tables. These tables immediately switch to their incremental load.
To return to replicating tables with the snapshot load, change **Ingestion Type** from `incremental` to `full`.
# Usage notes
- Changing the value of **Ingestion Type** does not impact any tables that have begun replicating data.
Tables currently in the snapshot phase continue until the snapshot load is complete.
- While **Ingestion Type** is set to `incremental`, new tables added to the list of replicated tables bypass the snapshot phase.
This includes new tables added to the source database that match the `Included Table Regex` parameter.
Ensure that the ingestion type is set to `incremental` to bypass the snapshot phase.
Connectors should only remain in `incremental` mode as long as required as it bypasses snapshots.
Once customer needs for incremental updates have been satisfied the connector should be returned to `full` mode.
- For tables that bypass snapshot load, the connector creates a destination table in Snowflake,
by executing `CREATE TABLE IF NOT EXISTS`, only if no destination table already exists.
Tables going through the snapshot require that no destination table exist.
---
title: Openflow connectors
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/about-openflow-connectors.md
section: Loading & Unloading Data
---
# Openflow connectors
This feature is not available in the People's Republic of China.
Openflow is available to all accounts in AWS [](#label-na-general-regions).
The connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
- [](/user-guide/data-integration/openflow/manage)
Openflow connectors are curated, versioned Apache NiFi flow definitions built using open-source and proprietary NiFi components.
These connectors follow a strict set of design patterns to ensure performance, fault-tolerance, and ease of configuration.
Review the details of the following connectors available in Openflow:
| Connector |
Description |
| [Openflow Connector for Amazon Ads](/user-guide/data-integration/openflow/connectors/amazon-ads/about) |
Bring data from Amazon Ads for Ad performance statistics and insights |
| [Openflow Connector for Box](/user-guide/data-integration/openflow/connectors/box/about) |
Ingest Box content for your own custom processing in Snowflake
Ingest Box content and make it ready for chat in your AI assistants with Snowflake Cortex
Use Box AI to extract metadata from Box content for enrichment in Snowflake
Add enriched metadata from Snowflake to content in Box
|
| [Openflow Connector for Google Ads](/user-guide/data-integration/openflow/connectors/google-ads/about) |
Import metrics from Google Ads for performance tracking and optimization |
| [Openflow Connector for Google BigQuery](/user-guide/data-integration/openflow/connectors/google-big-query/about) |
Replicate datasets and tables from Google BigQuery into Snowflake with incremental change capture |
| [Openflow Connector for Google Drive](/user-guide/data-integration/openflow/connectors/google-drive/about) |
Ingest Google Drive content and make it ready for chat in your AI assistants with Snowflake Cortex
Ingest Google Drive content for your own custom processing in Snowflake
|
| [Openflow Connector for Google Sheets](/user-guide/data-integration/openflow/connectors/google-sheets/about) |
Load data from Google sheets into Snowflake tables for reporting, analytics, and insights |
| [Openflow Connector for HubSpot](/user-guide/data-integration/openflow/connectors/hubspot/about) |
Get HubSpot CRM data into Snowflake for reporting, analytics, and insights |
| [Openflow Connector for Jira Cloud](/user-guide/data-integration/openflow/connectors/jira-cloud/about) |
Ingest Jira issues, projects, comments, changelogs, worklogs, users, and agile boards into
Snowflake for cross‐team visibility and deeper insights
|
| [Openflow Connector for Kafka](/user-guide/data-integration/openflow/connectors/kafka/about) |
Ingest real‐time events from Apache Kafka into Snowflake for near real-time analytics |
| [Openflow Connector for Kinesis Data Streams](/user-guide/data-integration/openflow/connectors/kinesis/about) |
Ingest real‐time events from Amazon Kinesis Data Streams into Snowflake for near real-time analytics |
| [Openflow Connector for LinkedIn Ads](/user-guide/data-integration/openflow/connectors/linkedin-ads/about) |
Import campaign performance data from LinkedIn Ads to Snowflake for reporting, analytics, and insights |
| [Openflow Connector for Meta Ads](/user-guide/data-integration/openflow/connectors/meta-ads/about) |
Bring Meta (Facebook) Ads data to unify and analyze your marketing performance |
| [Openflow Connector for Microsoft Dataverse](/user-guide/data-integration/openflow/connectors/dataverse/about) |
Integrate data from Microsoft Power Platform and Dynamics 365 applications with Snowflake for holistic business insights |
| [Openflow Connector for MySQL](/user-guide/data-integration/openflow/connectors/mysql/about) |
CDC replication of MySQL tables into Snowflake for comprehensive, centralized reporting |
| [Openflow Connector for Oracle](/user-guide/data-integration/openflow/connectors/oracle/about) |
CDC replication of Oracle database tables into Snowflake for comprehensive, centralized reporting |
| [Openflow Connector for PostgreSQL](/user-guide/data-integration/openflow/connectors/postgres/about) |
CDC replication of PostgreSQL data with Snowflake for comprehensive, centralized reporting |
| [Openflow Connector for Salesforce Bulk API](/user-guide/data-integration/openflow/connectors/salesforce-bulk-api/about) |
Ingests Salesforce objects into Snowflake, with support for incremental change detection |
| [Openflow Connector for SharePoint](/user-guide/data-integration/openflow/connectors/sharepoint/about) |
Ingest SharePoint content and make it ready for chat in your AI assistants with Snowflake Cortex
Ingest SharePoint content for your own custom processing in Snowflake
|
| [Openflow Connector for Shopify](/user-guide/data-integration/openflow/connectors/shopify/about) |
Replicate Shopify store data into Snowflake using the Admin GraphQL API for e-commerce analytics and reporting |
| [Openflow Connector for Slack](/user-guide/data-integration/openflow/connectors/slack/about) |
Pull Slack messages and metadata into Snowflake for searchable, organization‐wide insights |
| [Openflow Connector for Snowflake to Kafka](/user-guide/data-integration/openflow/connectors/snowflake-to-kafka/about) |
CDC replication of Snowflake tables into Apache Kafka for real-time insights distribution and event-driven architectures |
| [Openflow Connector for SQL Server](/user-guide/data-integration/openflow/connectors/sql-server/about) |
CDC replication of Microsoft SQL Server data with Snowflake for comprehensive, centralized reporting |
| [Openflow Connector for Veeva Vault](/user-guide/data-integration/openflow/connectors/veeva-vault/about) |
Replicate Veeva Vault data into Snowflake using Direct Data archives for analytics and reporting |
| [Openflow Connector for Workday](/user-guide/data-integration/openflow/connectors/workday/about) |
Get Workday data into Snowflake using Report-as-a-Service (RaaS) streams for enterprise-level analytics and planning |
---
title: Openflow Snowflake Deployment cost and scaling considerations
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/cost-spcs.md
section: Loading & Unloading Data
---
# Openflow Snowflake Deployment cost and scaling considerations
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
When running %ofsfspcs% you must be aware of the cost considerations associated with multiple Snowflake components, including, but not limited to the following cost categories:
- Compute pool costs
- Snowpark Container Services infrastructure
- Data Ingestion
- Telemetry Data Ingestion
- Other costs not explicitly mentioned in this topic
Using and scaling Openflow involves understanding these costs. The following sections describe Openflow costs in general, and provide a number of examples of scaling Openflow runtimes and associated costs.
## %ofsfspcs% costs
When using %ofsfspcs%, you can incur costs from multiple Snowflake components that
Openflow uses. These cost categories are described in the following sections.
However, your actual costs may vary based on your specific environment. See [](#label-openflow-spcs-consumption-examples) for examples of different
cost consumption scenarios.
### Openflow compute pool costs
This cost category is shown as **Openflow Compute Snowflake** on your Snowflake bill.
The total costs for running Openflow are based on the number and types of instances used by [Snowpark Container Service compute pools](/developer-guide/snowpark-container-services/working-with-compute-pool) in your Snowflake account.
Openflow uses compute pools for two different purposes:
- Openflow Management Services
Openflow Management Services run as part of an Openflow deployment. They
use a compute pool to manage the Openflow deployment. This compute pool begins running
as soon as you create a deployment. It continues to run as long as the deployment is
active.
The compute pool associated with the Openflow Management Services continues to run and incurs costs, even if there are no runtimes running.
- Openflow runtimes
Openflow uses compute pools to run the Openflow runtimes. The number of compute
pools required and the number of nodes within each compute pool are scaled based on the
number of runtimes that are currently running.
When all runtimes associated with a runtime are stopped, the compute pool associated
with the runtimes is scaled down to 0 nodes. No costs are incurred for a runtime compute pool when it is not in use.
Credits are billed per-second with a 5 minute minimum. For information on the rate per Snowpark Container Services
Compute Instance Family per hour, refer to Table 1(d) in the
[Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf).
The following views in the [](/sql-reference/account-usage) schema provide additional details on Openflow
compute costs:
- [METERING_DAILY_HISTORY](/sql-reference/account-usage/metering_daily_history)
- [METERING_HISTORY](/sql-reference/account-usage/metering_history)
Compute pool costs related to Openflow appear under *SERVICE_TYPE* as *OPENFLOW_COMPUTE_SNOWFLAKE*.
The [OPENFLOW_USAGE_HISTORY](/sql-reference/account-usage/openflow_usage_history) view currently does not
contain records for the *OPENFLOW_COMPUTE_SNOWFLAKE* service type.
For more information on compute costs in Snowflake, see [](/user-guide/cost-exploring-compute).
### Snowpark Container Services infrastructure costs
In addition to compute pool costs, there are costs associated with additional Snowpark Container Services infrastructure, including storage and data transfer.
For additional information, see [](/developer-guide/snowpark-container-services/accounts-orgs-usage-views).
### Data ingestion costs
Costs are incurred when loading data into Snowflake using services such as Snowpipe or Snowpipe Streaming. These costs are based on the volume of data ingested.
These costs appear on your Snowflake bill under their respective ingestion services line items.
Additionally, some connectors may require a warehouse and will incur warehouse costs. For example, database CDC connectors require a warehouse for both the
initial snapshots and ongoing incremental Change Data Capture (CDC).
### Telemetry data ingestion costs
When using an event table to store telemetry data for Openflow, Snowflake charges
for sending logs and metrics to Openflow deployments. There are also charges for
sending runtime telemetry data to your event table within Snowflake.
The rate for credits per GB of telemetry data is specified in Table 5 in the
[Snowflake Service Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf)
This item is referred to as Telemetry Data Ingest.
## Reducing Openflow credit consumption
If you have runtimes that are not actively in use, you can suspend them to reduce costs. Suspending a runtime
stops credit consumption for the associated runtime compute pool. When a runtime is suspended, its compute pool
scales down to 0 nodes and no longer incurs charges.
## %ofsfspcs% costs associated with runtimes and scaling behavior
How you choose to configure and scale runtimes is important for managing costs effectively. Openflow supports different runtime types, each with its own scaling characteristics and associated costs.
### Mapping runtimes to Snowflake compute pools
The runtime type you choose determines the runtime pods that are scheduled on the associated compute pool. Using a larger runtime type will result in a larger compute pool being used, which will incur higher costs.
The runtime sizes and their scaling behavior are described in the following table:
| Runtime type | vCPUs | Available memory (GB) | Snowflake Compute Pool instance family | Snowflake Compute Pool | Instance Family - vCPUs | Instance Family - memory (GB) |
| ------------ | ----- | --------------------- | -------------------------------------- | -------------------------- | ----------------------- | ----------------------------- |
| Small | 1 | 2 | CPU_X64_S | INTERNAL_OPENFLOW_0_SMALL | 4 | 16 |
| Medium | 4 | 10 | CPU_X64_SL | INTERNAL_OPENFLOW_0_MEDIUM | 16 | 64 |
| Large | 8 | 20 | CPU_X64_L | INTERNAL_OPENFLOW_0_LARGE | 32 | 128 |
Openflow scales the underlying Snowflake Compute Pools when additional compute pool
nodes need to be scheduled, based on CPU consumption, and up to the maximum node setting set during runtime creation.
Compute pools are configured with a minimum size of 0 nodes and a maximum of 50 nodes. The required size is dynamically adjusted depending on the CPU and memory
requirements of the runtimes.
If there are no resource demands, for example, if the runtime is not running, a compute pool scales down to 0 nodes after 600 seconds (10 minutes).
| Runtime |
Activity |
Snowflake costs |
Cloud costs |
| No runtimes |
None |
Openflow Control Pool x 1 node = 1 CPU_X64_S instance-hour |
None |
| 1 small runtime (1vCPU) (min=1 max=2) |
Active for 1 hour.
Runtime does not scale to 2.
|
Openflow Control Pool x 1 node + Small Openflow Compute Pool (CPU_X64_S) x 1 node = 2 CPU_X64_S instance-hours |
None |
| 2 small runtime (1 vCPU) (min/max=2) 1 large runtime (8 vCPU) (min/max=10) |
Small: 4 nodes active for 1 hour Large: 10 nodes active for 1 hour |
Openflow Control Pool x 1 node + Small Openflow Compute Pool (CPU_X64_S) x 2 node + Large Openflow Compute Pool (CPU_X64_L) x 4 nodes = 3 CPU_X64_S instance-hours + 4 CPU_X64_L instance-hours |
None |
| 1 medium (4vCPU) (min=1 max=2) |
First 20 minutes 1 node is running After 20 minutes, scales to 2 nodes After 40 minutes, scales back to 1 node Total 1 hour |
Openflow Control Pool x 1 node + Medium Openflow Compute Pool (CPU_X64_SL) x 1 node = 1 CPU_X64_S instance-hour + 1 CPU_X64_SL instance-hour |
None |
| 1 medium (4vCPU) (min/max=2) |
First 30 minutes 2 nodes running Suspends after the first 30 minutes |
Openflow Control Pool x 1 node + Medium Openflow Compute Pool (CPU_X64_SL) x 1 node x 1/2 hour = 1 CPU_X64_S instance-hour + 1/2 CPU_X64_SL instance-hour |
None |
### Examples for calculating %ofsfspcs% consumption
- You created an Openflow Snowflake Deployment and have not created any runtimes.
-
- The Openflow_Control_Pool_0 Compute Pool is running with one CPU_X64_S instance
- Total Openflow consumption = 1 CPU_X64_S instance-hour
- You created one small runtime with Min Nodes = 1 and Max Nodes = 2. Runtime stays at 1 node for 1 hour.
-
- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- The INTERNAL_OPENFLOW_0_SMALL Compute Pool is running with 1 CPU_X64_S instance
- Total Openflow consumption = 2 CPU_X64_S instance-hours
- You created two small runtimes with min/max of two nodes each, and one large runtime with min/max of 10 nodes. These Runtimes are active for one hour.
-
- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- Two small runtimes at two nodes = INTERNAL_OPENFLOW_0_SMALL Compute Pool is running with 2 CPU_X64_S instances = 2 CPU_X64_S instance-hours
- One large runtime at 10 nodes = INTERNAL_OPENFLOW_0_LARGE Compute Pool is running with 4 CPU_X64_L instances = 4 CPU_X64_L instance-hours
- Total Openflow consumption = 3 CPU_X64_S instance-hours + 4 CPU_X64_L instance-hour
- You created one medium runtime with one node. After 20 minutes, it scales to two nodes. After 20 minutes, it scales back down to one node and runs for another 20 minutes.
-
- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- One medium runtime scaling up to two medium runtimes = INTERNAL_OPENFLOW_0_MEDIUM Compute Pool is running with 1 CPU_X64_SL instance = 1 CPU_X64_SL instance-hour
- Total Openflow consumption = 1 CPU_X64_S instance-hour + 1 CPU_X64_SL instance-hour
- You created one medium runtime with two nodes, then suspended it after 30 minutes.
-
- The Openflow_Control_Pool_0 Compute Pool is running with 1 CPU_X64_S instance
- One medium runtime at one node = INTERNAL_OPENFLOW_0_MEDIUM Compute Pool is running with 1 CPU_X64_SL instance
- 30 minutes = 1/2 hour
- Total Openflow consumption = 1 CPU_X64_S instance-hour +1/2 CPU_X64_SL instance-hour
---
title: PackageFlowFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/packageflowfile.md
section: Loading & Unloading Data
---
# PackageFlowFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
This processor will package FlowFile attributes and content into an output FlowFile that can be exported from NiFi and imported back into NiFi, preserving the original attributes and content.
## Tags
attributes, flowfile, flowfile-stream, flowfile-stream-v3, package
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Maximum Batch Content Size |
Maximum combined content size of FlowFiles to package into one output FlowFile. Note, that FlowFiles whose content exceeds this limit are packaged separately. |
| max-batch-size |
Maximum number of FlowFiles to package into one output FlowFile. |
## Relationships
| Name |
Description |
| original |
The FlowFiles that were used to create the package are sent to this relationship |
| success |
The packaged FlowFile is sent to this relationship |
## Writes attributes
| Name |
Description |
| mime.type |
The mime.type will be changed to application/flowfile-v3 |
## Use Cases Involving Other Components
| Send FlowFile content and attributes from one NiFi instance to another NiFi instance. |
| ------------------------------------------------------------------------------------- |
| Export FlowFile content and attributes from NiFi to external storage and reimport. |
## See also
- [org.apache.nifi.processors.standard.MergeContent](/user-guide/data-integration/openflow/processors/mergecontent)
- [org.apache.nifi.processors.standard.UnpackContent](/user-guide/data-integration/openflow/processors/unpackcontent)
---
title: PaginatedJsonQueryElasticsearch 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/paginatedjsonqueryelasticsearch.md
section: Loading & Unloading Data
---
# PaginatedJsonQueryElasticsearch 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-elasticsearch-restapi-nar
## Description
A processor that allows the user to run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. It will use the flowfile's content for the query unless the QUERY attribute is populated. Search After/Point in Time queries must include a valid "sort" field.
## Tags
elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, json, page, query, read, scroll
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Aggregation Results Format |
Format of Aggregation output. |
| Aggregation Results Split |
Output a flowfile containing all aggregations or one flowfile for each individual aggregation. |
| Aggregations |
One or more query aggregations (or "aggs"), in JSON syntax. Ex: \{"items": \{"terms": \{"field": "product", "size": 10\}\}\} |
| Client Service |
An Elasticsearch client service to use for running queries. |
| Fields |
Fields of indexed documents to be retrieved, in JSON syntax. Ex: ["user.id", "http.response.*", \{"field": "@timestamp", "format": "epoch_millis"\}] |
| Index |
The name of the index to use. |
| Max JSON Field String Length |
The maximum allowed length of a string value when parsing a JSON document or attribute. |
| Output No Hits |
Output a "hits" flowfile even if no hits found for query. If true, an empty "hits" flowfile will be output even if "aggregations" are output. |
| Pagination Keep Alive |
Pagination "keep_alive" period. Period Elasticsearch will keep the scroll/pit cursor alive in between requests (this is not the time expected for all pages to be returned, but the maximum allowed time for requests between page retrievals). |
| Pagination Type |
Pagination method to use. Not all types are available for all Elasticsearch versions, check the Elasticsearch docs to confirm which are applicable and recommended for your service. |
| Query |
A query in JSON syntax, not Lucene syntax. Ex: \{"query":\{"match":\{"somefield":"somevalue"\}\}\}. If this parameter is not set, the query will be read from the flowfile content. If the query (property and flowfile content) is empty, a default empty JSON Object will be used, which will result in a "match_all" query in Elasticsearch. |
| Query Attribute |
If set, the executed query will be set on each result flowfile in the specified attribute. |
| Query Clause |
A "query" clause in JSON syntax, not Lucene syntax. Ex: \{"match":\{"somefield":"somevalue"\}\}. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch. |
| Query Definition Style |
How the JSON Query will be defined for use by the processor. |
| Script Fields |
Fields to created using script evaluation at query runtime, in JSON syntax. Ex: \{"test1": \{"script": \{"lang": "painless", "source": "doc[ 'price'].value * 2"\}\}, "test2": \{"script": \{"lang": "painless", "source": "doc[ 'price'].value * params.factor", "params": \{"factor": 2.0\}\}\}\} |
| Search Results Format |
Format of Hits output. |
| Search Results Split |
Output a flowfile containing all hits or one flowfile for each individual hit or one flowfile containing all hits from all paged responses. |
| Size |
The maximum number of documents to retrieve in the query. If the query is paginated, this "size" applies to each page of the query, not the "size" of the entire result set. |
| Sort |
Sort results by one or more fields, in JSON syntax. Ex: [\{"price" : \{"order" : "asc", "mode" : "avg"\}\}, \{"post_date" : \{"format": "strict_date_optional_time_nanos"\}\}] |
| Type |
The type of this document (used by Elasticsearch for indexing and searching). |
## Relationships
| Name |
Description |
| aggregations |
Aggregations are routed to this relationship. |
| failure |
All flowfiles that fail for reasons unrelated to server availability go to this relationship. |
| hits |
Search hits are routed to this relationship. |
| original |
All original flowfiles that don't cause an error to occur go to this relationship. |
## Writes attributes
| Name |
Description |
| mime.type |
application/json |
| aggregation.name |
The name of the aggregation whose results are in the output flowfile |
| aggregation.number |
The number of the aggregation whose results are in the output flowfile |
| page.number |
The number of the page (request), starting from 1, in which the results were returned that are in the output flowfile |
| hit.count |
The number of hits that are in the output flowfile |
| elasticsearch.query.error |
The error message provided by Elasticsearch if there is an error querying the index. |
## See also
- [org.apache.nifi.processors.elasticsearch.ConsumeElasticsearch](/user-guide/data-integration/openflow/processors/consumeelasticsearch)
- [org.apache.nifi.processors.elasticsearch.JsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/jsonqueryelasticsearch)
- [org.apache.nifi.processors.elasticsearch.SearchElasticsearch](/user-guide/data-integration/openflow/processors/searchelasticsearch)
---
title: ParquetIcebergWriter
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/parqueticebergwriter.md
section: Loading & Unloading Data
---
# ParquetIcebergWriter
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Provides record serialization for Apache Iceberg using Apache Parquet formatting
## Tags
iceberg, openflow, parquet, record
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Write Target File Size * |
Write Target File Size |
512 MB |
|
Controls the size of files generated to target about this many bytes |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: ParseEvtx 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/parseevtx.md
section: Loading & Unloading Data
---
# ParseEvtx 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-evtx-nar
## Description
Parses the contents of a Windows Event Log file (evtx) and writes the resulting XML to the FlowFile
## Tags
event, evtx, file, logs, message, windows
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Granularity |
Output flow file for each Record, Chunk, or File encountered in the event log |
## Relationships
| Name |
Description |
| bad chunk |
Any bad chunks of records will be transferred to this relationship in their original binary form |
| failure |
Any FlowFile that encountered an exception during conversion will be transferred to this relationship with as much parsing as possible done |
| original |
The unmodified input FlowFile will be transferred to this relationship |
| success |
Any FlowFile that was successfully converted from evtx to XML |
## Writes attributes
| Name |
Description |
| filename |
The output filename |
| mime.type |
The output filetype (application/xml for success and failure relationships, original value for bad chunk and original relationships) |
---
title: ParseExcelCellReference 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/parseexcelcellreference.md
section: Loading & Unloading Data
---
# ParseExcelCellReference 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-office-nar
## Description
Processor responsible for parsing Excel cell reference formula.
## Tags
cell, excel, parse, spreadsheet, xls, xlsx
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Ranges |
The comma-separated Excel ranges to parse in the A1 notation. For example: Sheet1!A1:B2,Sheet2!D4:E5,Sheet3. Ranges in R1C1 and 3-D reference style are not allowed. The value can't be empty. |
## Relationships
| Name |
Description |
| failure |
FlowFile with errors occurred while parsing ranges. |
| success |
FlowFile annotated with attributes containing parsed Excel range. For each range a separate FlowFile is produced. |
## Writes attributes
| Name |
Description |
| range.formula |
Single range formula that was used to produce other attributes, e.g. Sheet1!A1:B2. |
| range.sheetname |
Parsed sheet name. |
| range.rows.starting |
Starting row (numbered from 1) of parsed range. |
| range.rows.ending |
Ending row of parsed range. |
| range.columns.starting |
Number of starting column of parsed range. |
| range.columns.ending |
Number of ending column of parsed range. |
---
title: ParseSyslog 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/parsesyslog.md
section: Loading & Unloading Data
---
# ParseSyslog 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Attempts to parses the contents of a Syslog message in accordance to RFC5424 and RFC3164 formats and adds attributes to the FlowFile for each of the parts of the Syslog message. Note: Be mindfull that RFC3164 is informational and a wide range of different implementations are present in the wild. If messages fail parsing, considering using RFC5424 or using a generic parsing processors such as ExtractGrok.
## Tags
attributes, event, logs, message, syslog, system
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
Specifies which character set of the Syslog messages |
## Relationships
| Name |
Description |
| failure |
Any FlowFile that could not be parsed as a Syslog message will be transferred to this Relationship without any attributes being added |
| success |
Any FlowFile that is successfully parsed as a Syslog message will be to this Relationship. |
## Writes attributes
| Name |
Description |
| syslog.priority |
The priority of the Syslog message. |
| syslog.severity |
The severity of the Syslog message derived from the priority. |
| syslog.facility |
The facility of the Syslog message derived from the priority. |
| syslog.version |
The optional version from the Syslog message. |
| syslog.timestamp |
The timestamp of the Syslog message. |
| syslog.hostname |
The hostname or IP address of the Syslog message. |
| syslog.sender |
The hostname of the Syslog server that sent the message. |
| syslog.body |
The body of the Syslog message, everything after the hostname. |
## See also
- [org.apache.nifi.processors.standard.ListenSyslog](/user-guide/data-integration/openflow/processors/listensyslog)
- [org.apache.nifi.processors.standard.PutSyslog](/user-guide/data-integration/openflow/processors/putsyslog)
---
title: ParseSyslog5424 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/parsesyslog5424.md
section: Loading & Unloading Data
---
# ParseSyslog5424 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Attempts to parse the contents of a well formed Syslog message in accordance to RFC5424 format and adds attributes to the FlowFile for each of the parts of the Syslog message, including Structured Data. Structured Data will be written to attributes as one attribute per item id + parameter see [https://tools.ietf.org/html/rfc5424.Note](https://tools.ietf.org/html/rfc5424.Note): ParseSyslog5424 follows the specification more closely than ParseSyslog. If your Syslog producer does not follow the spec closely, with regards to using '-' for missing header entries for example, those logs will fail with this parser, where they would not fail with ParseSyslog.
## Tags
attributes, event, logs, message, syslog, syslog5424, system
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
Specifies which character set of the Syslog messages |
| include_policy |
If true, then the Syslog Message body will be included in the attributes. |
| nil_policy |
Defines how NIL values are handled for header fields. |
## Relationships
| Name |
Description |
| failure |
Any FlowFile that could not be parsed as a Syslog message will be transferred to this Relationship without any attributes being added |
| success |
Any FlowFile that is successfully parsed as a Syslog message will be to this Relationship. |
## Writes attributes
| Name |
Description |
| syslog.priority |
The priority of the Syslog message. |
| syslog.severity |
The severity of the Syslog message derived from the priority. |
| syslog.facility |
The facility of the Syslog message derived from the priority. |
| syslog.version |
The optional version from the Syslog message. |
| syslog.timestamp |
The timestamp of the Syslog message. |
| syslog.hostname |
The hostname or IP address of the Syslog message. |
| syslog.appname |
The appname of the Syslog message. |
| syslog.procid |
The procid of the Syslog message. |
| syslog.messageid |
The messageid the Syslog message. |
| syslog.structuredData |
Multiple entries per structuredData of the Syslog message. |
| syslog.sender |
The hostname of the Syslog server that sent the message. |
| syslog.body |
The body of the Syslog message, everything after the hostname. |
## See also
- [org.apache.nifi.processors.standard.ListenSyslog](/user-guide/data-integration/openflow/processors/listensyslog)
- [org.apache.nifi.processors.standard.ParseSyslog](/user-guide/data-integration/openflow/processors/parsesyslog)
- [org.apache.nifi.processors.standard.PutSyslog](/user-guide/data-integration/openflow/processors/putsyslog)
---
title: PartitionRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/partitionrecord.md
section: Loading & Unloading Data
---
# PartitionRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Splits, or partitions, record-oriented data based on the configured fields in the data. One or more properties must be added. The name of the property is the name of an attribute to add. The value of the property is a RecordPath to evaluate against each Record. Two records will go to the same outbound FlowFile only if they have the same value for each of the given RecordPaths. Because we know that all records in a given output FlowFile have the same value for the fields that are specified by the RecordPath, an attribute is added for each field. See Additional Details on the Usage page for more information and examples.
## Tags
bin, group, organize, partition, record, recordpath, rpath, segment, split
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| record-reader |
Specifies the Controller Service to use for reading incoming data |
| record-writer |
Specifies the Controller Service to use for writing out the records |
## Relationships
| Name |
Description |
| failure |
If a FlowFile cannot be partitioned from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship |
| original |
Once all records in an incoming FlowFile have been partitioned, the original FlowFile is routed to this relationship. |
| success |
FlowFiles that are successfully partitioned will be routed to this relationship |
## Writes attributes
| Name |
Description |
| record.count |
The number of records in an outgoing FlowFile |
| mime.type |
The MIME Type that the configured Record Writer indicates is appropriate |
| fragment.identifier |
All partitioned FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute |
| fragment.index |
A one-up number that indicates the ordering of the partitioned FlowFiles that were created from a single parent FlowFile |
| fragment.count |
The number of partitioned FlowFiles generated from the parent FlowFile |
| segment.original.filename |
The filename of the parent FlowFile |
| <dynamic property name> |
For each dynamic property that is added, an attribute may be added to the FlowFile. See the description for Dynamic Properties for more information. |
## Use cases
| Separate records into separate FlowFiles so that all of the records in a FlowFile have the same value for a given field or set of fields. |
| ----------------------------------------------------------------------------------------------------------------------------------------- |
| Separate records based on whether or not they adhere to a specific criteria |
## See also
- [org.apache.nifi.processors.standard.ConvertRecord](/user-guide/data-integration/openflow/processors/convertrecord)
- [org.apache.nifi.processors.standard.QueryRecord](/user-guide/data-integration/openflow/processors/queryrecord)
- [org.apache.nifi.processors.standard.SplitRecord](/user-guide/data-integration/openflow/processors/splitrecord)
- [org.apache.nifi.processors.standard.UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord)
---
title: PEMEncodedSSLContextProvider
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/pemencodedsslcontextprovider.md
section: Loading & Unloading Data
---
# PEMEncodedSSLContextProvider
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
SSLContext Provider configurable using PEM Private Key and Certificate files. Supports PKCS1 and PKCS8 encoding for Private Keys as well as X.509 encoding for Certificates.
## Tags
Certificate, ECDSA, Ed25519, Key, PEM, PKCS1, PKCS8, RSA, SSL, TLS, X.509
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Certificate Authorities * |
Certificate Authorities |
|
|
PEM X.509 Certificate Authorities trusted for verifying peers in TLS communications containing one or more standard certificates |
| Certificate Authorities Source * |
Certificate Authorities Source |
PROPERTIES |
- Properties
- System
|
Source of information for loading trusted Certificate Authorities |
| Certificate Chain * |
Certificate Chain |
|
|
PEM X.509 Certificate Chain associated with Private Key starting with standard BEGIN CERTIFICATE header |
| Certificate Chain Location * |
Certificate Chain Location |
|
|
PEM X.509 Certificate Chain file location associated with Private Key starting with standard BEGIN CERTIFICATE header |
| Private Key * |
Private Key |
|
|
PEM Private Key encoded using either PKCS1 or PKCS8. Supported algorithms include ECDSA, Ed25519, and RSA |
| Private Key Location * |
Private Key Location |
|
|
PEM Private Key file location encoded using either PKCS1 or PKCS8. Supported algorithms include ECDSA, Ed25519, and RSA |
| Private Key Source * |
Private Key Source |
PROPERTIES |
- Undefined
- Properties
- Files
|
Source of information for loading Private Key and Certificate Chain |
| TLS Protocol * |
TLS Protocol |
TLS |
- TLS
- TLSv1.3
- TLSv1.2
|
TLS protocol version required for negotiating encrypted communications. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: Performance tuning of the Openflow Connector for Amazon Kinesis Data Streams
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning.md
section: Loading & Unloading Data
---
# Performance tuning of the %kinesis%
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/kinesis/about)
- [](/user-guide/data-integration/openflow/connectors/kinesis/setup)
- [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance)
- [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot)
When configuring the Openflow Connector for Kinesis for optimal performance, consider the following key factors that impact ingestion throughput and latency.
## Flowfile size
For optimal performance, flowfiles should be in the range 1-10 MB rather than containing individual small messages. Larger flowfiles reduce processing overhead and improve throughput by minimizing the number of individual file operations. Default settings should yield flowfiles in an acceptable size range. Small flowfiles are expected when throughput is low.
If you observe small flowfiles with high throughput, contact [Snowflake Support](/user-guide/contacting-support) for assistance.
## Network and infrastructure
### Network latency
Lower latency between Kinesis and Openflow improves overall performance. It's highly advised that your Kinesis stream and Openflow are located in the same cloud service provider (CSP) region.
### Node size recommendations
The following table provides configuration recommendations based on expected workload characteristics. Throughput values are relative and depend heavily on the source system configuration, topic and stream sizes, data format, and other factors.
| Node Size |
Recommended For |
Message Rate Capacity |
| Small (S) |
Low to moderate throughput scenarios |
Up to 27 MB/s per node |
| Medium (M) |
Moderate to high throughput scenarios |
Up to 135 MB/s per node |
| Large (L) |
High throughput scenarios |
Exceeding 135 MB/s per node. Up to 310 MB/s per node. |
## Performance optimization best practices
### Tuning Max Records Per Request
When the ConsumeKinesis processor uses the **SHARED_THROUGHPUT** consumer type, the **Max Records Per Request** property controls the maximum number of records that the processor retrieves from Kinesis in a single request. If ingestion throughput is low and you don't see an obvious bottleneck in Openflow, Snowflake, or the network, this value might be too low for your workload.
For most workloads, start by setting **Max Records Per Request** so that each request retrieves about 1 MB of data. Estimate the value by dividing 1 MB by your average Kinesis record size.
The following table shows example starting values for common average record sizes:
| Average record size |
Approximate calculation |
Max Records Per Request |
| 1 KB |
1 MB / 1 KB |
1000 |
| 200 bytes |
1 MB / 200 bytes |
5000 |
| 5 KB |
1 MB / 5 KB |
200 |
After changing the value, monitor consumer lag, throughput, and runtime resource usage. Increase the value gradually if Kinesis consumption remains the bottleneck.
### Adjusting processor concurrent tasks
To optimize processor performance, you can adjust the number of concurrent tasks for both ConsumeKinesis and PublishSnowpipeStreaming processors. Concurrent tasks allow processors to run multiple threads simultaneously, improving throughput for high-volume scenarios.
To adjust concurrent tasks for a processor, perform the following tasks:
1. Right-click on the processor in the Openflow canvas.
2. Select **Configure** from the context menu.
3. Navigate to the **Scheduling** tab.
4. In the **Concurrent tasks** field, enter the preferred number of concurrent tasks.
5. Select **Apply** to save the configuration.
#### Recommended concurrent task settings
| Node Size |
ConsumeKinesis Tasks |
PublishSnowpipeStreaming Tasks |
| Small (S) |
2 |
1 |
| Medium (M) |
4 |
2 |
| Large (L) |
6 |
3 |
#### Important considerations
- **Memory usage**: Each concurrent task consumes additional memory. Monitor JVM heap usage when increasing concurrent tasks.
- **Start conservatively**: Begin with lower values and gradually increase while monitoring performance metrics.
## Troubleshoot common performance bottlenecks
### High consumer lag or Snowflake ingestion bottlenecks
If Kinesis consumer lag is increasing or Snowflake ingestion is slow, then perform the following tasks:
1. Verify network connectivity and bandwidth between Openflow and Kinesis.
2. Observe if the queue in front of the PublishSnowpipeStreaming processor increases.
1. If yes, consider adding more concurrent tasks for the PublishSnowpipeStreaming processor in the range limitations provided in [Adjusting processor concurrent tasks](#adjusting-processor-concurrent-tasks).
2. If not, consider adding more concurrent tasks for the ConsumeKinesis processor in the range limitations provided in [Adjusting processor concurrent tasks](#adjusting-processor-concurrent-tasks).
3. Consider using a bigger node type.
4. Consider increasing the number of nodes for the runtime. This can be done by stopping the connectors in the runtime. Changing node min and max size numbers and starting the connectors again..
### Memory pressure
If experiencing memory-related issues:
1. Reduce the batch sizes to lower the memory footprint. This can be done by changing the File Fragment Size and File Fragment Count parameters in the PublishSnowpipeStreaming processor.
2. Reduce the number of concurrent tasks for the ConsumeKinesis processor.
3. Consider using a bigger node type.
### Network latency issues
If experiencing high latency:
1. Verify network configuration between Openflow and external systems.
2. Consider deploying Openflow in the same region as your Kinesis stream.
3. If working with low throughput, consider lowering the Client Lag settings in the PublishSnowpipeStreaming processor and Max Uncommitted Time in the ConsumeKinesis processor.
---
title: Performance Tuning of the Openflow Connector for Kafka
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kafka/performance-tuning.md
section: Loading & Unloading Data
---
# Performance Tuning of the Openflow Connector for Kafka
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic provides guidance for optimizing the performance of the [](/user-guide/data-integration/openflow/connectors/kafka/about)
to achieve optimal throughput and minimize latency when ingesting data into Snowflake.
## Performance considerations
When configuring the Openflow Connector for Kafka for optimal performance, consider the following key factors that impact ingestion throughput and latency:
### Kafka configuration
#### Partition count
More partitions allow for higher parallelism but require careful coordination with consumer configuration. Excessive partitions can cause several issues: increased memory usage, slower leader elections during failures, and significant metadata management overhead on brokers.
#### Compression
Message compression can reduce network bandwidth but increases CPU overhead.
### Flowfile optimization
#### Flowfile size
For optimal performance, flowfiles should be in the range 1-10 MB rather than containing individual small messages. Larger flowfiles reduce processing overhead and improve throughput by minimizing the number of individual file operations. Default settings should yield flowfiles in an acceptable size range. Small flowfiles are expected when throughput is low.
If you observe small flowfiles with high throughput, contact [Snowflake Support](/user-guide/contacting-support) for assistance.
### Network and infrastructure
#### Network latency
Lower latency between Kafka brokers and Openflow improves overall performance. Snowflake recommends deploying Kafka brokers and Openflow in the same CSP region.
#### Node size recommendations
The following table provides configuration recommendations based on expected workload characteristics:
| Node Size |
Recommended For |
Message Rate Capacity |
| Small (S) |
Low to moderate throughput scenarios |
Up to 18 MB/s per node |
| Medium (M) |
Moderate to high throughput scenarios |
Up to 145 MB/s per node |
| Large (L) |
High throughput scenarios |
Up to 250 MB/s per node |
### Performance optimization best practices
#### Adjusting processor concurrent tasks
To optimize processor performance, you can adjust the number of concurrent tasks for both
[ConsumeKafka](/user-guide/data-integration/openflow/processors/consumekafka) and PublishSnowpipeStreaming processors. Concurrent tasks allow processors to run multiple threads simultaneously, improving throughput for high-volume scenarios.
To adjust concurrent tasks for a processor, perform the following tasks:
1. Right-click on the processor in the Openflow canvas.
2. Select Configure from the context menu.
3. Navigate to the Scheduling tab.
4. In the Concurrent tasks field, enter the preferred number of concurrent tasks.
5. Select Apply to save the configuration.
#### Recommended concurrent task settings
The following table provides recommended concurrent task settings for different node sizes:
| Node Size |
ConsumeKafka Tasks |
PublishSnowpipeStreaming Tasks |
| Small (S) |
1 |
1 |
| Medium (M) |
4 |
2 |
| Large (L) |
8 |
2 |
#### Important considerations
- Memory usage
-
Each concurrent task consumes additional memory. Monitor JVM heap usage when increasing concurrent tasks.
- Kafka partitions
-
For ConsumeKafka, the number of concurrent tasks multiplied by the number of runtime nodes should not exceed the number of total Kafka partitions from all topics.
- Start conservatively
-
Begin with lower values and gradually increase while monitoring performance metrics.
#### Troubleshooting performance issues: Common performance bottlenecks
##### High consumer lag or Snowflake ingestion bottlenecks
If Kafka consumer lag is increasing or Snowflake ingestion is slow, then perform the following tasks:
1. Verify network connectivity and bandwidth between Openflow and Kafka brokers.
2. Observe if the queue in front of the PublishSnowpipeStreaming processor increases.
1. If yes, consider adding more concurrent tasks for the PublishSnowpipeStreaming processor in the range limitations provided in [](#label-openflow-kafka-adjust-concurrent-tasks).
2. If not, consider adding more concurrent tasks for the ConsumeKafka processor in the range limitations provided in [](#label-openflow-kafka-adjust-concurrent-tasks).
3. Consider using a bigger node type.
4. Consider increasing the max number of nodes for the runtime.
##### Memory pressure
If experiencing memory-related issues:
1. Reduce the batch sizes to lower the memory footprint.
2. Reduce the number of concurrent tasks for the ConsumeKafka processor.
3. Consider upgrading to a bigger node type.
##### Network latency issues
If experiencing high latency:
1. Verify network configuration between Openflow and external systems.
2. Consider deploying Openflow closer to your Kafka cluster.
3. If working with low throughput, consider lowering the Client Lag settings in the PublishSnowpipeStreaming processor and Max Uncommitted Time in the ConsumeKafka processor.
---
title: PerformSnowflakeCortexOCR 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/performsnowflakecortexocr.md
section: Loading & Unloading Data
---
# PerformSnowflakeCortexOCR 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-snowflake-processors-nar
## Description
Performs Optical Character Recognition (OCR) on PDF documents using Snowflake Cortex ML functions. Documents must be staged in a Snowflake internal stage with server-side encryption enabled. The processor extracts text content from PDFs and can output the results either as FlowFile content or as an attribute.
## Tags
ai, cortex, document, ml, ocr, openflow, pdf, snowflake
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Database |
The Snowflake database containing the stage |
| Filename |
The filename of the file to perform OCR on, it must be uploaded to the stage prior to performing OCR. FlowFile attributes may be referenced via Expression Language. |
| Max Attribute Size |
The maximum size of the OCR results that can be written to an attribute. If the OCR results exceed this, the FlowFile will be routed to failure. |
| OCR Mode |
Specifies how document text and structure should be extracted. In 'OCR' mode, only raw text content is extracted, ignoring formatting and table structures. In 'LAYOUT' mode, the output preserves table structures as markdown. |
| Output Strategy |
Determines response output destination |
| Results Attribute |
The name of the attribute to write the OCR response to. |
| Schema |
The Snowflake schema containing the stage |
| Snowflake Connection Service |
Database Connection Service for accessing Snowflake |
| Stage |
The Snowflake stage where PDFs will be temporarily stored. The stage must have server-side encryption enabled. FlowFile attributes may be referenced via Expression Language |
## Relationships
| Name |
Description |
| empty |
FlowFiles for which OCR results are empty |
| failure |
FlowFiles that cannot be processed are routed to this relationship |
| success |
FlowFiles that are successfully processed (with non-empty OCR results) are routed to this relationship |
## Writes attributes
| Name |
Description |
| mime.type |
The MIME type of the output content (text/plain when output strategy is FLOW_FILE) |
| snowflake.error.information |
Contains error information if Snowflake Cortex OCR operation returns an error |
## See also
- [com.snowflake.openflow.runtime.processors.snowflake.PutSnowflakeInternalStageFile](/user-guide/data-integration/openflow/processors/putsnowflakeinternalstagefile)
---
title: PickTablesForReplication 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/picktablesforreplication.md
section: Loading & Unloading Data
---
# PickTablesForReplication 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-database-cdc-processors-nar
## Description
Accepts a list of fully qualified table names and determines if a table: - is new (is not replicated, but was added in the source) - is existing (is replicated and exists in the source) - is stale (is replicated but no longer exists in the source) Configuration is passed as a FlowFile attribute. Processor generates a separate FlowFile for each source table.
## Tags
snowflake, state, table
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Table State Service |
A service containing currently replicated tables and their states |
## Relationships
| Name |
Description |
| existing |
FlowFile with qualified table name that is already being replicated |
| failure |
If a FlowFile attribute cannot be read or is incorrect, it will be routed to this Relationship. |
| new |
FlowFile with qualified table name that was is not replicated |
| stale |
FlowFile with qualified table name that used to be replicated but no longer is, either because it was removed from source database or excluded by parameter |
## Writes attributes
| Name |
Description |
| source.schema.name |
Name of the schema of the table from which an event originated |
| source.table.name |
Name of the table from which an event originated |
---
title: PolarisIcebergCatalog
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/polarisicebergcatalog.md
section: Loading & Unloading Data
---
# PolarisIcebergCatalog
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Provides Apache Iceberg integration with Apache Polaris Catalog access over REST HTTP
## Tags
catalog, iceberg, openflow, polaris
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Access Token Scopes * |
Access Token Scopes |
catalog |
|
Comma-separated list of one or more OAuth 2 scopes requested for Access Tokens |
| Authentication Strategy * |
Authentication Strategy |
OAUTH2 |
- Bearer Authentication
- OAuth 2.0
|
Strategy for authenticating with the Apache Iceberg Catalog over HTTP |
| Authorization Grant Type * |
Authorization Grant Type |
CLIENT_CREDENTIALS |
- Client Credentials |
OAuth 2.0 Authorization Grant Type for obtaining Access Tokens |
| Authorization Server URI * |
Authorization Server URI |
|
|
Authorization Server URI supporting OAuth 2 |
| Bearer Token * |
Bearer Token |
|
|
Bearer Token for authentication to Apache Iceberg Catalog |
| Catalog URI * |
Catalog URI |
|
|
Apache Iceberg Catalog REST URI |
| Client ID * |
Client ID |
|
|
Client ID for OAuth 2 Client Credentials |
| Client Secret * |
Client Secret |
|
|
Client Secret for OAuth 2 Client Credentials |
| Warehouse Location |
Warehouse Location |
|
|
Apache Iceberg Catalog Warehouse location or identifier |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: PromptAnthropicAI 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptanthropicai.md
section: Loading & Unloading Data
---
# PromptAnthropicAI 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-anthropic-nar
## Description
Sends a prompt to Anthropic, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction or may include an image. Use dynamic properties to enable beta features in the Anthropic endpoint.
## Tags
ai, anthropic, chat, image, openflow, prompt, text
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Anthropic API Key |
The API Key for authenticating to Anthropic |
| Assistant Message |
The assistant message to send to Anthropic. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The assistant message is added last |
| Image MIME Type |
The MIME type of the image in the FlowFile content. Supported types are image/jpeg, image/png, image/gif, and image/webp. |
| Max File Size |
The maximum size of a FlowFile that can be sent to Anthropic as an image. If the FlowFile is larger than this, it will be routed to 'failure'. |
| Max Tokens |
The maximum number of tokens to generate |
| Model Name |
The name of the Anthropic model |
| Output Strategy |
Determines response output destination |
| Prompt Type |
The type of prompt to send to Anthropic. TEXT to send a simple prompt. IMAGE to send an image first and then a prompt. Use JSON for advanced use of Anthropic's /v1/messages endpoint. |
| Response Format |
The format of the response from Anthropic |
| Results Attribute |
The name of the attribute to write the response to. |
| Stop Sequences |
A comma delimited list of strings act as stop sequences. The model will halt after encountering one of the stop sequences. |
| System Message |
The system message to send to Anthropic. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\} |
| Temperature |
The temperature to use for generating the response. Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks. |
| Top K |
The top K value to use for generating the response. Only sample from the top K options for each subsequent token. Recommended for advanced use cases only. You usually only need to use temperature. |
| Top P |
The top P value to use for generating the response. Top P is for nucleus sampling, we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by top_p. Recommended for advanced use cases only. You usually only need to use temperature. |
| User ID |
The user id to set in the request metadata |
| User Message |
The user message to send to Anthropic. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The user message is added first, unless an image is present. |
| Web Client Service |
The Web Client Service to use for communicating with Anthropic |
## Relationships
| Name |
Description |
| failure |
If unable to obtain a valid response from Anthropic, the original FlowFile will be routed to this relationship |
| retry |
If a 5XX response from Anthropic is returned, the original FlowFile will be routed to this relationship |
| success |
The response from Anthropic is routed to this relationship |
## Writes attributes
| Name |
Description |
| anthropic.usage.inputTokens |
The number of input tokens read in the request. |
| anthropic.usage.outputTokens |
The number of output tokens generated in the response. |
| anthropic.chat.completion.id |
A unique id assigned to the conversation |
| anthropic.chat.completion.stop.reason |
The reason that we stopped. |
| anthropic.chat.completion.stop.sequence |
Which custom stop sequence was generated, if any, may be 'null'. |
| mime.type |
The mime type of the response. |
| filename |
An updated filename for the response. |
---
title: PromptAzureOpenAI 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptazureopenai.md
section: Loading & Unloading Data
---
# PromptAzureOpenAI 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-openai-nar
## Description
Sends a prompt to Azure's OpenAI service, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction or may include images. In the case of images, a URL may be provided, or the contents of the FlowFile may be used, depending on the provided configuration
## Tags
ai, azure, chat, image, openai, openflow, prompt, text
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| API Key |
The API key for authenticating to the Azure OpenAI service |
| Deployment Name |
The name of the OpenAI model deployment |
| Detail Level |
The image detail level that OpenAI should use for processing the image. Low detail will be less expensive and lower latency, while a high level may provide better results. |
| Image MIME Type |
The MIME type of the image |
| Image URL |
The URL of the image to send to OpenAI. If not specified, the contents of the FlowFile will be used as the image. |
| Max File Size |
The maximum size of a FlowFile that can be sent to OpenAI as an image. If the FlowFile is larger than this, it will be routed to 'failure'. |
| Max Tokens |
The maximum number of tokens to generate |
| OpenAI Service Name |
The name of the OpenAI service to use |
| Prompt Type |
The type of prompt to send to OpenAI |
| Response Format |
The format of the response from OpenAI |
| Results Attribute |
The name of the attribute to write the response to. If unset, the response will be written to the FlowFile content. |
| Seed |
The seed to use for generating the response |
| System Message |
The system message to send to OpenAI. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\} |
| Temperature |
The temperature to use for generating the response. |
| Top P |
The top P value to use for generating the response |
| User |
Your end user, sent to OpenAI for monitoring and detection of abuse |
| User Message |
The user message to send to OpenAI. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\} |
| Web Client Service |
The Web Client Service to use for communicating with OpenAI |
## Relationships
| Name |
Description |
| failure |
If unable to obtain a valid response from Azure OpenAI, the original FlowFile will be routed to this relationship |
| success |
The response from Azure OpenAI is routed to this relationship |
## See also
- [com.snowflake.openflow.runtime.processors.openai.CreateAzureOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createazureopenaiembeddings)
- [com.snowflake.openflow.runtime.processors.openai.PromptOpenAI](/user-guide/data-integration/openflow/processors/promptopenai)
---
title: PromptLLM 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptllm.md
section: Loading & Unloading Data
---
# PromptLLM 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-llm-processors-nar
## Description
This processor sends a user defined prompt to a Large Language Model (LLM) to respond.
## Tags
ai, llm, openflow, prompt, text processing
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Assistant Message |
The assistant message to send to the LLM. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The assistant message is added last |
| LLM Provider Service |
The provider service for sending evaluation prompts to LLM |
| Output Strategy |
Determines response output destination |
| Results Attribute |
The name of the attribute to write the response to. |
| System Message |
The system message to send to the LLM. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The system message is added first. |
| User Message |
The user message to send to the LLM. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that cannot be processed are routed to this relationship |
| success |
FlowFiles that are successfully processed are routed to this relationship |
---
title: PromptOpenAI 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptopenai.md
section: Loading & Unloading Data
---
# PromptOpenAI 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-openai-nar
## Description
Sends a prompt to OpenAI, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction or may include images. In the case of images, a URL may be provided, or the contents of the FlowFile may be used, depending on the provided configuration
## Tags
ai, chat, image, openai, openflow, prompt, text
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Detail Level |
The image detail level that OpenAI should use for processing the image. Low detail will be less expensive and lower latency, while a high level may provide better results. |
| Image MIME Type |
The MIME type of the image |
| Image Model Name |
The name of the OpenAI model |
| Image URL |
The URL of the image to send to OpenAI. If not specified, the contents of the FlowFile will be used as the image. |
| Max File Size |
The maximum size of a FlowFile that can be sent to OpenAI as an image. If the FlowFile is larger than this, it will be routed to 'failure'. |
| Max Tokens |
The maximum number of tokens to generate |
| OpenAI API Key |
The API Key for authenticating to OpenAI |
| OpenAI Organization |
The organization to use for OpenAI |
| Prompt Type |
The type of prompt to send to OpenAI |
| Response Format |
The format of the response from OpenAI |
| Results Attribute |
The name of the attribute to write the response to. If unset, the response will be written to the FlowFile content. |
| Seed |
The seed to use for generating the response |
| System Message |
The system message to send to OpenAI. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\} |
| Temperature |
The temperature to use for generating the response. |
| Text Model Name |
The name of the OpenAI model |
| Top P |
The top P value to use for generating the response |
| User |
Your end user, sent to OpenAI for monitoring and detection of abuse |
| User Message |
The user message to send to OpenAI. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\} |
| Web Client Service |
The Web Client Service to use for communicating with OpenAI |
## Relationships
| Name |
Description |
| failure |
If unable to obtain a valid response from OpenAI, the original FlowFile will be routed to this relationship |
| success |
The response from OpenAI is routed to this relationship |
## See also
- [com.snowflake.openflow.runtime.processors.openai.CreateOpenAiEmbeddings](/user-guide/data-integration/openflow/processors/createopenaiembeddings)
- [com.snowflake.openflow.runtime.processors.openai.PromptAzureOpenAI](/user-guide/data-integration/openflow/processors/promptazureopenai)
---
title: PromptSnowflakeCortex 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptsnowflakecortex.md
section: Loading & Unloading Data
---
# PromptSnowflakeCortex 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-snowflake-processors-nar
## Description
Sends a prompt to Snowflake Cortex, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction only.
## Tags
ai, chat, cortex, openflow, prompt, snowflake, text
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Enable Cortex Guardrails |
Filters potentially unsafe and harmful responses from a language model. Either true or false. |
| Max Tokens |
The maximum number of tokens to generate |
| Output Strategy |
Determines response output destination |
| Response Format |
The format of the response from Snowflake Cortex |
| Results Attribute |
The name of the attribute to write the response to. |
| Snowflake Connection Service |
Database Connection Service for accessing Snowflake |
| System Message |
The system message to send to Snowflake Cortex. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\} |
| Temperature |
The temperature to use for generating the response. |
| Text Model Name |
The name of the Snowflake Cortex model |
| Top P |
The top P value to use for generating the response |
| User Message |
The user message to send to Snowflake Cortex. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\} |
## Relationships
| Name |
Description |
| failure |
If unable to obtain a valid response from Snowflake Cortex, the original FlowFile will be routed to this relationship |
| success |
The response from Snowflake Cortex is routed to this relationship |
---
title: PromptVertexAI 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/promptvertexai.md
section: Loading & Unloading Data
---
# PromptVertexAI 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-vertexai-nar
## Description
Sends a prompt to VertexAI, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile. The prompt may consist of pure text interaction or may include multimedia.
## Tags
ai, chat, cloud, gcp, google, image, openflow, pdf, prompt, text, video
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| GCP Credentials Service |
The Controller Service used to obtain Google Cloud Platform credentials. |
| GCP Location |
The location to configure the Vertex client with |
| GCP Project ID |
The project ID to configure the Vertex client with |
| Max File Size |
The maximum size of a FlowFile that can be sent to Vertex as an image. If the FlowFile is larger than this, it will be routed to 'failure'. |
| Max Tokens |
The maximum number of tokens to generate |
| Media MIME Type |
The MIME type of the media in the FlowFile content. Supported media types are listed here: [https://firebase.google.com/docs/vertex-ai/input-file-requirements](https://firebase.google.com/docs/vertex-ai/input-file-requirements) |
| Model Name |
The name of the Vertex model |
| Output Strategy |
Determines response output destination |
| Prompt Type |
The type of prompt to send to Vertex. Text to send a simple prompt. Media to send a multimedia type first followed by a text prompt. |
| Response Format |
The format of the response from Vertex |
| Results Attribute |
The name of the attribute to write the response to. |
| Stop Sequences |
A comma delimited list of strings act as stop sequences. The model will halt after encountering one of the stop sequences. |
| System Message |
The system message to send to Vertex. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\} |
| Temperature |
The temperature to use for generating the response. Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks. |
| Top K |
The top K value to use for generating the response. Only sample from the top K options for each subsequent token. Recommended for advanced use cases only. You usually only need to use temperature. |
| Top P |
The top P value to use for generating the response. Top P is for nucleus sampling, we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by top_p. Recommended for advanced use cases only. You usually only need to use temperature. |
| User Message |
The user message to send to Vertex. FlowFile attributes may be referenced via Expression Language, and the contents of the FlowFile may be referenced via the flowfile_content variable. E.g., $\{flowfile_content\}. The user message is added first, unless an image is present. |
## Relationships
| Name |
Description |
| failure |
If unable to obtain a valid response from Vertex, the original FlowFile will be routed to this relationship |
| success |
The response from Vertex is routed to this relationship |
## Writes attributes
| Name |
Description |
| vertex.usage.inputTokens |
The number of input tokens read in the request. |
| vertex.usage.outputTokens |
The number of output tokens generated in the response. |
| vertex.chat.completion.id |
A unique id assigned to the conversation |
| mime.type |
The mime type of the response. |
| filename |
An updated filename for the response. |
---
title: PropertiesFileLookupService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/propertiesfilelookupservice.md
section: Loading & Unloading Data
---
# PropertiesFileLookupService
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
A reloadable properties file-based lookup service
## Tags
cache, enrich, join, key, lookup, properties, reloadable, value
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Configuration File * |
configuration-file |
|
|
A configuration file |
## State management
This component does not store state.
## Restricted
## Restrictions
| Required Permission |
Explanation |
| read filesystem |
Provides operator the ability to read from any file that NiFi has access to. |
## System Resource Considerations
This component does not specify system resource considerations.
---
title: ProtobufReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/protobufreader.md
section: Loading & Unloading Data
---
# ProtobufReader
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Parses a Protocol Buffers message from binary format.
## Tags
parser, protobuf, reader, record
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Message Type * |
Message Type |
|
|
Fully qualified name of the Protocol Buffers message type including its package (eg. mypackage.MyMessage). The .proto files configured in 'Proto Directory' must contain the definition of this message type. |
| Proto Directory * |
Proto Directory |
|
|
Directory containing Protocol Buffers message definition (.proto) file(s). |
| Schema Access Strategy * |
Schema Access Strategy |
generate-from-proto-file |
- Use 'Schema Name' Property
- Use 'Schema Text' Property
- Schema Reference Reader
- Generate from Proto file
|
Specifies how to obtain the schema that is to be used for interpreting the data. |
| Schema Branch |
Schema Branch |
|
|
Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. |
| Schema Name |
Schema Name |
$\{schema.name\} |
|
Specifies the name of the schema to lookup in the Schema Registry property |
| Schema Reference Reader * |
Schema Reference Reader |
|
|
Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier |
| Schema Registry |
Schema Registry |
|
|
Specifies the Controller Service to use for the Schema Registry |
| Schema Text |
Schema Text |
$\{avro.schema\} |
|
The text of an Avro-formatted Schema |
| Schema Version |
Schema Version |
|
|
Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: Publish Data from Snowflake to SAP® BDC Connect for Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/publish-data.md
section: Loading & Unloading Data
---
# Publish Data from Snowflake to %sapbdc%
- [](/user-guide/data-integration/zero-copy/sap-sql/setup)
- [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products)
- [](/user-guide/data-integration/zero-copy/sap-sql/security)
This topic describes how to publish Snowflake data back to %sapbdc% by
creating a share, granting access to databases, schemas, and tables, and
associating the share with a Zerocopy Connector.
The connector must be in `CONNECTED` state and have `SHARE_BACK` enabled
before associating a share. See [](/user-guide/data-integration/zero-copy/sap-sql/setup) for details.
## Enable Share Back
Before publishing data to %sapbdc%, enable share back on the connector:
```sql
ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector
SET SHARE_BACK = TRUE;
```
The role used to create the share must have the `CREATE SHARE` privilege
on the account. For the full list of required privileges, see
[](/user-guide/data-integration/zero-copy/sap-sql/security).
## Grant Access to Snowflake Objects
To publish Snowflake data to %sapbdc%, you first create a Snowflake share and
grant access to the databases, schemas, and tables you want to publish. For
more information about creating and managing shares, see
[](/sql-reference/sql/create-share).
- Only Iceberg V3 tables with copy-on-write enabled can be shared with a
Zerocopy Connector. For more information, see
[](#label-tables-iceberg-row-level-deletes).
- Iceberg tables must use Snowflake as the catalog
([Snowflake-managed Iceberg tables](#label-tables-iceberg-snowflake-as-catalog)).
To specify this when creating a table, use `CATALOG = 'SNOWFLAKE'` and
`STORAGE_SERIALIZATION_POLICY = 'COMPATIBLE'`. Alternatively, you can
set both properties at the database or schema level so that all tables
automatically inherit them. For more information, see
[](/sql-reference/sql/create-iceberg-table-snowflake).
- Each shared data product should map to a single dedicated database.
To create an Iceberg table that can be shared with a Zerocopy Connector:
```sql
CREATE ICEBERG TABLE my_publish_db.my_schema.my_table (
id STRING,
name STRING,
value NUMBER
)
ICEBERG_VERSION = 3
-- The following parameters can be omitted if they have been set at the parent schema, database, or account level.
CATALOG = 'SNOWFLAKE'
ICEBERG_MERGE_ON_READ_BEHAVIOR = 'DISABLED'
STORAGE_SERIALIZATION_POLICY = 'COMPATIBLE';
```
### Create a Share
To create a share, the role must have the `CREATE SHARE` privilege on the
account. For the full list of required privileges, see [](/user-guide/data-integration/zero-copy/sap-sql/security).
Create a share using [CREATE SHARE](/sql-reference/sql/create-share):
```sql
CREATE SHARE IF NOT EXISTS my_share;
```
### Grant Access to the Share
Grant `USAGE` on the database:
```sql
GRANT USAGE ON DATABASE my_publish_db TO SHARE my_share;
```
Grant `USAGE` on the schema:
```sql
GRANT USAGE ON SCHEMA my_publish_db.my_schema TO SHARE my_share;
```
Grant `SELECT` on a specific table:
```sql
GRANT SELECT ON TABLE my_publish_db.my_schema.my_table TO SHARE my_share;
```
### Associate the Share with the Connector
After granting access, associate the share with the Zerocopy Connector:
```sql
ALTER ZEROCOPY CONNECTOR my_db.my_schema.my_sap_connector
ADD SHARE my_share;
```
To view the shares associated with a Zerocopy Connector, use
`DESC ZEROCOPY CONNECTOR`:
```sql
DESC ZEROCOPY CONNECTOR my_db.my_schema.my_sap_connector;
```
## Revoke Access
To disassociate a share from the Zerocopy Connector:
```sql
ALTER ZEROCOPY CONNECTOR my_db.my_schema.my_sap_connector
REMOVE SHARE my_share;
```
To revoke access to a previously granted object from the share:
```sql
REVOKE USAGE ON DATABASE my_publish_db FROM SHARE my_share;
REVOKE USAGE ON SCHEMA my_publish_db.my_schema FROM SHARE my_share;
REVOKE SELECT ON TABLE my_publish_db.my_schema.my_table FROM SHARE my_share;
```
## Publish a Data Product to %sapbdc%
After granting access to Snowflake objects, publish the data product to
SAP® BDC by calling the `SYSTEM$SAP_PUBLISH_DATA_PRODUCT` function. This
makes the data product discoverable and accessible from the SAP® BDC side.
The `OPERATE` privilege on the connector is required to call
`SYSTEM$SAP_PUBLISH_DATA_PRODUCT`.
```sql
SELECT SYSTEM$SAP_PUBLISH_DATA_PRODUCT(
'',
'',
'',
''
);
```
For example:
```sql
SELECT SYSTEM$SAP_PUBLISH_DATA_PRODUCT(
'my_db.my_schema.my_sap_connector',
'my_share',
'{
"title": "Airline Data Product",
"shortDescription": "Airline dimension data from Snowflake.",
"description": "Contains airline identifiers and attributes published from Snowflake to SAP BDC."
}',
'{
"csnInteropEffective": "1.2",
"$version": "2.0",
"meta": {
"document": {
"version": "1.2.3",
"doc": "This is a minimal CSN example document."
}
},
"definitions": {
"AirlineService": {
"kind": "service",
"doc": "This is describing the service that exposes the CDS entities through an API."
},
"AirlineService.Airline": {
"kind": "entity",
"doc": "Human readable description of the entity, in **markdown**.",
"@EndUserText.label": "Airline",
"@ObjectModel.modelingPattern": {
"#": "ANALYTICAL_DIMENSION"
},
"elements": {
"AirlineID": {
"doc": "Human readable description of the element, in **markdown**.",
"key": true,
"type": "cds.UUID"
}
}
}
}
}'
);
```
| Parameter |
Description |
| `connector_name` |
Fully qualified name of the Zerocopy Connector
(e.g., `my_db.my_schema.my_sap_connector`).
|
| `snowflake_share_name` |
Name of the Snowflake share, also the name of the share on the SAP® BDC
side.
|
| `open_resource_discovery_metadata` |
A JSON object describing the data product in SAP® BDC. Contains the
following fields:
- `title`: Display name of the data product.
- `shortDescription`: Brief summary of the data product.
- `description`: Full description of the data product.
|
| `csn_document_json` |
The SAP® Core Schema Notation (CSN) JSON payload describing the
structure of the data product. Provided by the caller.
|
If the function fails to resolve `connector_name` or
`snowflake_share_name`, verify that the names use the correct case.
Snowflake identifiers are case-sensitive when quoted. For more information,
see [](/sql-reference/identifiers-syntax).
---
title: PublishAMQP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishamqp.md
section: Loading & Unloading Data
---
# PublishAMQP 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-amqp-nar
## Description
Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange. In a typical AMQP exchange model, the message that is sent to the AMQP Exchange will be routed based on the 'Routing Key' to its final destination in the queue (the binding). If due to some misconfiguration the binding between the Exchange, Routing Key and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If that happens you will see a log in both app-log and bulletin stating to that effect, and the FlowFile will be routed to the 'failure' relationship.
## Tags
amqp, message, publish, put, rabbit, send
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AMQP Version |
AMQP Version. Currently only supports AMQP v0.9.1. |
| Brokers |
A comma-separated list of known AMQP Brokers in the format <host>:<port> (e.g., localhost:5672). If this is set, Host Name and Port are ignored. Only include hosts from the same AMQP cluster. |
| Client Certificate Authentication Enabled |
Authenticate using the SSL certificate rather than user name/password. |
| Exchange Name |
The name of the AMQP Exchange the messages will be sent to. Usually provided by the AMQP administrator (e.g., 'amq.direct'). It is an optional property. If kept empty the messages will be sent to a default AMQP exchange. |
| Header Separator |
The character that is used to split key-value for headers. The value must only one character. Otherwise you will get an error message |
| Headers Pattern |
Regular expression that will be evaluated against the FlowFile attributes to select the matching attributes and put as AMQP headers. Attribute name will be used as header key. |
| Headers Source |
The source of the headers which will be applied to the published message. |
| Host Name |
Network address of AMQP broker (e.g., localhost). If Brokers is set, then this property is ignored. |
| Password |
Password used for authentication and authorization. |
| Port |
Numeric value identifying Port of AMQP broker (e.g., 5671). If Brokers is set, then this property is ignored. |
| Routing Key |
The name of the Routing Key that will be used by AMQP to route messages from the exchange to a destination queue(s). Usually provided by the administrator (e.g., 'myKey')In the event when messages are sent to a default exchange this property corresponds to a destination queue name, otherwise a binding from the Exchange to a Queue via Routing Key must be set (usually by the AMQP administrator) |
| SSL Context Service |
The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| Username |
Username used for authentication and authorization. |
| Virtual Host |
Virtual Host name which segregates AMQP system for enhanced security. |
## Relationships
| Name |
Description |
| failure |
All FlowFiles that cannot be routed to the AMQP destination are routed to this relationship |
| success |
All FlowFiles that are sent to the AMQP destination are routed to this relationship |
---
title: PublishChangeDataSnowpipeStreaming 2026.4.28.15
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishchangedatasnowpipestreaming.md
section: Loading & Unloading Data
---
# PublishChangeDataSnowpipeStreaming 2026.4.28.15
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-snowpipe-streaming-2-processors-nar
## Description
Publishes change data records formatted as Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming High Availability. The processor supports **Concurrency Group** serialization so FlowFiles that share the same group are not processed against the channel concurrently.
After data is transferred, the processor waits for the streaming channel to report committed offset tokens (according to **Offset Tracking Resolution** and **Offset Tracking Timeout**) before routing FlowFiles to **success**, **invalid**, or **failure**. It can run when the incoming connection has no FlowFiles so that pending batches finish polling.
## Tags
CDC, Change Data Capture, NDJSON, Preview, Snowflake, Snowpipe Streaming
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Account |
Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name] |
| Authentication Strategy |
Strategy for authenticating Snowflake connections |
| Channel Group |
Group for managing distinct Snowpipe Streaming Channels with partitioning |
| Channel Insert Timeout |
Maximum duration to retry inserting records before failing with an upper bound of 5 minutes |
| Concurrency Group |
Controls access to the configured channel with serialized claims according to the configured value or expression |
| Database |
Snowflake Database destination for processed records |
| Destination Type |
Snowflake destination object for processed records with support for derived default pipes |
| Offset Token End Expression |
Expression Language definition to produce the highest offset token for a FlowFile as a monotonically increasing number |
| Offset Token Record Pointer |
JSON Pointer to offset token in each record required when the last committed offset token is between start and end boundaries |
| Offset Token Start Expression |
Expression Language definition to produce the lowest offset token for a FlowFile as a monotonically increasing number |
| Offset Tracking Resolution |
Resolution level for evaluating committed offset tokens against input FlowFiles and records. **Disabled**: opaque offset token handling without tracking across FlowFiles or records. **FlowFile**: track each FlowFile with monotonically increasing offset tokens. **Record**: track each record in each FlowFile with monotonically increasing offset tokens. |
| Offset Tracking Timeout |
Maximum duration to wait for channel status to confirm committed offset tokens before routing to failure |
| Pipe |
Snowflake Pipe destination for processed records |
| Private Key Service |
RSA Private Key Service for authenticating connections |
| Role |
Snowflake Role the user will assume when authenticating connections |
| Schema |
Snowflake Schema destination for processed records |
| Table |
Snowflake Table destination for processed records |
| Transfer Strategy |
Strategy for transferring records to Snowpipe Streaming. **Managed**: transfer records as either batches of rows or file fragments based on uncompressed size. **Rows**: transfer records as batches of rows over HTTP to Snowpipe Streaming. **File Fragments**: transfer records as file fragments over HTTP to cloud storage services. |
| User |
Snowflake User for authenticating connections |
| Web Client Service Provider |
Web Client Service Provider supporting HTTP request and response handling |
## Relationships
| Name |
Description |
| empty |
FlowFiles with empty content not sent to Snowflake |
| failure |
FlowFiles that failed to upload to Snowflake |
| invalid |
FlowFiles that Snowflake identified as containing one or more invalid rows resulting in partial transmission |
| success |
FlowFiles successfully uploaded to Snowflake |
---
title: PublishGCPubSub 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishgcpubsub.md
section: Loading & Unloading Data
---
# PublishGCPubSub 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-gcp-nar
## Description
Publishes the content of the incoming flowfile to the configured Google Cloud PubSub topic. The processor supports dynamic properties. If any dynamic properties are present, they will be sent along with the message in the form of 'attributes'.
## Tags
gcp, google, google-cloud, message, publish, pubsub
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| GCP Credentials Provider Service |
The Controller Service used to obtain Google Cloud Platform credentials. |
| Input Batch Size |
Maximum number of FlowFiles processed for each Processor invocation |
| Maximum Message Size |
The maximum size of a Google PubSub message in bytes. Defaults to 1 MB (1048576 bytes) |
| Message Derivation Strategy |
The strategy used to publish the incoming FlowFile to the Google Cloud PubSub endpoint. |
| Record Reader |
The Record Reader to use for incoming FlowFiles |
| Record Writer |
The Record Writer to use in order to serialize the data before sending to GCPubSub endpoint |
| api-endpoint |
Override the gRPC endpoint in the form of [host:port] |
| gcp-batch-bytes |
Publish request gets triggered based on this Batch Bytes Threshold property and the Batch Size Threshold property, whichever condition is met first. |
| gcp-project-id |
Google Cloud Project ID |
| gcp-pubsub-publish-batch-delay |
Indicates the delay threshold to use for batching. After this amount of time has elapsed (counting from the first element added), the elements will be wrapped up in a batch and sent. This value should not be set too high, usually on the order of milliseconds. Otherwise, calls might appear to never complete. |
| gcp-pubsub-publish-batch-size |
Indicates the number of messages the cloud service should bundle together in a batch. If not set and left empty, only one message will be used in a batch |
| gcp-pubsub-topic |
Name of the Google Cloud PubSub Topic |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to this relationship if the Google Cloud Pub/Sub operation fails. |
| retry |
FlowFiles are routed to this relationship if the Google Cloud Pub/Sub operation fails but attempting the operation again may succeed. |
| success |
FlowFiles are routed to this relationship after a successful Google Cloud Pub/Sub operation. |
## Writes attributes
| Name |
Description |
| gcp.pubsub.messageId |
ID of the pubsub message published to the configured Google Cloud PubSub topic |
| gcp.pubsub.count.records |
Count of pubsub messages published to the configured Google Cloud PubSub topic |
| gcp.pubsub.topic |
Name of the Google Cloud PubSub topic the message was published to |
## See also
- [org.apache.nifi.processors.gcp.pubsub.ConsumeGCPubSub](/user-guide/data-integration/openflow/processors/consumegcpubsub)
---
title: PublishJMS 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishjms.md
section: Loading & Unloading Data
---
# PublishJMS 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-jms-processors-nar
## Description
Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as JMS BytesMessage or TextMessage. FlowFile attributes will be added as JMS headers and/or properties to the outgoing JMS message.
## Tags
jms, message, publish, put, send
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Connection Client ID |
The client id to be set on the connection, if set. For durable non shared consumer this is mandatory, for all others it is optional, typically with shared consumers it is undesirable to be set. Please see JMS spec for further details |
| Connection Factory Service |
The Controller Service that is used to obtain Connection Factory. Alternatively, the 'JNDI *' or the 'JMS *' properties can also be used to configure the Connection Factory. |
| Destination Name |
The name of the JMS Destination. Usually provided by the administrator (e.g., 'topic://myTopic' or 'myTopic'). |
| Destination Type |
The type of the JMS Destination. Could be one of 'QUEUE' or 'TOPIC'. Usually provided by the administrator. Defaults to 'QUEUE' |
| Maximum Batch Size |
The maximum number of messages to publish or consume in each invocation of the processor. |
| Password |
Password used for authentication and authorization. |
| SSL Context Service |
The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| User Name |
User Name used for authentication and authorization. |
| allow-illegal-chars-in-jms-header-names |
Specifies whether illegal characters in header names should be sent to the JMS broker. Usually hyphens and full-stops. |
| attributes-to-send-as-jms-headers-regex |
Specifies the Regular Expression that determines the names of FlowFile attributes that should be sent as JMS Headers |
| broker |
URI pointing to the network location of the JMS Message broker. Example for ActiveMQ: '[tcp://myhost:61616](tcp://myhost:61616)'. Examples for IBM MQ: 'myhost(1414)' and 'myhost01(1414),myhost02(1414)'. |
| cf |
The fully qualified name of the JMS ConnectionFactory implementation class (eg. org.apache.activemq. ActiveMQConnectionFactory). |
| cflib |
Path to the directory with additional resources (eg. JARs, configuration files etc.) to be added to the classpath (defined as a comma separated list of values). Such resources typically represent target JMS client libraries for the ConnectionFactory implementation. |
| character-set |
The name of the character set to use to construct or interpret TextMessages |
| connection.factory.name |
The name of the JNDI Object to lookup for the Connection Factory. |
| java.naming.factory.initial |
The fully qualified class name of the JNDI Initial Context Factory Class (java.naming.factory.initial). |
| java.naming.provider.url |
The URL of the JNDI Provider to use as the value for java.naming.provider.url. See additional details documentation for allowed URL schemes. |
| java.naming.security.credentials |
The Credentials to use when authenticating with JNDI (java.naming.security.credentials). |
| java.naming.security.principal |
The Principal to use when authenticating with JNDI (java.naming.security.principal). |
| message-body-type |
The type of JMS message body to construct. |
| naming.factory.libraries |
Specifies jar files and/or directories to add to the ClassPath in order to load the JNDI / JMS client libraries. This should be a comma-separated list of files, directories, and/or URLs. If a directory is given, any files in that directory will be included, but subdirectories will not be included (i.e., it is not recursive). |
| record-reader |
The Record Reader to use for parsing the incoming FlowFile into Records. |
| record-writer |
The Record Writer to use for serializing Records before publishing them as an JMS Message. |
## Restrictions
| Required Permission |
Explanation |
| reference remote resources |
Client Library Location can reference resources over HTTP |
## Relationships
| Name |
Description |
| failure |
All FlowFiles that cannot be sent to JMS destination are routed to this relationship |
| success |
All FlowFiles that are sent to the JMS destination are routed to this relationship |
## See also
- [org.apache.nifi.jms.processors.ConsumeJMS](/user-guide/data-integration/openflow/processors/consumejms)
---
title: PublishKafka 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishkafka.md
section: Loading & Unloading Data
---
# PublishKafka 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-kafka-nar
## Description
Sends the contents of a FlowFile as either a message or as individual records to Apache Kafka using the Kafka Producer API. The messages to send may be individual FlowFiles, may be delimited using a user-specified delimiter (such as a new-line), or may be record-oriented data that can be read by the configured Record Reader. The complementary NiFi processor for fetching messages is ConsumeKafka.
## Tags
apache, avro, csv, json, kafka, logs, message, openflow, pubsub, put, record, send
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Failure Strategy |
Specifies how the processor handles a FlowFile if it is unable to publish the data to Kafka |
| FlowFile Attribute Header Pattern |
A Regular Expression that is matched against all FlowFile attribute names. Any attribute whose name matches the pattern will be added to the Kafka messages as a Header. If not specified, no FlowFile attributes will be added as headers. |
| Header Encoding |
For any attribute that is added as a Kafka Record Header, this property indicates the Character Encoding to use for serializing the headers. |
| Kafka Connection Service |
Provides connections to Kafka Broker for publishing Kafka Records |
| Kafka Key |
The Key to use for the Message. If not specified, the FlowFile attribute 'kafka.key' is used as the message key, if it is present. Beware that setting Kafka key and demarcating at the same time may potentially lead to many Kafka messages with the same key. Normally this is not a problem as Kafka does not enforce or assume message and key uniqueness. Still, setting the demarcator and Kafka key at the same time poses a risk of data loss on Kafka. During a topic compaction on Kafka, messages will be deduplicated based on this key. |
| Kafka Key Attribute Encoding |
FlowFiles that are emitted have an attribute named 'kafka.key'. This property dictates how the value of the attribute should be encoded. |
| Message Demarcator |
Specifies the string (interpreted as UTF-8) to use for demarcating multiple messages within a single FlowFile. If not specified, the entire content of the FlowFile will be used as a single message. If specified, the contents of the FlowFile will be split on this delimiter and each section sent as a separate Kafka message. To enter special character such as 'new line' use CTRL+Enter or Shift+Enter, depending on your OS. |
| Message Key Field |
The name of a field in the Input Records that should be used as the Key for the Kafka message. |
| Publish Strategy |
The format used to publish the incoming FlowFile record to Kafka. |
| Record Key Writer |
The Record Key Writer to use for outgoing FlowFiles |
| Record Metadata Strategy |
Specifies whether the Record 's metadata (topic and partition) should come from the Record's metadata field or if it should come from the configured Topic Name and Partition / Partitioner class properties |
| Record Reader |
The Record Reader to use for incoming FlowFiles |
| Record Writer |
The Record Writer to use in order to serialize the data before sending to Kafka |
| Topic Name |
Name of the Kafka Topic to which the Processor publishes Kafka Records |
| Transactional ID Prefix |
Specifies the KafkaProducer config transactional.id will be a generated UUID and will be prefixed with the configured string. |
| Transactions Enabled |
Specifies whether to provide transactional guarantees when communicating with Kafka. If there is a problem sending data to Kafka, and this property is set to false, then the messages that have already been sent to Kafka will continue on and be delivered to consumers. If this is set to true, then the Kafka transaction will be rolled back so that those messages are not available to consumers. Setting this to true requires that the [Delivery Guarantee] property be set to [Guarantee Replicated Delivery.] |
| acks |
Specifies the requirement for guaranteeing that a message is sent to Kafka. Corresponds to Kafka Client acks property. |
| compression.type |
Specifies the compression strategy for records sent to Kafka. Corresponds to Kafka Client compression.type property. |
| max.request.size |
The maximum size of a request in bytes. Corresponds to Kafka Client max.request.size property. |
| partition |
Specifies the Kafka Partition destination for Records. |
| partitioner.class |
Specifies which class to use to compute a partition id for a message. Corresponds to Kafka Client partitioner.class property. |
## Relationships
| Name |
Description |
| failure |
Any FlowFile that cannot be sent to Kafka will be routed to this Relationship |
| success |
FlowFiles for which all content was sent to Kafka. |
## Writes attributes
| Name |
Description |
| msg.count |
The number of messages that were sent to Kafka for this FlowFile. This attribute is added only to FlowFiles that are routed to success. |
## See also
- [com.snowflake.openflow.runtime.processors.kafka.ConsumeKafka](/user-guide/data-integration/openflow/processors/consumekafka)
---
title: PublishMQTT 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishmqtt.md
section: Loading & Unloading Data
---
# PublishMQTT 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-mqtt-nar
## Description
Publishes a message to an MQTT topic
## Tags
IOT, MQTT, publish
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Broker URI |
The URI(s) to use to connect to the MQTT broker (e.g., [tcp://localhost:1883](tcp://localhost:1883)). The 'tcp', 'ssl', 'ws' and 'wss'schemes are supported. In order to use 'ssl', the SSL Context Service property must be set. When a comma-separated URI list is set (e.g., [tcp://localhost:1883,tcp://localhost:1884](tcp://localhost:1883,tcp://localhost:1884)), the processor will use a round-robin algorithm to connect to the brokers on connection failure. |
| Client ID |
MQTT client ID to use. If not set, a UUID will be generated. |
| Connection Timeout (seconds) |
Maximum time interval the client will wait for the network connection to the MQTT server to be established. The default timeout is 30 seconds. A value of 0 disables timeout processing meaning the client will wait until the network connection is made successfully or fails. |
| Keep Alive Interval (seconds) |
Defines the maximum time interval between messages sent or received. It enables the client to detect if the server is no longer available, without having to wait for the TCP/IP timeout. The client will ensure that at least one message travels across the network within each keep alive period. In the absence of a data-related message during the time period, the client sends a very small "ping" message, which the server will acknowledge. A value of 0 disables keepalive processing in the client. |
| Last Will Message |
The message to send as the client's Last Will. |
| Last Will QoS Level |
QoS level to be used when publishing the Last Will Message. |
| Last Will Retain |
Whether to retain the client's Last Will. |
| Last Will Topic |
The topic to send the client's Last Will to. |
| MQTT Specification Version |
The MQTT specification version when connecting with the broker. See the allowable value descriptions for more details. |
| Password |
Password to use when connecting to the broker |
| Quality of Service(QoS) |
The Quality of Service (QoS) to send the message with. Accepts three values '0', '1' and '2'; '0' for 'at most once', '1' for 'at least once', '2' for 'exactly once'. Expression language is allowed in order to support publishing messages with different QoS but the end value of the property must be either '0', '1' or '2'. |
| Retain Message |
Whether or not the retain flag should be set on the MQTT message. |
| SSL Context Service |
The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| Session Expiry Interval |
After this interval the broker will expire the client and clear the session state. |
| Session state |
Whether to start a fresh or resume previous flows. See the allowable value descriptions for more details. |
| Topic |
The topic to publish the message to. |
| Username |
Username to use when connecting to the broker |
| message-demarcator |
With this property, you have an option to publish multiple messages from a single FlowFile. This property allows you to provide a string (interpreted as UTF-8) to use for demarcating apart the FlowFile content. This is an optional property ; if not provided, and if not defining a Record Reader/Writer, each FlowFile will be published as a single message. To enter special character such as 'new line' use CTRL+Enter or Shift+Enter depending on the OS. |
| record-reader |
The Record Reader to use for parsing the incoming FlowFile into Records. |
| record-writer |
The Record Writer to use for serializing Records before publishing them as an MQTT Message. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to the destination are transferred to this relationship. |
| success |
FlowFiles that are sent successfully to the destination are transferred to this relationship. |
## See also
- [org.apache.nifi.processors.mqtt.ConsumeMQTT](/user-guide/data-integration/openflow/processors/consumemqtt)
---
title: PublishSlack 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishslack.md
section: Loading & Unloading Data
---
# PublishSlack 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-slack-nar
## Description
Posts a message to the specified Slack channel. The content of the message can be either a user-defined message that makes use of Expression Language or the contents of the FlowFile can be sent as the message. If sending a user-defined message, the contents of the FlowFile may also be optionally uploaded as a file attachment.
## Tags
chat.postMessage, conversation, publish, send, slack, social media, team, text, unstructured, upload, write
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Access Token |
OAuth Access Token used for authenticating/authorizing the Slack request sent by NiFi. This may be either a User Token or a Bot Token. The token must be granted the chat:write scope. Additionally, in order to upload FlowFile contents as an attachment, it must be granted files:write. |
| Channel |
The name or identifier of the channel to send the message to. If using a channel name, it must be prefixed with the # character. For example, #general. This is valid only for public channels. Otherwise, the unique identifier of the channel to publish to must be provided. |
| Character Set |
Specifies the name of the Character Set used to encode the FlowFile contents. |
| Include FlowFile Content as Attachment |
Specifies whether or not the contents of the FlowFile should be uploaded as an attachment to the Slack message. |
| Max FlowFile Size |
The maximum size of a FlowFile that can be sent to Slack. If any FlowFile exceeds this size, it will be routed to failure. This plays an important role because the entire contents of the file must be loaded into NiFi's heap in order to send the data to Slack. |
| Message Text |
The text of the message to send to Slack. |
| Methods Endpoint Url Prefix |
Customization of the Slack Client. Set the methodsEndpointUrlPrefix. If you need to set a different URL prefix for Slack API Methods calls, you can set the one. Default value: [https://slack.com/api/](https://slack.com/api/) |
| Publish Strategy |
Specifies how the Processor will send the message or file to Slack. |
| Thread Timestamp |
The Timestamp identifier for the thread that this message is to be a part of. If not specified, the message will be a top-level message instead of being in a thread. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to 'failure' if unable to be sent to Slack for any other reason |
| rate limited |
FlowFiles are routed to 'rate limited' if the Rate Limit has been exceeded |
| success |
FlowFiles are routed to success after being successfully sent to Slack |
## Writes attributes
| Name |
Description |
| slack.channel.id |
The ID of the Slack Channel from which the messages were retrieved |
| slack.ts |
The timestamp of the slack messages that was sent; this is used by Slack as a unique identifier |
## Use cases
| Send specific text as a message to Slack, optionally including the FlowFile's contents as an attached file. |
| ----------------------------------------------------------------------------------------------------------- |
| Send the contents of the FlowFile as a message to Slack. |
## Use Cases Involving Other Components
| Respond to a Slack message in a thread. |
| --------------------------------------- |
## See also
- [org.apache.nifi.processors.slack.ConsumeSlack](/user-guide/data-integration/openflow/processors/consumeslack)
- [org.apache.nifi.processors.slack.ListenSlack](/user-guide/data-integration/openflow/processors/listenslack)
---
title: PublishSnowpipeStreaming 2026.4.28.15
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/publishsnowpipestreaming.md
section: Loading & Unloading Data
---
# PublishSnowpipeStreaming 2026.4.28.15
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-snowpipe-streaming-2-processors-nar
## Description
Publishes records formatted as Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming High Availability.
After data is transferred, the processor waits for the streaming channel to report committed offset tokens (according to **Offset Tracking Resolution** and **Offset Tracking Timeout**) before routing FlowFiles to **success**, **invalid**, or **failure**. It can run when the incoming connection has no FlowFiles so that pending batches finish polling.
## Tags
NDJSON, Preview, Snowflake, Snowpipe Streaming
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Account |
Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name] |
| Authentication Strategy |
Strategy for authenticating Snowflake connections |
| Channel Group |
Group for managing distinct Snowpipe Streaming Channels with partitioning |
| Channel Insert Timeout |
Maximum duration to retry inserting records before failing with an upper bound of 5 minutes |
| Database |
Snowflake Database destination for processed records |
| Destination Type |
Snowflake destination object for processed records with support for derived default pipes |
| File Fragment Count |
Maximum number of file fragments sent to object storage for Snowpipe Streaming ingestion from input FlowFiles. Must be between 1 and 100. |
| File Fragment Size |
Maximum size in bytes for each file fragment sent to object storage for Snowpipe Streaming ingestion. Must be between 1 KB and 256 MB |
| Offset Token End Expression |
Expression Language definition to produce the highest offset token for a FlowFile as a monotonically increasing number |
| Offset Token Record Pointer |
JSON Pointer to offset token in each record required when the last committed offset token is between start and end boundaries |
| Offset Token Start Expression |
Expression Language definition to produce the lowest offset token for a FlowFile as a monotonically increasing number |
| Offset Tracking Resolution |
Resolution level for evaluating committed offset tokens against input FlowFiles and records. **Disabled**: opaque offset token handling without tracking across FlowFiles or records. **FlowFile**: track each FlowFile with monotonically increasing offset tokens. **Record**: track each record in each FlowFile with monotonically increasing offset tokens. |
| Offset Tracking Timeout |
Maximum duration to wait for channel status to confirm committed offset tokens before routing to failure |
| Pipe |
Snowflake Pipe destination for processed records |
| Private Key Service |
RSA Private Key Service for authenticating connections |
| Role |
Snowflake Role the user will assume when authenticating connections |
| Schema |
Snowflake Schema destination for processed records |
| Table |
Snowflake Table destination for processed records |
| Transfer Strategy |
Strategy for transferring records to Snowpipe Streaming. **Managed**: transfer records as either batches of rows or file fragments based on uncompressed size. **Rows**: transfer records as batches of rows over HTTP to Snowpipe Streaming. **File Fragments**: transfer records as file fragments over HTTP to cloud storage services. |
| User |
Snowflake User for authenticating connections |
| Web Client Service Provider |
Web Client Service Provider supporting HTTP request and response handling |
## Relationships
| Name |
Description |
| empty |
FlowFiles with empty content not sent to Snowflake |
| failure |
FlowFiles that failed to upload to Snowflake |
| invalid |
FlowFiles that Snowflake identified as containing one or more invalid rows resulting in partial transmission |
| success |
FlowFiles successfully uploaded to Snowflake |
---
title: PutAzureBlobStorage_v12 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazureblobstorage_v12.md
section: Loading & Unloading Data
---
# PutAzureBlobStorage_v12 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-azure-nar
## Description
Puts content into a blob on Azure Blob Storage. The processor uses Azure Blob Storage client library v12.
## Tags
azure, blob, cloud, microsoft, storage
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Blob Name |
The full name of the blob |
| Client-Side Encryption Key ID |
Specifies the ID of the key to use for client-side encryption. |
| Client-Side Encryption Key Type |
Specifies the key type to use for client-side encryption. |
| Client-Side Encryption Local Key |
When using local client-side encryption, this is the raw key, encoded in hexadecimal |
| Conflict Resolution Strategy |
Specifies whether an existing blob will have its contents replaced upon conflict. |
| Container Name |
Name of the Azure storage container. In case of PutAzureBlobStorage processor, container can be created if it does not exist. |
| Create Container |
Specifies whether to check if the container exists and to automatically create it if it does not. Permission to list containers is required. If false, this check is not made, but the Put operation will fail if the container does not exist. |
| File Resource Service |
File Resource Service providing access to the local resource to be transferred |
| Resource Transfer Source |
The source of the content to be transferred |
| Storage Credentials |
Controller Service used to obtain Azure Blob Storage Credentials. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
## Relationships
| Name |
Description |
| failure |
Unsuccessful operations will be transferred to the failure relationship. |
| success |
All successfully processed FlowFiles are routed to this relationship |
## Writes attributes
| Name |
Description |
| azure.container |
The name of the Azure Blob Storage container |
| azure.blobname |
The name of the blob on Azure Blob Storage |
| azure.primaryUri |
Primary location of the blob |
| azure.etag |
ETag of the blob |
| azure.blobtype |
Type of the blob (either BlockBlob, PageBlob or AppendBlob) |
| mime.type |
MIME Type of the content |
| lang |
Language code for the content |
| azure.timestamp |
Timestamp of the blob |
| azure.length |
Length of the blob |
| azure.error.code |
Error code reported during blob operation |
| azure.ignored |
When Conflict Resolution Strategy is 'ignore', this property will be true/false depending on whether the blob was ignored. |
## See also
- [org.apache.nifi.processors.azure.storage.CopyAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/copyazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.DeleteAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/deleteazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.FetchAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/fetchazureblobstorage_v12)
- [org.apache.nifi.processors.azure.storage.ListAzureBlobStorage_v12](/user-guide/data-integration/openflow/processors/listazureblobstorage_v12)
---
title: PutAzureCosmosDBRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazurecosmosdbrecord.md
section: Loading & Unloading Data
---
# PutAzureCosmosDBRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-azure-nar
## Description
This processor is a record-aware processor for inserting data into Cosmos DB with Core SQL API. It uses a configured record reader and schema to read an incoming record set from the body of a Flowfile and then inserts those records into a configured Cosmos DB Container.
## Tags
azure, cosmos, insert, put, record
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Cosmos DB Access Key |
Cosmos DB Access Key from Azure Portal (Settings->Keys). Choose a read-write key to enable database or container creation at run time |
| Cosmos DB Conflict Handling Strategy |
Choose whether to ignore or upsert when conflict error occurs during insertion |
| Cosmos DB Connection Service |
If configured, the controller service used to obtain the connection string and access key |
| Cosmos DB Consistency Level |
Choose from five consistency levels on the consistency spectrum. Refer to Cosmos DB documentation for their differences |
| Cosmos DB Container ID |
The unique identifier for the container |
| Cosmos DB Name |
The database name or id. This is used as the namespace for document collections or containers |
| Cosmos DB Partition Key |
The partition key used to distribute data among servers |
| Cosmos DB URI |
Cosmos DB URI, typically in the form of https://\{databaseaccount\}.documents.azure.com:443/ Note this host URL is for Cosmos DB with Core SQL API from Azure Portal (Overview->URI) |
| Insert Batch Size |
The number of records to group together for one single insert operation against Cosmos DB |
| Record Reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema |
## Relationships
| Name |
Description |
| failure |
All FlowFiles that cannot be written to Cosmos DB are routed to this relationship |
| success |
All FlowFiles that are written to Cosmos DB are routed to this relationship |
---
title: PutAzureDataExplorer 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazuredataexplorer.md
section: Loading & Unloading Data
---
# PutAzureDataExplorer 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-azure-nar
## Description
Acts as an Azure Data Explorer sink which sends FlowFiles to the provided endpoint. Data can be sent through queued ingestion or streaming ingestion to the Azure Data Explorer cluster.
## Tags
ADX, Azure, Data, Explorer, Kusto
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Data Format |
The format of the data that is sent to Azure Data Explorer. Supported formats include: avro, csv, json |
| Database Name |
Azure Data Explorer Database Name for ingesting data |
| Ingest Mapping Name |
The name of the mapping responsible for storing the data in the appropriate columns. |
| Ingest Status Polling Interval |
Defines the value of interval of time to poll for ingestion status |
| Ingest Status Polling Timeout |
Defines the total amount time to poll for ingestion status |
| Ingestion Ignore First Record |
Defines whether ignore first record while ingestion. |
| Kusto Ingest Service |
Azure Data Explorer Kusto Ingest Service |
| Partially Succeeded Routing Strategy |
Defines where to route FlowFiles that resulted in a partially succeeded status. |
| Poll for Ingest Status |
Determines whether to poll on ingestion status after an ingestion to Azure Data Explorer is completed |
| Streaming Enabled |
Whether to stream data to Azure Data Explorer. |
| Table Name |
Azure Data Explorer Table Name for ingesting data |
## Relationships
| Name |
Description |
| failure |
Ingest processing failed |
| success |
Ingest processing succeeded |
---
title: PutAzureDataLakeStorage 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazuredatalakestorage.md
section: Loading & Unloading Data
---
# PutAzureDataLakeStorage 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-azure-nar
## Description
Writes the contents of a FlowFile as a file on Azure Data Lake Storage Gen 2
## Tags
adlsgen2, azure, cloud, datalake, microsoft, storage
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| ADLS Credentials |
Controller Service used to obtain Azure Credentials. |
| Base Temporary Path |
The Path where the temporary directory will be created. The Path name cannot contain a leading '/'. The root directory can be designated by the empty string value. Non-existing directories will be created. The Temporary File Directory name is _nifitempdirectory |
| Conflict Resolution Strategy |
Indicates what should happen when a file with the same name already exists in the output directory |
| Directory Name |
Name of the Azure Storage Directory. The Directory Name cannot contain a leading '/'. The root directory can be designated by the empty string value. In case of the PutAzureDataLakeStorage processor, the directory will be created if not already existing. |
| File Name |
The filename |
| File Resource Service |
File Resource Service providing access to the local resource to be transferred |
| Filesystem Name |
Name of the Azure Storage File System (also called Container). It is assumed to be already existing. |
| Resource Transfer Source |
The source of the content to be transferred |
| Writing Strategy |
Defines the approach for writing the Azure file. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
## Relationships
| Name |
Description |
| failure |
Files that could not be written to Azure storage for some reason are transferred to this relationship |
| success |
Files that have been successfully written to Azure storage are transferred to this relationship |
## Writes attributes
| Name |
Description |
| azure.filesystem |
The name of the Azure File System |
| azure.directory |
The name of the Azure Directory |
| azure.filename |
The name of the Azure File |
| azure.primaryUri |
Primary location for file content |
| azure.length |
The length of the Azure File |
## See also
- [org.apache.nifi.processors.azure.storage.DeleteAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/deleteazuredatalakestorage)
- [org.apache.nifi.processors.azure.storage.FetchAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/fetchazuredatalakestorage)
- [org.apache.nifi.processors.azure.storage.ListAzureDataLakeStorage](/user-guide/data-integration/openflow/processors/listazuredatalakestorage)
---
title: PutAzureEventHub 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazureeventhub.md
section: Loading & Unloading Data
---
# PutAzureEventHub 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-azure-nar
## Description
Send FlowFile contents to Azure Event Hubs
## Tags
azure, cloud, eventhub, events, microsoft, streaming, streams
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Event Hub Name |
Name of Azure Event Hubs destination |
| Event Hub Namespace |
Namespace of Azure Event Hubs prefixed to Service Bus Endpoint domain |
| Maximum Batch Size |
Maximum number of FlowFiles processed for each Processor invocation |
| Partitioning Key Attribute Name |
If specified, the value from argument named by this field will be used as a partitioning key to be used by event hub. |
| Service Bus Endpoint |
To support namespaces not in the default windows.net domain. |
| Shared Access Policy Key |
The key of the shared access policy. Either the primary or the secondary key can be used. |
| Shared Access Policy Name |
The name of the shared access policy. This policy must have Send claims. |
| Transport Type |
Advanced Message Queuing Protocol Transport Type for communication with Azure Event Hubs |
| Use Azure Managed Identity |
Choose whether or not to use the managed identity of Azure VM/VMSS |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
Any FlowFile that could not be sent to the event hub will be transferred to this Relationship. |
| success |
Any FlowFile that is successfully sent to the event hubs will be transferred to this Relationship. |
---
title: PutAzureQueueStorage_v12 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putazurequeuestorage_v12.md
section: Loading & Unloading Data
---
# PutAzureQueueStorage_v12 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-azure-nar
## Description
Writes the content of the incoming FlowFiles to the configured Azure Queue Storage.
## Tags
azure, cloud, enqueue, microsoft, queue, storage
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Credentials Service |
Controller Service used to obtain Azure Storage Credentials. |
| Endpoint Suffix |
Storage accounts in public Azure always use a common FQDN suffix. Override this endpoint suffix with a different suffix in certain circumstances (like Azure Stack or non-public Azure regions). |
| Message Time To Live |
Maximum time to allow the message to be in the queue |
| Queue Name |
Name of the Azure Storage Queue |
| Request Timeout |
The timeout for read or write requests to Azure Queue Storage. Defaults to 1 second. |
| Visibility Timeout |
The length of time during which the message will be invisible after it is read. If the processing unit fails to delete the message after it is read, then the message will reappear in the queue. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
## Relationships
| Name |
Description |
| failure |
Unsuccessful operations will be transferred to the failure relationship. |
| success |
All successfully processed FlowFiles are routed to this relationship |
## See also
- [org.apache.nifi.processors.azure.storage.queue.GetAzureQueueStorage_v12](/user-guide/data-integration/openflow/processors/getazurequeuestorage_v12)
---
title: PutBigQuery 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putbigquery.md
section: Loading & Unloading Data
---
# PutBigQuery 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-gcp-nar
## Description
Writes the contents of a FlowFile to a Google BigQuery table. The processor is record based so the schema that is used is driven by the RecordReader. Attributes that are not matched to the target schema are skipped. Exactly once delivery semantics are achieved via stream offsets.
## Tags
bigquery, bq, google, google cloud
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| GCP Credentials Provider Service |
The Controller Service used to obtain Google Cloud Platform credentials. |
| bigquery-api-endpoint |
Can be used to override the default BigQuery endpoint. Default is bigquerystorage.googleapis.com:443. Format must be hostname:port. |
| bq.append.record.count |
The number of records to be appended to the write stream at once. Applicable for both batch and stream types |
| bq.dataset |
BigQuery dataset name (Note - The dataset must exist in GCP) |
| bq.record.reader |
Specifies the Controller Service to use for parsing incoming data. |
| bq.skip.invalid.rows |
Sets whether to insert all valid rows of a request, even if invalid rows exist. If not set the entire insert request will fail if it contains an invalid row. |
| bq.table.name |
BigQuery table name |
| bq.transfer.type |
Defines the preferred transfer type streaming or batching |
| gcp-project-id |
Google Cloud Project ID |
| gcp-retry-count |
How many retry attempts should be made before routing to the failure relationship. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to this relationship if the Google BigQuery operation fails. |
| success |
FlowFiles are routed to this relationship after a successful Google BigQuery operation. |
## Writes attributes
| Name |
Description |
| bq.records.count |
Number of records successfully inserted |
---
title: PutBoxFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putboxfile.md
section: Loading & Unloading Data
---
# PutBoxFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-box-nar
## Description
Puts content to a Box folder.
## Tags
box, put, storage
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Box Client Service |
Controller Service used to obtain a Box API connection. |
| Chunked Upload Threshold |
The maximum size of the content which is uploaded at once. FlowFiles larger than this threshold are uploaded in chunks. Chunked upload is allowed for files larger than 20 MB. It is recommended to use chunked upload for files exceeding 50 MB. |
| Conflict Resolution Strategy |
Indicates what should happen when a file with the same name already exists in the specified Box folder. |
| Create Subfolder |
Specifies whether to check if the subfolder exists and to automatically create it if it does not. Permission to list folders is required. |
| Filename |
The name of the file to upload to the specified Box folder. |
| Folder ID |
The ID of the folder where the file is uploaded. Please see Additional Details to obtain Folder ID. |
| Subfolder Name |
The name (path) of the subfolder where files are uploaded. The subfolder name is relative to the folder specified by 'Folder ID'. Example: subFolder, subFolder1/subfolder2 |
## Relationships
| Name |
Description |
| failure |
Files that could not be written to Box for some reason are transferred to this relationship. |
| success |
Files that have been successfully written to Box are transferred to this relationship. |
## Writes attributes
| Name |
Description |
| box.id |
The id of the file |
| filename |
The name of the file |
| path |
The folder path where the file is located |
| box.size |
The size of the file |
| box.timestamp |
The last modified time of the file |
| error.code |
The error code returned by Box |
| error.message |
The error message returned by Box |
## See also
- [org.apache.nifi.processors.box.FetchBoxFile](/user-guide/data-integration/openflow/processors/fetchboxfile)
- [org.apache.nifi.processors.box.ListBoxFile](/user-guide/data-integration/openflow/processors/listboxfile)
---
title: PutCloudWatchMetric 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putcloudwatchmetric.md
section: Loading & Unloading Data
---
# PutCloudWatchMetric 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Publishes metrics to Amazon CloudWatch. Metric can be either a single value, or a StatisticSet comprised of minimum, maximum, sum and sample count.
## Tags
amazon, aws, cloudwatch, metrics, publish, put
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Communications Timeout |
|
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Maximum |
The maximum value of the sample set. Must be a double |
| Metric Name |
The name of the metric |
| Minimum |
The minimum value of the sample set. Must be a double |
| Namespace |
The namespace for the metric data for CloudWatch |
| Region |
|
| SSL Context Service |
Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Sample Count |
The number of samples used for the statistic set. Must be a double |
| Sum |
The sum of values for the sample set. Must be a double |
| Timestamp |
A point in time expressed as the number of milliseconds since Jan 1, 1970 00:00:00 UTC. If not specified, the default value is set to the time the metric data was received |
| Unit |
The unit of the metric. (e.g Seconds, Bytes, Megabytes, Percent, Count, Kilobytes/Second, Terabits/Second, Count/Second) For details see [http://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html](http://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html) |
| Value |
The value for the metric. Must be a double |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to failure relationship |
| success |
FlowFiles are routed to success relationship |
---
title: PutDatabaseRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdatabaserecord.md
section: Loading & Unloading Data
---
# PutDatabaseRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL statements and executed as a single transaction. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is routed to success. The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute. IMPORTANT: If the Statement Type is UPDATE, then the incoming records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing (if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys exist).
## Tags
database, delete, insert, jdbc, put, record, sql, update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Column Name Translation Pattern |
Column name will be normalized with this regular expression |
| Column Name Translation Strategy |
The strategy used to normalize table column name. Column Name will be uppercased to do case-insensitive matching irrespective of strategy |
| Data Record Path |
If specified, this property denotes a RecordPath that will be evaluated against each incoming Record and the Record that results from evaluating the RecordPath will be sent to the database instead of sending the entire incoming Record. If not specified, the entire incoming Record will be published to the database. |
| Database Dialect Service |
Database Dialect Service for generating statements specific to a particular service or vendor. |
| Delete Keys |
A comma-separated list of column names that uniquely identifies a row in the database for DELETE statements. If the Statement Type is DELETE and this property is not set, the table's columns are used. This property is ignored if the Statement Type is not DELETE |
| Rollback On Failure |
Specify how to handle error. By default (false), if an error occurs while processing a FlowFile, the FlowFile will be routed to 'failure' or 'retry' relationship based on error type, and processor can continue with next FlowFile. Instead, you may want to rollback currently processed FlowFiles and stop further processing immediately. In that case, you can do so by enabling this 'Rollback On Failure' property. If enabled, failed FlowFiles will stay in the input relationship without penalizing it and being processed repeatedly until it gets processed successfully or removed by other means. It is important to set adequate 'Yield Duration' to avoid retrying too frequently. |
| Statement Type Record Path |
Specifies a RecordPath to evaluate against each Record in order to determine the Statement Type. The RecordPath should equate to either INSERT, UPDATE, UPSERT, or DELETE. (Debezium style operation types are also supported: "r" and "c" for INSERT, "u" for UPDATE, and "d" for DELETE) |
| database-session-autocommit |
The autocommit mode to set on the database connection being used. If set to false, the operation(s) will be explicitly committed or rolled back (based on success or failure respectively). If set to true, the driver/database automatically handles the commit/rollback. |
| db-type |
Database Type for generating statements specific to a particular service or vendor. The Generic Type supports most cases but selecting a specific type enables optimal processing or additional features. |
| put-db-record-allow-multiple-statements |
If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates whether to split the field value by a semicolon and execute each statement separately. If any statement causes an error, the entire set of statements will be rolled back. If the Statement Type is not 'SQL', this field is ignored. |
| put-db-record-binary-format |
The format to be applied when decoding string values to binary. |
| put-db-record-catalog-name |
The name of the database (or the name of the catalog, depending on the destination system) that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty. Note that if the property is set and the database is case-sensitive, the catalog name must match the database's catalog name exactly. |
| put-db-record-dcbp-service |
The Controller Service that is used to obtain a connection to the database for sending records. |
| put-db-record-field-containing-sql |
If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored. |
| put-db-record-max-batch-size |
Specifies maximum number of sql statements to be included in each batch sent to the database. Zero means the batch size is not limited, and all statements are put into a single batch which can cause high memory usage issues for a very large number of statements. |
| put-db-record-query-timeout |
The maximum amount of time allowed for a running SQL statement , zero means there is no limit. Max time less than 1 second will be equal to zero. |
| put-db-record-quoted-identifiers |
Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables. |
| put-db-record-quoted-table-identifiers |
Enabling this option will cause the table name to be quoted to support the use of special characters in the table name. |
| put-db-record-record-reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema. |
| put-db-record-schema-name |
The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty. Note that if the property is set and the database is case-sensitive, the schema name must match the database's schema name exactly. |
| put-db-record-statement-type |
Specifies the type of SQL Statement to generate. Please refer to the database documentation for a description of the behavior of each operation. Please note that some Database Types may not support certain Statement Types. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL'statement type. If 'SQL' is specified, the value of the field specified by the 'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is. |
| put-db-record-table-name |
The name of the table that the statement should affect. Note that if the database is case-sensitive, the table name must match the database's table name exactly. |
| put-db-record-translate-field-names |
If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. If false, the field names must match the column names exactly, or the column will not be updated |
| put-db-record-unmatched-column-behavior |
If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation |
| put-db-record-unmatched-field-behavior |
If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation |
| put-db-record-update-keys |
A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. This property is ignored if the Statement Type is INSERT |
| table-schema-cache-size |
Specifies how many Table Schemas should be cached |
## Relationships
| Name |
Description |
| failure |
A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, such as an invalid query or an integrity constraint violation |
| retry |
A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed |
| success |
Successfully created FlowFile from SQL query result set. |
## Writes attributes
| Name |
Description |
| putdatabaserecord.error |
If an error occurs during processing, the flow file will be routed to failure or retry, and this attribute will be populated with the cause of the error. |
## Use cases
| Insert records into a database |
| ------------------------------ |
---
title: PutDatabricksSQL 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdatabrickssql.md
section: Loading & Unloading Data
---
# PutDatabricksSQL 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-databricks-processors-nar
## Description
Submit a SQL Execution using Databricks REST API then write the JSON response to FlowFile Content. For high performance SELECT or INSERT queries use ExecuteSQL instead.
## Tags
databricks, openflow, sql
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Databricks Client |
Databricks Client Service. |
| Default Catalog |
Default table catalog, some SQL statements such as 'COPY INTO' do not support using a default catalog |
| Default Schema |
Default table schema, some SQL statements such as 'COPY INTO' do not support using a default schema |
| Record Writer |
Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer may use Inherit Schema to emulate the inferred schema behavior, i.e. an explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types. |
| SQL Warehouse ID |
Warehouse ID used to execute SQL |
| SQL Warehouse Name |
SQL Warehouse Name used to execute SQL, will search through all SQL Warehouses to find matching name. |
| Statement |
SQL statement to execute |
## Relationships
| Name |
Description |
| failure |
Databricks failure relationship |
| http.response |
HTTP Response to SQL API Request |
| original |
The original FlowFile is routed to this relationship when processing is successful. |
| records |
Serialized SQL Records |
## Writes attributes
| Name |
Description |
| statement.state |
The final state of the executed SQL statement |
| error.code |
The error code for the SQL statement if an error occurred. |
| error.message |
The error message for the SQL statement if an error occurred. |
---
title: PutDBFSFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdbfsfile.md
section: Loading & Unloading Data
---
# PutDBFSFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-databricks-processors-nar
## Description
Write FlowFile content to DBFS.
## Tags
databricks, dbfs, openflow
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| DBFS File Path |
DBFS file path e.g. /directory/file.txt |
| Databricks Client |
Databricks Client Service. |
| Overwrite Policy |
What action to take if a file already exists at the destination path. |
## Relationships
| Name |
Description |
| failure |
Databricks failure relationship |
| success |
Databricks success relationship |
## Writes attributes
| Name |
Description |
| error.code |
The error code for the SQL statement if an error occurred. |
| error.message |
The error message for the SQL statement if an error occurred. |
---
title: PutDistributedMapCache 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdistributedmapcache.md
section: Loading & Unloading Data
---
# PutDistributedMapCache 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Gets the content of a FlowFile and puts it to a distributed map cache, using a cache key computed from FlowFile attributes. If the cache already contains the entry and the cache update strategy is 'keep original' the entry is not replaced.'
## Tags
cache, distributed, map, put
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Cache Entry Identifier |
A FlowFile attribute, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the cache key |
| Cache update strategy |
Determines how the cache is updated if the cache already contains the entry |
| Distributed Cache Service |
The Controller Service that is used to cache flow files |
| Max cache entry size |
The maximum amount of data to put into cache |
## Relationships
| Name |
Description |
| failure |
Any FlowFile that cannot be inserted into the cache will be routed to this relationship |
| success |
Any FlowFile that is successfully inserted into cache will be routed to this relationship |
## Writes attributes
| Name |
Description |
| cached |
All FlowFiles will have an attribute 'cached'. The value of this attribute is true, is the FlowFile is cached, otherwise false. |
## See also
- [org.apache.nifi.processors.standard.FetchDistributedMapCache](/user-guide/data-integration/openflow/processors/fetchdistributedmapcache)
---
title: PutDropbox 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdropbox.md
section: Loading & Unloading Data
---
# PutDropbox 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-dropbox-processors-nar
## Description
Puts content to a Dropbox folder.
## Tags
dropbox, put, storage
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Chunked Upload Size |
Defines the size of a chunk. Used when a FlowFile 's size exceeds'Chunked Upload Threshold 'and content is uploaded in smaller chunks. It is recommended to specify chunked upload size smaller than'Chunked Upload Threshold' and as multiples of 4 MB. Maximum allowed value is 150 MB. |
| Chunked Upload Threshold |
The maximum size of the content which is uploaded at once. FlowFiles larger than this threshold are uploaded in chunks. Maximum allowed value is 150 MB. |
| Conflict Resolution Strategy |
Indicates what should happen when a file with the same name already exists in the specified Dropbox folder. |
| Dropbox Credential Service |
Controller Service used to obtain Dropbox credentials (App Key, App Secret, Access Token, Refresh Token). See controller service's Additional Details for more information. |
| Filename |
The full name of the file to upload. |
| Folder |
The path of the Dropbox folder to upload files to. The folder will be created if it does not exist yet. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
Files that could not be written to Dropbox for some reason are transferred to this relationship. |
| success |
Files that have been successfully written to Dropbox are transferred to this relationship. |
## Writes attributes
| Name |
Description |
| error.message |
The error message returned by Dropbox |
| dropbox.id |
The Dropbox identifier of the file |
| path |
The folder path where the file is located |
| filename |
The name of the file |
| dropbox.size |
The size of the file |
| dropbox.timestamp |
The server modified time of the file |
| dropbox.revision |
Revision of the file |
## See also
- [org.apache.nifi.processors.dropbox.FetchDropbox](/user-guide/data-integration/openflow/processors/fetchdropbox)
- [org.apache.nifi.processors.dropbox.ListDropbox](/user-guide/data-integration/openflow/processors/listdropbox)
---
title: PutDynamoDB 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdynamodb.md
section: Loading & Unloading Data
---
# PutDynamoDB 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Puts a document from DynamoDB based on hash and range key. The table can have either hash and range or hash key alone. Currently the keys supported are string and number and value can be json document. In case of hash and range keys both key are required for the operation. The FlowFile content must be JSON. FlowFile content is mapped to the specified Json Document attribute in the DynamoDB item.
## Tags
AWS, Amazon, DynamoDB, Insert, Put
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Batch items for each request (between 1 and 50) |
The items to be retrieved in one batch |
| Character set of document |
Character set of data in the document |
| Communications Timeout |
|
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Hash Key Name |
The hash key name of the item |
| Hash Key Value |
The hash key value of the item |
| Hash Key Value Type |
The hash key value type of the item |
| Json Document attribute |
The Json document to be retrieved from the dynamodb item ( 's' type in the schema) |
| Range Key Name |
The range key name of the item |
| Range Key Value |
|
| Range Key Value Type |
The range key value type of the item |
| Region |
|
| SSL Context Service |
Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Table Name |
The DynamoDB table name |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to failure relationship |
| success |
FlowFiles are routed to success relationship |
| unprocessed |
FlowFiles are routed to unprocessed relationship when DynamoDB is not able to process all the items in the request. Typical reasons are insufficient table throughput capacity and exceeding the maximum bytes per request. Unprocessed FlowFiles can be retried with a new request. |
## Writes attributes
| Name |
Description |
| dynamodb.key.error.unprocessed |
DynamoDB unprocessed keys |
| dynmodb.range.key.value.error |
DynamoDB range key error |
| dynamodb.key.error.not.found |
DynamoDB key not found |
| dynamodb.error.exception.message |
DynamoDB exception message |
| dynamodb.error.code |
DynamoDB error code |
| dynamodb.error.message |
DynamoDB error message |
| dynamodb.error.service |
DynamoDB error service |
| dynamodb.error.retryable |
DynamoDB error is retryable |
| dynamodb.error.request.id |
DynamoDB error request id |
| dynamodb.error.status.code |
DynamoDB error status code |
| dynamodb.item.io.error |
IO exception message on creating item |
## See also
- [org.apache.nifi.processors.aws.dynamodb.DeleteDynamoDB](/user-guide/data-integration/openflow/processors/deletedynamodb)
- [org.apache.nifi.processors.aws.dynamodb.GetDynamoDB](/user-guide/data-integration/openflow/processors/getdynamodb)
- [org.apache.nifi.processors.aws.dynamodb.PutDynamoDBRecord](/user-guide/data-integration/openflow/processors/putdynamodbrecord)
---
title: PutDynamoDBRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putdynamodbrecord.md
section: Loading & Unloading Data
---
# PutDynamoDBRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Inserts items into DynamoDB based on record-oriented data. The record fields are mapped into DynamoDB item fields, including partition and sort keys if set. Depending on the number of records the processor might execute the insert in multiple chunks in order to overcome DynamoDB's limitation on batch writing. This might result partially processed FlowFiles in which case the FlowFile will be transferred to the "unprocessed" relationship with the necessary attribute to retry later without duplicating the already executed inserts.
## Tags
AWS, Amazon, DynamoDB, Insert, Put, Record
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Communications Timeout |
|
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Partition Key Attribute |
Specifies the FlowFile attribute that will be used as the value of the partition key when using "Partition by attribute" partition key strategy. |
| Partition Key Field |
Defines the name of the partition key field in the DynamoDB table. Partition key is also known as hash key. Depending on the "Partition Key Strategy" the field value might come from the incoming Record or a generated one. |
| Partition Key Strategy |
Defines the strategy the processor uses to assign partition key value to the inserted Items. |
| Record Reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema. |
| Region |
|
| SSL Context Service |
Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Sort Key Field |
Defines the name of the sort key field in the DynamoDB table. Sort key is also known as range key. |
| Sort Key Strategy |
Defines the strategy the processor uses to assign sort key to the inserted Items. |
| Table Name |
The DynamoDB table name |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to failure relationship |
| success |
FlowFiles are routed to success relationship |
| unprocessed |
FlowFiles are routed to unprocessed relationship when DynamoDB is not able to process all the items in the request. Typical reasons are insufficient table throughput capacity and exceeding the maximum bytes per request. Unprocessed FlowFiles can be retried with a new request. |
## Writes attributes
| Name |
Description |
| dynamodb.chunks.processed |
Number of chunks successfully inserted into DynamoDB. If not set, it is considered as 0 |
| dynamodb.key.error.unprocessed |
DynamoDB unprocessed keys |
| dynmodb.range.key.value.error |
DynamoDB range key error |
| dynamodb.key.error.not.found |
DynamoDB key not found |
| dynamodb.error.exception.message |
DynamoDB exception message |
| dynamodb.error.code |
DynamoDB error code |
| dynamodb.error.message |
DynamoDB error message |
| dynamodb.error.service |
DynamoDB error service |
| dynamodb.error.retryable |
DynamoDB error is retryable |
| dynamodb.error.request.id |
DynamoDB error request id |
| dynamodb.error.status.code |
DynamoDB error status code |
| dynamodb.item.io.error |
IO exception message on creating item |
## See also
- [org.apache.nifi.processors.aws.dynamodb.DeleteDynamoDB](/user-guide/data-integration/openflow/processors/deletedynamodb)
- [org.apache.nifi.processors.aws.dynamodb.GetDynamoDB](/user-guide/data-integration/openflow/processors/getdynamodb)
- [org.apache.nifi.processors.aws.dynamodb.PutDynamoDB](/user-guide/data-integration/openflow/processors/putdynamodb)
---
title: PutElasticsearchJson 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putelasticsearchjson.md
section: Loading & Unloading Data
---
# PutElasticsearchJson 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-elasticsearch-restapi-nar
## Description
An Elasticsearch put processor that uses the official Elastic REST client libraries. Each FlowFile is treated as a document to be sent to the Elasticsearch _bulk API. Multiple FlowFiles can be batched together into each Request sent to Elasticsearch.
## Tags
elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, index, json, put
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Batch Size |
The preferred number of FlowFiles to send over in a single batch |
| Character Set |
Specifies the character set of the document data. |
| Client Service |
An Elasticsearch client service to use for running queries. |
| Dynamic Templates |
The dynamic_templates for the document. Must be parsable as a JSON Object. Requires Elasticsearch 7+ |
| Identifier Attribute |
The name of the FlowFile attribute containing the identifier for the document. If the Index Operation is "index", this property may be left empty or evaluate to an empty value, in which case the document's identifier will be auto-generated by Elasticsearch. For all other Index Operations, the attribute must evaluate to a non-empty value. |
| Index |
The name of the index to use. |
| Index Operation |
The type of the operation used to index (create, delete, index, update, upsert) |
| Log Error Responses |
If this is enabled, errors will be logged to the NiFi logs at the error log level. Otherwise, they will only be logged if debug logging is enabled on NiFi as a whole. The purpose of this option is to give the user the ability to debug failed operations without having to turn on debug logging. |
| Max JSON Field String Length |
The maximum allowed length of a string value when parsing a JSON document or attribute. |
| Output Error Responses |
If this is enabled, response messages from Elasticsearch marked as "error" will be output to the "error_responses" relationship. This does not impact the output of flowfiles to the "successful" or "errors" relationships |
| Script |
The script for the document update/upsert. Only applies to Update/Upsert operations. Must be parsable as JSON Object. If left blank, the FlowFile content will be used for document update/upsert |
| Scripted Upsert |
Whether to add the scripted_upsert flag to the Upsert Operation. If true, forces Elasticsearch to execute the Script whether or not the document exists, defaults to false. If the Upsert Document provided (from FlowFile content) will be empty, but sure to set the Client Service controller service's Suppress Null and Empty Values to Never Suppress or no "upsert" doc will be, included in the request to Elasticsearch and the operation will not create a new document for the script to execute against, resulting in a "not_found" error |
| Treat Not Found as Success |
If true, "not_found" Elasticsearch Document associated Records will be routed to the "successful" relationship, otherwise to the "errors" relationship. If Output Error Responses is "true" then "not_found" responses from Elasticsearch will be sent to the error_responses relationship. |
| Type |
The type of this document (used by Elasticsearch for indexing and searching). |
## Relationships
| Name |
Description |
| errors |
Record(s)/Flowfile(s) corresponding to Elasticsearch document(s) that resulted in an "error" (within Elasticsearch) will be routed here. |
| failure |
All flowfiles that fail for reasons unrelated to server availability go to this relationship. |
| original |
All flowfiles that are sent to Elasticsearch without request failures go to this relationship. |
| retry |
All flowfiles that fail due to server/cluster availability go to this relationship. |
| successful |
Record(s)/Flowfile(s) corresponding to Elasticsearch document(s) that did not result in an "error" (within Elasticsearch) will be routed here. |
## Writes attributes
| Name |
Description |
| elasticsearch.put.error |
The error message if there is an issue parsing the FlowFile, sending the parsed document to Elasticsearch or parsing the Elasticsearch response |
| elasticsearch.bulk.error |
The _bulk response if there was an error during processing the document within Elasticsearch. |
## See also
- [org.apache.nifi.processors.elasticsearch.PutElasticsearchRecord](/user-guide/data-integration/openflow/processors/putelasticsearchrecord)
---
title: PutElasticsearchRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putelasticsearchrecord.md
section: Loading & Unloading Data
---
# PutElasticsearchRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-elasticsearch-restapi-nar
## Description
A record-aware Elasticsearch put processor that uses the official Elastic REST client libraries. Each Record within the FlowFile is converted into a document to be sent to the Elasticsearch _bulk APi. Multiple documents can be batched into each Request sent to Elasticsearch. Each document's Bulk operation can be configured using Record Path expressions.
## Tags
elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, index, json, put, record
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Batch Size |
The number of records to send over in a single batch. |
| Client Service |
An Elasticsearch client service to use for running queries. |
| Date Format |
Specifies the format to use when writing Date fields. If not specified, the default format 'yyyy-MM-dd' is used. If specified, the value must match the Java Simple Date Format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/25/2017). |
| Dynamic Templates Record Path |
A RecordPath pointing to a field in the record(s) that contains the dynamic_templates for the document. Field must be Map-type compatible (e.g. a Map or Record) or a String parsable into a JSON Object. Requires Elasticsearch 7+ |
| Group Results by Bulk Error Type |
The errored records written to the "errors" relationship will be grouped by error type and the error related to the first record within the FlowFile added to the FlowFile as "elasticsearch.bulk.error". If "Treat Not Found as Success" is "false" then records associated with "not_found" Elasticsearch document responses will also be send to the "errors" relationship. |
| ID Record Path |
A record path expression to retrieve the ID field for use with Elasticsearch. If left blank the ID will be automatically generated by Elasticsearch. |
| Index |
The name of the index to use. |
| Index Operation |
The type of the operation used to index (create, delete, index, update, upsert) |
| Index Operation Record Path |
A record path expression to retrieve the Index Operation field for use with Elasticsearch. If left blank the Index Operation will be determined using the main Index Operation property. |
| Index Record Path |
A record path expression to retrieve the index field for use with Elasticsearch. If left blank the index will be determined using the main index property. |
| Log Error Responses |
If this is enabled, errors will be logged to the NiFi logs at the error log level. Otherwise, they will only be logged if debug logging is enabled on NiFi as a whole. The purpose of this option is to give the user the ability to debug failed operations without having to turn on debug logging. |
| Max JSON Field String Length |
The maximum allowed length of a string value when parsing a JSON document or attribute. |
| Output Error Responses |
If this is enabled, response messages from Elasticsearch marked as "error" will be output to the "error_responses" relationship. This does not impact the output of flowfiles to the "successful" or "errors" relationships |
| Record Reader |
The record reader to use for reading incoming records from flowfiles. |
| Result Record Writer |
The response from Elasticsearch will be examined for failed records and the failed records will be written to a record set with this record writer service and sent to the "errors" relationship. Successful records will be written to a record set with this record writer service and sent to the "successful" relationship. |
| Retain ID (Record Path) |
Whether to retain the existing field used as the ID Record Path. |
| Retain Record Timestamp |
Whether to retain the existing field used as the @timestamp Record Path. |
| Script Record Path |
A RecordPath pointing to a field in the record(s) that contains the script for the document update/upsert. Only applies to Update/Upsert operations. Field must be Map-type compatible (e.g. a Map or a Record) or a String parsable into a JSON Object |
| Scripted Upsert Record Path |
A RecordPath pointing to a field in the record(s) that contains the scripted_upsert boolean flag. Whether to add the scripted_upsert flag to the Upsert Operation. Forces Elasticsearch to execute the Script whether or not the document exists, defaults to false. If the Upsert Document provided (from FlowFile content) will be empty, but sure to set the Client Service controller service's Suppress Null and Empty Values to Never Suppress or no "upsert" doc will be, included in the request to Elasticsearch and the operation will not create a new document for the script to execute against, resulting in a "not_found" error |
| Time Format |
Specifies the format to use when writing Time fields. If not specified, the default format 'HH:mm:ss' is used. If specified, the value must match the Java Simple Date Format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15). |
| Timestamp Format |
Specifies the format to use when writing Timestamp fields. If not specified, the default format 'yyyy-MM-dd HH:mm:ss' is used. If specified, the value must match the Java Simple Date Format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/25/2017 18:04:15). |
| Timestamp Record Path |
A RecordPath pointing to a field in the record(s) that contains the @timestamp for the document. If left blank the @timestamp will be determined using the main @timestamp property |
| Timestamp Value |
The value to use as the @timestamp field (required for Elasticsearch Data Streams) |
| Treat Not Found as Success |
If true, "not_found" Elasticsearch Document associated Records will be routed to the "successful" relationship, otherwise to the "errors" relationship. If Output Error Responses is "true" then "not_found" responses from Elasticsearch will be sent to the error_responses relationship. |
| Type |
The type of this document (used by Elasticsearch for indexing and searching). |
| Type Record Path |
A record path expression to retrieve the type field for use with Elasticsearch. If left blank the type will be determined using the main type property. |
## Relationships
| Name |
Description |
| errors |
Record(s)/Flowfile(s) corresponding to Elasticsearch document(s) that resulted in an "error" (within Elasticsearch) will be routed here. |
| failure |
All flowfiles that fail for reasons unrelated to server availability go to this relationship. |
| original |
All flowfiles that are sent to Elasticsearch without request failures go to this relationship. |
| retry |
All flowfiles that fail due to server/cluster availability go to this relationship. |
| successful |
Record(s)/Flowfile(s) corresponding to Elasticsearch document(s) that did not result in an "error" (within Elasticsearch) will be routed here. |
## Writes attributes
| Name |
Description |
| elasticsearch.put.error |
The error message if there is an issue parsing the FlowFile records, sending the parsed documents to Elasticsearch or parsing the Elasticsearch response. |
| elasticsearch.put.error.count |
The number of records that generated errors in the Elasticsearch _bulk API. |
| elasticsearch.put.success.count |
The number of records that were successfully processed by the Elasticsearch _bulk API. |
| elasticsearch.bulk.error |
The _bulk response if there was an error during processing the record within Elasticsearch. |
## See also
- [org.apache.nifi.processors.elasticsearch.PutElasticsearchJson](/user-guide/data-integration/openflow/processors/putelasticsearchjson)
---
title: PutEmail 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putemail.md
section: Loading & Unloading Data
---
# PutEmail 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Sends an e-mail to configured recipients for each incoming FlowFile
## Tags
email, notify, put, smtp
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
true
## Properties
| Property |
Description |
| Attach File |
Specifies whether or not the FlowFile content should be attached to the email |
| BCC |
The recipients to include in the BCC-Line of the email. Comma separated sequence of addresses following RFC822 syntax. |
| CC |
The recipients to include in the CC-Line of the email. Comma separated sequence of addresses following RFC822 syntax. |
| Content Type |
Mime Type used to interpret the contents of the email, such as text/plain or text/html |
| From |
Specifies the Email address to use as the sender. Comma separated sequence of addresses following RFC822 syntax. |
| Include All Attributes In Message |
Specifies whether or not all FlowFile attributes should be recorded in the body of the email message |
| Message |
The body of the email message |
| Reply-To |
The recipients that will receive the reply instead of the from (see RFC2822 §3.6.2).This feature is useful, for example, when the email is sent by a no-reply account. This field is optional. Comma separated sequence of addresses following RFC822 syntax. |
| SMTP Auth |
Flag indicating whether authentication should be used |
| SMTP Hostname |
The hostname of the SMTP host |
| SMTP Password |
Password for the SMTP account |
| SMTP Port |
The Port used for SMTP communications |
| SMTP Socket Factory |
Socket Factory to use for SMTP Connection |
| SMTP TLS |
Flag indicating whether Opportunistic TLS should be enabled using STARTTLS command |
| SMTP Username |
Username for the SMTP account |
| SMTP X-Mailer Header |
X-Mailer used in the header of the outgoing email |
| Subject |
The email subject |
| To |
The recipients to include in the To-Line of the email. Comma separated sequence of addresses following RFC822 syntax. |
| attribute-name-regex |
A Regular Expression that is matched against all FlowFile attribute names. Any attribute whose name matches the regex will be added to the Email messages as a Header. If not specified, no FlowFile attributes will be added as headers. |
| authorization-mode |
How to authorize sending email on the user's behalf. |
| email-ff-content-as-message |
Specifies whether or not the FlowFile content should be the message of the email. If true, the 'Message' property is ignored. |
| input-character-set |
Specifies the character set of the FlowFile contents for reading input FlowFile contents to generate the message body or as an attachment to the message. If not set, UTF-8 will be the default value. |
| oauth2-access-token-provider |
OAuth2 service that can provide access tokens. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that fail to send will be routed to this relationship |
| success |
FlowFiles that are successfully sent will be routed to this relationship |
---
title: PutFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putfile.md
section: Loading & Unloading Data
---
# PutFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Writes the contents of a FlowFile to the local file system
## Tags
archive, copy, files, filesystem, local, put
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Conflict Resolution Strategy |
Indicates what should happen when a file with the same name already exists in the output directory |
| Create Missing Directories |
If true, then missing destination directories will be created. If false, flowfiles are penalized and sent to failure. |
| Directory |
The directory to which files should be written. You may use expression language such as /aa/bb/$\{path\} |
| Group |
Sets the group on the output file to the value of this attribute. You may also use expression language such as $\{file.group\}. |
| Last Modified Time |
Sets the lastModifiedTime on the output file to the value of this attribute. Format must be yyyy-MM-dd 'T'HH:mm:ssZ. You may also use expression language such as $\{file.lastModifiedTime\}. |
| Maximum File Count |
Specifies the maximum number of files that can exist in the output directory |
| Owner |
Sets the owner on the output file to the value of this attribute. You may also use expression language such as $\{file.owner\}. Note on many operating systems Nifi must be running as a super-user to have the permissions to set the file owner. |
| Permissions |
Sets the permissions on the output file to the value of this attribute. Format must be either UNIX rwxrwxrwx with a - in place of denied permissions (e.g. rw-r–r–) or an octal number (e.g. 644). You may also use expression language such as $\{file.permissions\}. |
## Restrictions
| Required Permission |
Explanation |
| write filesystem |
Provides operator the ability to write to any file that NiFi has access to. |
## Relationships
| Name |
Description |
| failure |
Files that could not be written to the output directory for some reason are transferred to this relationship |
| success |
Files that have been successfully written to the output directory are transferred to this relationship |
## See also
- [org.apache.nifi.processors.standard.FetchFile](/user-guide/data-integration/openflow/processors/fetchfile)
- [org.apache.nifi.processors.standard.GetFile](/user-guide/data-integration/openflow/processors/getfile)
---
title: PutFTP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putftp.md
section: Loading & Unloading Data
---
# PutFTP 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Sends FlowFiles to an FTP Server
## Tags
archive, copy, egress, files, ftp, put, remote
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Batch Size |
The maximum number of FlowFiles to send in a single connection |
| Conflict Resolution |
Determines how to handle the problem of filename collisions |
| Connection Mode |
The FTP Connection Mode |
| Connection Timeout |
Amount of time to wait before timing out while creating a connection |
| Create Directory |
Specifies whether or not the remote directory should be created if it does not exist. |
| Data Timeout |
When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems |
| Dot Rename |
If true, then the filename of the sent file is prepended with a "." and then renamed back to the original once the file is completely sent. Otherwise, there is no rename. This property is ignored if the Temporary Filename property is set. |
| Hostname |
The fully qualified hostname or IP address of the remote system |
| Internal Buffer Size |
Set the internal buffer size for buffered data streams |
| Last Modified Time |
The lastModifiedTime to assign to the file after transferring it. If not set, the lastModifiedTime will not be changed. Format must be yyyy-MM-dd 'T'HH:mm:ssZ. You may also use expression language such as $\{file.lastModifiedTime\}. If the value is invalid, the processor will not be invalid but will fail to change lastModifiedTime of the file. |
| Password |
Password for the user account |
| Permissions |
The permissions to assign to the file after transferring it. Format must be either UNIX rwxrwxrwx with a - in place of denied permissions (e.g. rw-r–r–) or an octal number (e.g. 644). If not set, the permissions will not be changed. You may also use expression language such as $\{file.permissions\}. If the value is invalid, the processor will not be invalid but will fail to change permissions of the file. |
| Port |
The port that the remote system is listening on for file transfers |
| Reject Zero-Byte Files |
Determines whether or not Zero-byte files should be rejected without attempting to transfer |
| Remote Path |
The path on the remote system from which to pull or push files |
| Temporary Filename |
If set, the filename of the sent file will be equal to the value specified during the transfer and after successful completion will be renamed to the original filename. If this value is set, the Dot Rename property is ignored. |
| Transfer Mode |
The FTP Transfer Mode |
| Use Compression |
Indicates whether or not ZLIB compression should be used when transferring files |
| Username |
Username |
| ftp-use-utf8 |
Tells the client to use UTF-8 encoding when processing files and filenames. If set to true, the server must also support UTF-8 encoding. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to the remote system; failure is usually looped back to this processor |
| reject |
FlowFiles that were rejected by the destination system |
| success |
FlowFiles that are successfully sent will be routed to success |
## See also
- [org.apache.nifi.processors.standard.GetFTP](/user-guide/data-integration/openflow/processors/getftp)
---
title: PutGCSObject 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putgcsobject.md
section: Loading & Unloading Data
---
# PutGCSObject 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-gcp-nar
## Description
Writes the contents of a FlowFile as an object in a Google Cloud Storage.
## Tags
archive, gcs, google, google cloud, put
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| File Resource Service |
File Resource Service providing access to the local resource to be transferred |
| GCP Credentials Provider Service |
The Controller Service used to obtain Google Cloud Platform credentials. |
| Resource Transfer Source |
The source of the content to be transferred |
| gcp-project-id |
Google Cloud Project ID |
| gcp-retry-count |
How many retry attempts should be made before routing to the failure relationship. |
| gcs-bucket |
Bucket of the object. |
| gcs-content-disposition-type |
Type of RFC-6266 Content Disposition to be attached to the object |
| gcs-content-type |
Content Type for the file, i.e. text/plain |
| gcs-key |
Name of the object. |
| gcs-object-acl |
Access Control to be attached to the object uploaded. Not providing this will revert to bucket defaults. |
| gcs-object-crc32c |
CRC32C Checksum (encoded in Base64, big-Endian order) of the file for server-side validation. |
| gcs-overwrite-object |
If false, the upload to GCS will succeed only if the object does not exist. |
| gcs-server-side-encryption-key |
An AES256 Encryption Key (encoded in base64) for server-side encryption of the object. |
| gzip.content.enabled |
Signals to the GCS Blob Writer whether GZIP compression during transfer is desired. False means do not gzip and can boost performance in many cases. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
| storage-api-url |
Overrides the default storage URL. Configuring an alternative Storage API URL also overrides the HTTP Host header on requests as described in the Google documentation for Private Service Connections. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to this relationship if the Google Cloud Storage operation fails. |
| success |
FlowFiles are routed to this relationship after a successful Google Cloud Storage operation. |
## Writes attributes
| Name |
Description |
| gcs.bucket |
Bucket of the object. |
| gcs.key |
Name of the object. |
| gcs.size |
Size of the object. |
| gcs.cache.control |
Data cache control of the object. |
| gcs.component.count |
The number of components which make up the object. |
| gcs.content.disposition |
The data content disposition of the object. |
| gcs.content.encoding |
The content encoding of the object. |
| gcs.content.language |
The content language of the object. |
| mime.type |
The MIME/Content-Type of the object |
| gcs.crc32c |
The CRC32C checksum of object's data, encoded in base64 in big-endian order. |
| gcs.create.time |
The creation time of the object (milliseconds) |
| gcs.update.time |
The last modification time of the object (milliseconds) |
| gcs.encryption.algorithm |
The algorithm used to encrypt the object. |
| gcs.encryption.sha256 |
The SHA256 hash of the key used to encrypt the object |
| gcs.etag |
The HTTP 1.1 Entity tag for the object. |
| gcs.generated.id |
The service-generated for the object |
| gcs.generation |
The data generation of the object. |
| gcs.md5 |
The MD5 hash of the object's data encoded in base64. |
| gcs.media.link |
The media download link to the object. |
| gcs.metageneration |
The metageneration of the object. |
| gcs.owner |
The owner (uploader) of the object. |
| gcs.owner.type |
The ACL entity type of the uploader of the object. |
| gcs.uri |
The URI of the object as a string. |
## See also
- [org.apache.nifi.processors.gcp.storage.DeleteGCSObject](/user-guide/data-integration/openflow/processors/deletegcsobject)
- [org.apache.nifi.processors.gcp.storage.FetchGCSObject](/user-guide/data-integration/openflow/processors/fetchgcsobject)
- [org.apache.nifi.processors.gcp.storage.ListGCSBucket](/user-guide/data-integration/openflow/processors/listgcsbucket)
---
title: PutGoogleDrive 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putgoogledrive.md
section: Loading & Unloading Data
---
# PutGoogleDrive 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-gcp-nar
## Description
Writes the contents of a FlowFile as a file in Google Drive.
## Tags
drive, google, put, storage
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| chunked-upload-size |
Defines the size of a chunk. Used when a FlowFile 's size exceeds'Chunked Upload Threshold' and content is uploaded in smaller chunks. Minimum allowed chunk size is 256 KB, maximum allowed chunk size is 1 GB. |
| chunked-upload-threshold |
The maximum size of the content which is uploaded at once. FlowFiles larger than this threshold are uploaded in chunks. |
| conflict-resolution-strategy |
Indicates what should happen when a file with the same name already exists in the specified Google Drive folder. |
| connect-timeout |
Maximum wait time for connection to Google Drive service. |
| file-name |
The name of the file to upload to the specified Google Drive folder. |
| folder-id |
The ID of the shared folder. Please see Additional Details to set up access to Google Drive and obtain Folder ID. |
| gcp-credentials-provider-service |
The Controller Service used to obtain Google Cloud Platform credentials. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
| read-timeout |
Maximum wait time for response from Google Drive service. |
## Relationships
| Name |
Description |
| failure |
Files that could not be written to Google Drive for some reason are transferred to this relationship. |
| success |
Files that have been successfully written to Google Drive are transferred to this relationship. |
## Writes attributes
| Name |
Description |
| drive.id |
The id of the file |
| filename |
The name of the file |
| mime.type |
The MIME type of the file |
| drive.size |
The size of the file. Set to 0 when the file size is not available (e.g. externally stored files). |
| drive.size.available |
Indicates if the file size is known / available |
| drive.timestamp |
The last modified time or created time (whichever is greater) of the file. The reason for this is that the original modified date of a file is preserved when uploaded to Google Drive. 'Created time' takes the time when the upload occurs. However uploaded files can still be modified later. |
| drive.created.time |
The file's creation time |
| drive.modified.time |
The file's last modification time |
| error.code |
The error code returned by Google Drive |
| error.message |
The error message returned by Google Drive |
## See also
- [org.apache.nifi.processors.gcp.drive.FetchGoogleDrive](/user-guide/data-integration/openflow/processors/fetchgoogledrive)
- [org.apache.nifi.processors.gcp.drive.ListGoogleDrive](/user-guide/data-integration/openflow/processors/listgoogledrive)
---
title: PutGridFS 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putgridfs.md
section: Loading & Unloading Data
---
# PutGridFS 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-mongodb-nar
## Description
Writes a file to a GridFS bucket.
## Tags
file, gridfs, mongo, put, store
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| gridfs-bucket-name |
The GridFS bucket where the files will be stored. If left blank, it will use the default value 'fs' that the MongoDB client driver uses. |
| gridfs-client-service |
The MongoDB client service to use for database connections. |
| gridfs-database-name |
The name of the database to use |
| gridfs-file-name |
The name of the file in the bucket that is the target of this processor. GridFS file names do not include path information because GridFS does not sort files into folders within a bucket. |
| putgridfs-chunk-size |
Controls the maximum size of each chunk of a file uploaded into GridFS. |
| putgridfs-enforce-uniqueness |
When enabled, this option will ensure that uniqueness is enforced on the bucket. It will do so by creating a MongoDB index that matches your selection. It should ideally be configured once when the bucket is created for the first time because it could take a long time to build on an existing bucket wit a lot of data. |
| putgridfs-hash-attribute |
If uniquness enforcement is enabled and the file hash is part of the constraint, this must be set to an attribute that exists on all incoming flowfiles. |
| putgridfs-properties-prefix |
Attributes that have this prefix will be added to the file stored in GridFS as metadata. |
## Relationships
| Name |
Description |
| duplicate |
Flowfiles that fail the duplicate check are sent to this relationship. |
| failure |
When there is a failure processing the flowfile, it goes to this relationship. |
| success |
When the operation succeeds, the flowfile is sent to this relationship. |
---
title: PutHubSpot 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/puthubspot.md
section: Loading & Unloading Data
---
# PutHubSpot 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-hubspot-processors-nar
## Description
Upsert a HubSpot object.
## Tags
Preview, hubspot
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Associated Object ID Property |
Target HubSpot property used to uniquely identify the object to associate to from the configured object. |
| Associated Object ID Value |
Target HubSpot property value for the 'Associated Object ID Property' to associate to from the configured object. |
| Associated Object Type |
Target HubSpot object type to associate to from the configured object. |
| Association Type ID |
The HubSpot defined association id from the 'Object ID Value' to the 'Associated Object ID Value'. |
| HubSpot Service |
HubSpot Client Service. |
| Inverse Association Type ID |
The HubSpot defined association id from the 'Associated Object ID Value' to the 'Object ID Value'. |
| Missing HubSpot Property Policy |
What to action to take if HubSpot does not have a matching property. |
| Object ID Property |
HubSpot property used to uniquely identify the object. |
| Object ID Value |
Matching HubSpot property value to search for. |
| Object Override Properties |
Comma-delimited list of NiFi attributes, which if exist, will be added as object properties. Any existing properties in HubSpot will be overridden. |
| Object Set Properties |
Comma-delimited list of NiFi attributes, which if exist, will be added as object properties if the current object property in HubSpot is empty. |
| Object Type |
HubSpot object type |
## Relationships
| Name |
Description |
| failure |
HubSpot fail relationship |
| retry |
HubSpot retry relationship. FlowFiles that failed to process due to a server timeout or rate limit related error. FlowFiles routed here should be routed back into the processor. |
| success |
HubSpot success relationship |
## See also
- [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotObject](/user-guide/data-integration/openflow/processors/gethubspotobject)
- [com.snowflake.openflow.runtime.processors.hubspot.GetHubSpotSchema](/user-guide/data-integration/openflow/processors/gethubspotschema)
- [com.snowflake.openflow.runtime.processors.hubspot.ListArchivedHubSpotData](/user-guide/data-integration/openflow/processors/listarchivedhubspotdata)
- [com.snowflake.openflow.runtime.processors.hubspot.ListHubSpotObjects](/user-guide/data-integration/openflow/processors/listhubspotobjects)
---
title: PutIcebergTable 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/puticebergtable.md
section: Loading & Unloading Data
---
# PutIcebergTable 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-iceberg-processors-nar
## Description
Store records in Iceberg using configurable Catalog for managing namespaces and tables.
## Tags
analytics, iceberg, openflow, parquet, polaris, s3
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Iceberg Catalog |
Provider Service for Iceberg Catalog |
| Iceberg Writer |
Provider Service for Iceberg Row Writers responsible for producing formatted Iceberg Data Files |
| Namespace |
Iceberg Namespace containing Tables |
| Record Reader |
Record Reader for incoming FlowFiles |
| Table Name |
Iceberg Table Name |
## Relationships
| Name |
Description |
| failure |
FlowFiles not transferred to Iceberg |
| success |
FlowFiles transferred to Iceberg |
---
title: PutKinesisFirehose 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putkinesisfirehose.md
section: Loading & Unloading Data
---
# PutKinesisFirehose 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Sends the contents to a specified Amazon Kinesis Firehose. In order to send data to firehose, the firehose delivery stream name has to be specified.
## Tags
amazon, aws, firehose, kinesis, put, stream
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Amazon Kinesis Firehose Delivery Stream Name |
The name of kinesis firehose delivery stream |
| Batch Size |
Batch size for messages (1-500). |
| Communications Timeout |
|
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Max message buffer size |
Max message buffer |
| Region |
|
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to failure relationship |
| success |
FlowFiles are routed to success relationship |
## Writes attributes
| Name |
Description |
| aws.kinesis.firehose.error.message |
Error message on posting message to AWS Kinesis Firehose |
| aws.kinesis.firehose.error.code |
Error code for the message when posting to AWS Kinesis Firehose |
| aws.kinesis.firehose.record.id |
Record id of the message posted to Kinesis Firehose |
---
title: PutKinesisStream 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putkinesisstream.md
section: Loading & Unloading Data
---
# PutKinesisStream 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Sends the contents to a specified Amazon Kinesis. In order to send data to Kinesis, the stream name has to be specified.
## Tags
amazon, aws, kinesis, put, stream
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Communications Timeout |
|
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Max Message Buffer Size |
Max message buffer size defined with standard data size units |
| Message Batch Size |
Batch size for messages (1-500). |
| Region |
|
| Stream Name |
The name of Kinesis Stream |
| Stream Partition Key |
The partition key attribute. If it is not set, a random value is used |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to failure relationship |
| success |
FlowFiles are routed to success relationship |
## Writes attributes
| Name |
Description |
| aws.kinesis.error.message |
Error message on posting message to AWS Kinesis |
| aws.kinesis.error.code |
Error code for the message when posting to AWS Kinesis |
| aws.kinesis.sequence.number |
Sequence number for the message when posting to AWS Kinesis |
| aws.kinesis.shard.id |
Shard id of the message posted to AWS Kinesis |
## See also
- [org.apache.nifi.processors.aws.kinesis.stream.ConsumeKinesisStream](/user-guide/data-integration/openflow/processors/consumekinesisstream)
---
title: PutLambda 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putlambda.md
section: Loading & Unloading Data
---
# PutLambda 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Sends the contents to a specified Amazon Lambda Function. The AWS credentials used for authentication must have permissions execute the Lambda function (lambda:InvokeFunction).The FlowFile content must be JSON.
## Tags
amazon, aws, lambda, put
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Amazon Lambda Name |
The Lambda Function Name |
| Amazon Lambda Qualifier (version) |
The Lambda Function Version |
| Communications Timeout |
|
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Region |
|
| SSL Context Service |
Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to failure relationship |
| success |
FlowFiles are routed to success relationship |
## Writes attributes
| Name |
Description |
| aws.lambda.result.function.error |
Function error message in result on posting message to AWS Lambda |
| aws.lambda.result.status.code |
Status code in the result for the message when posting to AWS Lambda |
| aws.lambda.result.payload |
Payload in the result from AWS Lambda |
| aws.lambda.result.log |
Log in the result of the message posted to Lambda |
| aws.lambda.exception.message |
Exception message on invoking from AWS Lambda |
| aws.lambda.exception.cause |
Exception cause on invoking from AWS Lambda |
| aws.lambda.exception.error.code |
Exception error code on invoking from AWS Lambda |
| aws.lambda.exception.request.id |
Exception request id on invoking from AWS Lambda |
| aws.lambda.exception.status.code |
Exception status code on invoking from AWS Lambda |
---
title: PutMongo 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putmongo.md
section: Loading & Unloading Data
---
# PutMongo 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-mongodb-nar
## Description
Writes the contents of a FlowFile to MongoDB
## Tags
insert, mongodb, put, update, write
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
The Character Set in which the data is encoded |
| Mode |
Indicates whether the processor should insert or update content |
| Mongo Collection Name |
The name of the collection to use |
| Mongo Database Name |
The name of the database to use |
| Update Method |
MongoDB method for running collection update operations, such as updateOne or updateMany |
| Update Query Key |
One or more comma-separated document key names used to build the update query criteria, such as _id |
| Upsert |
When true, inserts a document if no document matches the update query criteria; this property is valid only when using update mode, otherwise it is ignored |
| mongo-client-service |
If configured, this property will use the assigned client service for connection pooling. |
| put-mongo-update-mode |
Choose an update mode. You can either supply a JSON document to use as a direct replacement or specify a document that contains update operators like $set, $unset, and $inc. When Operators mode is enabled, the flowfile content is expected to be the operator part for example: \{$set:\{"key": "value"\},$inc:\{"count":1234\}\} and the update query will come from the configured Update Query property. |
| putmongo-update-query |
Specify a full MongoDB query to be used for the lookup query to do an update/upsert. NOTE: this field is ignored if the 'Update Query Key' value is not empty. |
## Relationships
| Name |
Description |
| failure |
All FlowFiles that cannot be written to MongoDB are routed to this relationship |
| success |
All FlowFiles that are written to MongoDB are routed to this relationship |
## Writes attributes
| Name |
Description |
| mongo.put.update.match.count |
The match count from result if update/upsert is performed, otherwise not set. |
| mongo.put.update.modify.count |
The modify count from result if update/upsert is performed, otherwise not set. |
| mongo.put.upsert.id |
The '_id' hex value if upsert is performed, otherwise not set. |
---
title: PutMongoBulkOperations 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putmongobulkoperations.md
section: Loading & Unloading Data
---
# PutMongoBulkOperations 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-mongodb-nar
## Description
Writes the contents of a FlowFile to MongoDB as bulk-update
## Tags
bulk, insert, mongodb, put, update, write
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
The Character Set in which the data is encoded |
| Mongo Collection Name |
The name of the collection to use |
| Mongo Database Name |
The name of the database to use |
| Ordered |
Ordered execution of bulk-writes and break on error - otherwise arbitrary order and continue on error |
| mongo-client-service |
If configured, this property will use the assigned client service for connection pooling. |
## Relationships
| Name |
Description |
| failure |
All FlowFiles that cannot be written to MongoDB are routed to this relationship |
| success |
All FlowFiles that are written to MongoDB are routed to this relationship |
---
title: PutMongoRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putmongorecord.md
section: Loading & Unloading Data
---
# PutMongoRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-mongodb-nar
## Description
This processor is a record-aware processor for inserting/upserting data into MongoDB. It uses a configured record reader and schema to read an incoming record set from the body of a flowfile and then inserts/upserts batches of those records into a configured MongoDB collection. This processor does not support deletes. The number of documents to insert/upsert at a time is controlled by the "Batch Size" configuration property. This value should be set to a reasonable size to ensure that MongoDB is not overloaded with too many operations at once.
## Tags
insert, mongodb, put, record, update, upsert
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Mongo Collection Name |
The name of the collection to use |
| Mongo Database Name |
The name of the database to use |
| bypass-validation |
Enable or disable bypassing document schema validation during insert or update operations. Bypassing document validation is a Privilege Action in MongoDB. Enabling this property can result in authorization errors for users with limited privileges. |
| insert_count |
The number of records to group together for one single insert/upsert operation against MongoDB. |
| mongo-client-service |
If configured, this property will use the assigned client service for connection pooling. |
| ordered |
Perform ordered or unordered operations |
| record-reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema |
| update-key-fields |
Comma separated list of fields based on which to identify documents that need to be updated. If this property is set NiFi will attempt an upsert operation on all documents. If this property is not set all documents will be inserted. |
| update-mode |
Choose between updating a single document or multiple documents per incoming record. |
## Relationships
| Name |
Description |
| failure |
All FlowFiles that cannot be written to MongoDB are routed to this relationship |
| success |
All FlowFiles that are written to MongoDB are routed to this relationship |
---
title: PutRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putrecord.md
section: Loading & Unloading Data
---
# PutRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
The PutRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file, and sends them to a destination specified by a Record Destination Service (i.e. record sink).
## Tags
put, record, sink
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| put-record-include-zero-record-results |
If no records are read from the incoming FlowFile, this property specifies whether or not an empty record set will be transmitted. The original FlowFile will still be routed to success, but if no transmission occurs, no provenance SEND event will be generated. |
| put-record-reader |
Specifies the Controller Service to use for reading incoming data |
| put-record-sink |
Specifies the Controller Service to use for writing out the query result records to some destination. |
## Relationships
| Name |
Description |
| failure |
A FlowFile is routed to this relationship if the records could not be transmitted and retrying the operation will also fail |
| retry |
The original FlowFile is routed to this relationship if the records could not be transmitted but attempting the operation again may succeed |
| success |
The original FlowFile will be routed to this relationship if the records were transmitted successfully |
---
title: PutRedisHashRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putredishashrecord.md
section: Loading & Unloading Data
---
# PutRedisHashRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-redis-nar
## Description
Puts record field data into Redis using a specified hash value, which is determined by a RecordPath to a field in each record containing the hash value. The record fields and values are stored as key/value pairs associated by the hash value. NOTE: Neither the evaluated hash value nor any of the field values can be null. If the hash value is null, the FlowFile will be routed to failure. For each of the field values, if the value is null that field will be not set in Redis.
## Tags
hash, put, record, redis
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| charset |
Specifies the character set to use when storing record field values as strings. All fields will be converted to strings using this character set before being stored in Redis. |
| data-record-path |
This property denotes a RecordPath that will be evaluated against each incoming Record and the Record that results from evaluating the RecordPath will be sent to Redis instead of sending the entire incoming Record. The property defaults to the root '/' which corresponds to a 'flat' record (all fields/values at the top level of the Record. |
| hash-value-record-path |
Specifies a RecordPath to evaluate against each Record in order to determine the hash value associated with all the record fields/values (see 'hset' in Redis documentation for more details). The RecordPath must point at exactly one field or an error will occur. |
| record-reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema |
| redis-connection-pool |
|
## Relationships
| Name |
Description |
| failure |
FlowFiles containing Records with processing errors will be routed to this relationship |
| success |
FlowFiles having all Records stored in Redis will be routed to this relationship |
## Writes attributes
| Name |
Description |
| redis.success.record.count |
Number of records written to Redis |
---
title: PutS3Object 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/puts3object.md
section: Loading & Unloading Data
---
# PutS3Object 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Writes the contents of a FlowFile as an S3 Object to an Amazon S3 Bucket.
## Tags
AWS, Amazon, Archive, Put, S3
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Bucket |
The S3 Bucket to interact with |
| Cache Control |
Sets the Cache-Control HTTP header indicating the caching directives of the associated object. Multiple directives are comma-separated. |
| Canned ACL |
Amazon Canned ACL for an object, one of: BucketOwnerFullControl, BucketOwnerRead, LogDeliveryWrite, AuthenticatedRead, PublicReadWrite, PublicRead, Private; will be ignored if any other ACL/permission/owner property is specified |
| Communications Timeout |
The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out. |
| Content Disposition |
Sets the Content-Disposition HTTP header indicating if the content is intended to be displayed inline or should be downloaded. Possible values are 'inline' or 'attachment'. If this property is not specified, object 's content-disposition will be set to filename. When' attachment 'is selected,'; filename='plus object key are automatically appended to form final value' attachment; filename="filename.jpg"'. |
| Content Type |
Sets the Content-Type HTTP header indicating the type of content stored in the associated object. The value of this header is a standard MIME type. AWS S3 Java client will attempt to determine the correct content type if one hasn't been set yet. Users are responsible for ensuring a suitable content type is set when uploading streams. If no content type is provided and cannot be determined by the filename, the default content type "application/octet-stream" will be used. |
| Custom Signer Class Name |
Fully qualified class name of the custom signer class. The signer must implement com.amazonaws.auth. Signer interface. |
| Custom Signer Module Location |
Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any). |
| Encryption Service |
Specifies the Encryption Service Controller used to configure requests. PutS3Object: For backward compatibility, this value is ignored when 'Server Side Encryption' is set. FetchS3Object: Only needs to be configured in case of Server-side Customer Key, Client-side KMS and Client-side Customer Key encryptions. |
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Expiration Time Rule |
|
| File Resource Service |
File Resource Service providing access to the local resource to be transferred |
| FullControl User List |
A comma-separated list of Amazon User ID's or E-mail addresses that specifies who should have Full Control for an object |
| Multipart Part Size |
Specifies the part size for use when the PutS3Multipart Upload API is used. Flow files will be broken into chunks of this size for the upload process, but the last part sent can be smaller since it is not padded. The valid range is 50MB to 5GB. |
| Multipart Threshold |
Specifies the file size threshold for switch from the PutS3Object API to the PutS3MultipartUpload API. Flow files bigger than this limit will be sent using the stateful multipart process. The valid range is 50MB to 5GB. |
| Multipart Upload AgeOff Interval |
Specifies the interval at which existing multipart uploads in AWS S3 will be evaluated for ageoff. When processor is triggered it will initiate the ageoff evaluation if this interval has been exceeded. |
| Multipart Upload Max Age Threshold |
Specifies the maximum age for existing multipart uploads in AWS S3. When the ageoff process occurs, any upload older than this threshold will be aborted. |
| Object Key |
The S3 Object Key to use. This is analogous to a filename for traditional file systems. |
| Object Tags Prefix |
Specifies the prefix which would be scanned against the incoming FlowFile 's attributes and the matching attribute's name and value would be considered as the outgoing S3 object 's Tag name and Tag value respectively. For Ex: If the incoming FlowFile carries the attributes tagS3country, tagS3PII, the tag prefix to be specified would be' tagS3' |
| Owner |
The Amazon ID to use for the object's owner |
| Read ACL User List |
A comma-separated list of Amazon User ID's or E-mail addresses that specifies who should have permissions to read the Access Control List for an object |
| Read Permission User List |
A comma-separated list of Amazon User ID's or E-mail addresses that specifies who should have Read Access for an object |
| Region |
The AWS Region to connect to. |
| Remove Tag Prefix |
If set to 'True', the value provided for 'Object Tags Prefix' will be removed from the attribute(s) and then considered as the Tag name. For ex: If the incoming FlowFile carries the attributes tagS3country, tagS3PII and the prefix is set to 'tagS3' then the corresponding tag values would be 'country' and 'PII' |
| Resource Transfer Source |
The source of the content to be transferred |
| SSL Context Service |
Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Server Side Encryption |
Specifies the algorithm used for server side encryption. |
| Signer Override |
The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation. |
| Storage Class |
|
| Temporary Directory Multipart State |
Directory in which, for multipart uploads, the processor will locally save the state tracking the upload ID and parts uploaded which must both be provided to complete the upload. |
| Use Chunked Encoding |
Enables / disables chunked encoding for upload requests. Set it to false only if your endpoint does not support chunked uploading. |
| Use Path Style Access |
Path-style access can be enforced by setting this property to true. Set it to true if your endpoint does not support virtual-hosted-style requests, only path-style requests. |
| Write ACL User List |
A comma-separated list of Amazon User ID's or E-mail addresses that specifies who should have permissions to change the Access Control List for an object |
| Write Permission User List |
A comma-separated list of Amazon User ID's or E-mail addresses that specifies who should have Write Access for an object |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
If the Processor is unable to process a given FlowFile, it will be routed to this Relationship. |
| success |
FlowFiles are routed to this Relationship after they have been successfully processed. |
## Writes attributes
| Name |
Description |
| s3.url |
The URL that can be used to access the S3 object |
| s3.bucket |
The S3 bucket where the Object was put in S3 |
| s3.key |
The S3 key within where the Object was put in S3 |
| s3.contenttype |
The S3 content type of the S3 Object that put in S3 |
| s3.version |
The version of the S3 Object that was put to S3 |
| s3.exception |
The class name of the exception thrown during processor execution |
| s3.additionalDetails |
The S3 supplied detail from the failed operation |
| s3.statusCode |
The HTTP error code (if available) from the failed operation |
| s3.errorCode |
The S3 moniker of the failed operation |
| s3.errorMessage |
The S3 exception message from the failed operation |
| s3.etag |
The ETag of the S3 Object |
| s3.contentdisposition |
The content disposition of the S3 Object that put in S3 |
| s3.cachecontrol |
The cache-control header of the S3 Object |
| s3.uploadId |
The uploadId used to upload the Object to S3 |
| s3.expiration |
A human-readable form of the expiration date of the S3 object, if one is set |
| s3.sseAlgorithm |
The server side encryption algorithm of the object |
| s3.usermetadata |
A human-readable form of the User Metadata of the S3 object, if any was set |
| s3.encryptionStrategy |
The name of the encryption strategy, if any was set |
## See also
- [org.apache.nifi.processors.aws.s3.CopyS3Object](/user-guide/data-integration/openflow/processors/copys3object)
- [org.apache.nifi.processors.aws.s3.DeleteS3Object](/user-guide/data-integration/openflow/processors/deletes3object)
- [org.apache.nifi.processors.aws.s3.FetchS3Object](/user-guide/data-integration/openflow/processors/fetchs3object)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectMetadata](/user-guide/data-integration/openflow/processors/gets3objectmetadata)
- [org.apache.nifi.processors.aws.s3.GetS3ObjectTags](/user-guide/data-integration/openflow/processors/gets3objecttags)
- [org.apache.nifi.processors.aws.s3.ListS3](/user-guide/data-integration/openflow/processors/lists3)
- [org.apache.nifi.processors.aws.s3.TagS3Object](/user-guide/data-integration/openflow/processors/tags3object)
---
title: PutSalesforceObject 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsalesforceobject.md
section: Loading & Unloading Data
---
# PutSalesforceObject 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-salesforce-nar
## Description
Creates new records for the specified Salesforce sObject. The type of the Salesforce object must be set in the input flowfile 's' objectType' attribute. This processor cannot update existing records.
## Tags
put, salesforce, sobject
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| oauth2-access-token-provider |
Service providing OAuth2 Access Tokens for authenticating using the HTTP Authorization Header |
| read-timeout |
Maximum time allowed for reading a response from the Salesforce REST API |
| record-reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema |
| salesforce-api-version |
The version number of the Salesforce REST API appended to the URL after the services/data path. See Salesforce documentation for supported versions |
| salesforce-url |
The URL of the Salesforce instance including the domain without additional path information, such as [https://MyDomainName.my.salesforce.com](https://MyDomainName.my.salesforce.com) |
## Relationships
| Name |
Description |
| failure |
For FlowFiles created as a result of an execution error. |
| success |
For FlowFiles created as a result of a successful execution. |
## Writes attributes
| Name |
Description |
| error.message |
The error message returned by Salesforce. |
## See also
- [org.apache.nifi.processors.salesforce.QuerySalesforceObject](/user-guide/data-integration/openflow/processors/querysalesforceobject)
---
title: PutSFTP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsftp.md
section: Loading & Unloading Data
---
# PutSFTP 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Sends FlowFiles to an SFTP Server
## Tags
archive, copy, egress, files, put, remote, sftp
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Algorithm Negotiation |
Configuration strategy for SSH algorithm negotiation |
| Batch Size |
The maximum number of FlowFiles to send in a single connection |
| Ciphers Allowed |
A comma-separated list of Ciphers allowed for SFTP connections. Leave unset to allow all. Available options are: 3des-cbc, aes128-cbc, aes128-ctr, [aes128-gcm@openssh.com](mailto:aes128-gcm@openssh.com), aes192-cbc, aes192-ctr, aes256-cbc, aes256-ctr, [aes256-gcm@openssh.com](mailto:aes256-gcm@openssh.com), arcfour128, arcfour256, blowfish-cbc, [chacha20-poly1305@openssh.com](mailto:chacha20-poly1305@openssh.com), none |
| Conflict Resolution |
Determines how to handle the problem of filename collisions |
| Connection Timeout |
Amount of time to wait before timing out while creating a connection |
| Create Directory |
Specifies whether or not the remote directory should be created if it does not exist. |
| Data Timeout |
When transferring a file between the local and remote system, this value specifies how long is allowed to elapse without any data being transferred between systems |
| Disable Directory Listing |
If set to 'true', directory listing is not performed prior to create missing directories. By default, this processor executes a directory listing command to see target directory existence before creating missing directories. However, there are situations that you might need to disable the directory listing such as the following. Directory listing might fail with some permission setups (e.g. chmod 100) on a directory. Also, if any other SFTP client created the directory after this processor performed a listing and before a directory creation request by this processor is finished, then an error is returned because the directory already exists. |
| Dot Rename |
If true, then the filename of the sent file is prepended with a "." and then renamed back to the original once the file is completely sent. Otherwise, there is no rename. This property is ignored if the Temporary Filename property is set. |
| Host Key File |
If supplied, the given file will be used as the Host Key; otherwise, if 'Strict Host Key Checking' property is applied (set to true) then uses the 'known_hosts' and 'known_hosts2' files from ~/.ssh directory else no host key file will be used |
| Hostname |
The fully qualified hostname or IP address of the remote system |
| Key Algorithms Allowed |
A comma-separated list of Key Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: ecdsa-sha2-nistp256, [ecdsa-sha2-nistp256-cert-v01@openssh.com](mailto:ecdsa-sha2-nistp256-cert-v01@openssh.com), ecdsa-sha2-nistp384, [ecdsa-sha2-nistp384-cert-v01@openssh.com](mailto:ecdsa-sha2-nistp384-cert-v01@openssh.com), ecdsa-sha2-nistp521, [ecdsa-sha2-nistp521-cert-v01@openssh.com](mailto:ecdsa-sha2-nistp521-cert-v01@openssh.com), rsa-sha2-256, [rsa-sha2-256-cert-v01@openssh.com](mailto:rsa-sha2-256-cert-v01@openssh.com), rsa-sha2-512, [rsa-sha2-512-cert-v01@openssh.com](mailto:rsa-sha2-512-cert-v01@openssh.com), [sk-ecdsa-sha2-nistp256@openssh.com](mailto:sk-ecdsa-sha2-nistp256@openssh.com), [sk-ssh-ed25519@openssh.com](mailto:sk-ssh-ed25519@openssh.com), ssh-dss, [ssh-dss-cert-v01@openssh.com](mailto:ssh-dss-cert-v01@openssh.com), ssh-ed25519, [ssh-ed25519-cert-v01@openssh.com](mailto:ssh-ed25519-cert-v01@openssh.com), ssh-rsa, [ssh-rsa-cert-v01@openssh.com](mailto:ssh-rsa-cert-v01@openssh.com) |
| Key Exchange Algorithms Allowed |
A comma-separated list of Key Exchange Algorithms allowed for SFTP connections. Leave unset to allow all. Available options are: curve25519-sha256, [curve25519-sha256@libssh.org](mailto:curve25519-sha256@libssh.org), curve448-sha512, diffie-hellman-group-exchange-sha1, diffie-hellman-group-exchange-sha256, diffie-hellman-group1-sha1, diffie-hellman-group14-sha1, diffie-hellman-group14-sha256, diffie-hellman-group15-sha512, diffie-hellman-group16-sha512, diffie-hellman-group17-sha512, diffie-hellman-group18-sha512, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521, mlkem1024nistp384-sha384, mlkem768nistp256-sha256, mlkem768x25519-sha256, sntrup761x25519-sha512, [sntrup761x25519-sha512@openssh.com](mailto:sntrup761x25519-sha512@openssh.com) |
| Last Modified Time |
The lastModifiedTime to assign to the file after transferring it. If not set, the lastModifiedTime will not be changed. Format must be yyyy-MM-dd 'T'HH:mm:ssZ. You may also use expression language such as $\{file.lastModifiedTime\}. If the value is invalid, the processor will not be invalid but will fail to change lastModifiedTime of the file. |
| Message Authentication Codes Allowed |
A comma-separated list of Message Authentication Codes allowed for SFTP connections. Leave unset to allow all. Available options are: hmac-md5, hmac-md5-96, hmac-sha1, hmac-sha1-96, [hmac-sha1-etm@openssh.com](mailto:hmac-sha1-etm@openssh.com), hmac-sha2-256, [hmac-sha2-256-etm@openssh.com](mailto:hmac-sha2-256-etm@openssh.com), hmac-sha2-512, [hmac-sha2-512-etm@openssh.com](mailto:hmac-sha2-512-etm@openssh.com) |
| Password |
Password for the user account |
| Permissions |
The permissions to assign to the file after transferring it. Format must be either UNIX rwxrwxrwx with a - in place of denied permissions (e.g. rw-r–r–) or an octal number (e.g. 644). If not set, the permissions will not be changed. You may also use expression language such as $\{file.permissions\}. If the value is invalid, the processor will not be invalid but will fail to change permissions of the file. |
| Port |
The port that the remote system is listening on for file transfers |
| Private Key Passphrase |
Password for the private key |
| Private Key Path |
The fully qualified path to the Private Key file |
| Reject Zero-Byte Files |
Determines whether or not Zero-byte files should be rejected without attempting to transfer |
| Remote Group |
Integer value representing the Group ID to set on the file after transferring it. If not set, the group will not be set. You may also use expression language such as $\{file.group\}. If the value is invalid, the processor will not be invalid but will fail to change the group of the file. |
| Remote Owner |
Integer value representing the User ID to set on the file after transferring it. If not set, the owner will not be set. You may also use expression language such as $\{file.owner\}. If the value is invalid, the processor will not be invalid but will fail to change the owner of the file. |
| Remote Path |
The path on the remote system from which to pull or push files |
| Send Keep Alive On Timeout |
Send a Keep Alive message every 5 seconds up to 5 times for an overall timeout of 25 seconds. |
| Strict Host Key Checking |
Indicates whether or not strict enforcement of hosts keys should be applied |
| Temporary Filename |
If set, the filename of the sent file will be equal to the value specified during the transfer and after successful completion will be renamed to the original filename. If this value is set, the Dot Rename property is ignored. |
| Use Compression |
Indicates whether or not ZLIB compression should be used when transferring files |
| Username |
Username |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to the remote system; failure is usually looped back to this processor |
| reject |
FlowFiles that were rejected by the destination system |
| success |
FlowFiles that are successfully sent will be routed to success |
## See also
- [org.apache.nifi.processors.standard.GetSFTP](/user-guide/data-integration/openflow/processors/getsftp)
---
title: PutSmbFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsmbfile.md
section: Loading & Unloading Data
---
# PutSmbFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-smb-nar
## Description
Writes the contents of a FlowFile to a samba network location. Use this processor instead of a cifs mounts if share access control is important. Configure the Hostname, Share and Directory accordingly: \[Hostname][Share][pathtoDirectory]
## Tags
samba, smb, cifs, files, put
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Batch Size |
The maximum number of files to put in each iteration |
| Conflict Resolution Strategy |
Indicates what should happen when a file with the same name already exists in the output directory |
| Create Missing Directories |
If true, then missing destination directories will be created. If false, flowfiles are penalized and sent to failure. |
| Directory |
The network folder to which files should be written. This is the remaining relative path after the share: \hostnameshare[dir1dir2]. You may use expression language. |
| Domain |
The domain used for authentication. Optional, in most cases username and password is sufficient. |
| Hostname |
The network host to which files should be written. |
| Password |
The password used for authentication. Required if Username is set. |
| Share |
The network share to which files should be written. This is the "first folder"after the hostname: \hostname[share]dir1dir2 |
| Share Access Strategy |
Indicates which shared access are granted on the file during the write. None is the most restrictive, but the safest setting to prevent corruption. |
| Temporary Suffix |
A temporary suffix which will be apended to the filename while it's transfering. After the transfer is complete, the suffix will be removed. |
| Username |
The username used for authentication. If no username is set then anonymous authentication is attempted. |
| enable-dfs |
Enables accessing Distributed File System (DFS) and following DFS links during SMB operations. |
| smb-dialect |
The SMB dialect is negotiated between the client and the server by default to the highest common version supported by both end. In some rare cases, the client-server communication may fail with the automatically negotiated dialect. This property can be used to set the dialect explicitly (e.g. to downgrade to a lower version), when those situations would occur. |
| timeout |
Timeout for read and write operations. |
| use-encryption |
Turns on/off encrypted communication between the client and the server. The property's behavior is SMB dialect dependent: SMB 2.x does not support encryption and the property has no effect. In case of SMB 3.x, it is a hint/request to the server to turn encryption on if the server also supports it. |
## Relationships
| Name |
Description |
| failure |
Files that could not be written to the output network path for some reason are transferred to this relationship |
| success |
Files that have been successfully written to the output network path are transferred to this relationship |
## See also
- [org.apache.nifi.processors.smb.FetchSmb](/user-guide/data-integration/openflow/processors/fetchsmb)
- [org.apache.nifi.processors.smb.GetSmbFile](/user-guide/data-integration/openflow/processors/getsmbfile)
- [org.apache.nifi.processors.smb.ListSmb](/user-guide/data-integration/openflow/processors/listsmb)
---
title: PutSnowflakeInternalStageFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsnowflakeinternalstagefile.md
section: Loading & Unloading Data
---
# PutSnowflakeInternalStageFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-snowflake-processors-nar
## Description
Puts files into a Snowflake internal stage. The internal stage must be created in the Snowflake account beforehand.
## Tags
connection, database, jdbc, openflow, snowflake, snowpipe
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Compression Enabled |
Set true to compress data before uploading the file |
| Database |
The database to use by default. The same as passing 'db=DATABASE_NAME' to the connection string. |
| File Name |
Destination file name to use. |
| File Prefix |
Path prefix under which the data should be uploaded on the stage. |
| Internal Stage Type |
The type of internal stage to use |
| Schema |
The schema to use by default. The same as passing 'schema=SCHEMA' to the connection string. |
| Snowflake Connection Service |
Database Connection Service for accessing Snowflake |
| Stage |
The name of the internal stage in the Snowflake account to put files into. |
| Table |
The name of the table in the Snowflake account. |
## Relationships
| Name |
Description |
| failure |
For FlowFiles of failed PUT operation |
| success |
For FlowFiles of successful PUT operation |
## Writes attributes
| Name |
Description |
| snowflake.staged.file.path |
Staged file path |
---
title: PutSnowpipeStreaming 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsnowpipestreaming.md
section: Loading & Unloading Data
---
# PutSnowpipeStreaming 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-snowpipe-processors-nar
## Description
Streams records into a Snowflake table. The table must be created in the Snowflake account beforehand.
## Tags
connection, database, jdbc, openflow, snowflake, snowpipe streaming
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Account |
Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name] |
| Authentication Strategy |
Strategy for authenticating Snowflake connections |
| Client Lag |
The maximum amount of time that the client will wait before flushing records to Snowflake. A larger value can increase latency while sending to Snowflake, but for tables that are not constantly updated it can result in queries that are faster and more cost efficient. |
| Concurrency Group |
Allows specifying a 'Concurrency Group' that a given FlowFile belongs to, so that the number of Concurrent Tasks that write to tables in a given group can be limited. |
| Connection Strategy |
Strategy for connecting to Snowflake Snowpipe Streaming services |
| Database |
Snowflake Database destination for processed records |
| Delivery Guarantee |
Specifies the delivery guarantee for the records being sent to Snowflake. |
| Iceberg Enabled |
Specifies whether the processor ingests data into an Iceberg table. The processor fails if this property doesn’t match the actual table type. |
| Max Batch Size |
Maximum number of records to ingest in a single call. Multiple ingest calls will be made if the number of records exceeds the max batch size. Current guidance recommends batch sizes less than 16MB. The Max Batch Size can be tuned based on the average record size such that batches are generally less than 16MB. |
| Max Tasks Per Group |
The maximum number of channels to create for a given Snowpipe Channel Prefix. This allows limiting the number of concurrent tasks that can be writing to a given Snowflake table. |
| Private Key Service |
RSA Private Key Service for authenticating connections |
| Record Offset |
The Expression Language expression to use to determine the offset of the first record in a FlowFile. |
| Record Offset Record Path |
The Record Path expression to use to determine the offset of the first record in a FlowFile. |
| Record Offset Strategy |
Specifies the strategy for determining the offset of each record. |
| Record Reader |
The Record Reader to use for reading the input |
| Role |
Snowflake Role the user will assume when authenticating connections |
| Schema |
Snowflake Schema destination for processed records |
| Snowpipe Channel Index |
The index to use for the Snowpipe channel name. The full channel name will be constructed as openflow.[prefix].[index]. This is necessary in order to provide Exactly Once delivery to Snowflake, as any retry must be tried against the same channel as was previously used. |
| Snowpipe Channel Prefix |
The prefix to use for the Snowpipe channel name. The full channel name will be constructed as openflow.[prefix].[index]. The default value is $\{hostname(false)\}, which ensures that each NiFi node in the cluster writes to a unique channel by incorporating the hostname of the NiFi instance into the channel name. |
| Table |
Snowflake Table destination for processed records |
| User |
Snowflake User for authenticating connections |
## Relationships
| Name |
Description |
| failure |
For FlowFiles that failed to upload to Snowflake |
| success |
For FlowFiles successfully uploaded to Snowflake |
## Use cases
| Write record-oriented data to a Snowflake table as fast as possible, accepting the possible of occasional duplicates. |
| --------------------------------------------------------------------------------------------------------------------- |
---
title: PutSnowpipeStreaming2 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsnowpipestreaming2.md
section: Loading & Unloading Data
---
# PutSnowpipeStreaming2 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-snowpipe-streaming-2-processors-nar
## Description
Send Records formatted as Newline Delimited JSON to Snowflake Database Pipes using Snowpipe Streaming Version 2.
## Tags
NDJSON, Preview, Snowflake, Snowpipe Streaming
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Account |
Snowflake Account Identifier with Organization Name and Account Name formatted as [organization-name]-[account-name] |
| Authentication Strategy |
Strategy for authenticating Snowflake connections |
| Channel Group |
Group for managing distinct Snowpipe Streaming Channels with partitioning |
| Channel Insert Timeout |
Maximum duration to retry inserting records before failing with an upper bound of 5 minutes |
| Database |
Snowflake Database destination for processed records |
| File Fragment Count |
Maximum number of File Fragments sent to object storage for Snowpipe Streaming ingestion from input FlowFiles. Must be between 1 and 100. |
| File Fragment Size |
Maximum size in bytes for each File Fragment sent to object storage for Snowpipe Streaming ingestion. Must be between 1 KB and 256 MB |
| Offset Token End Expression |
Expression Language definition to produce the highest offset token for a FlowFile as a monotonically increasing number |
| Offset Token Record Pointer |
JSON Pointer to offset token in each record required when the last committed offset token is between start and end boundaries |
| Offset Token Start Expression |
Expression Language definition to produce the lowest offset token for a FlowFile as a monotonically increasing number |
| Offset Tracking Timeout |
Maximum duration to poll channel status for committed offset tokens |
| Pipe |
Snowflake Pipe destination for processed records |
| Private Key Service |
RSA Private Key Service for authenticating connections |
| Schema |
Snowflake Schema destination for processed records |
| Transfer Strategy |
Strategy for transferring records to Snowpipe Streaming |
| User |
Snowflake User for authenticating connections |
| Web Client Service Provider |
Web Client Service Provider supporting HTTP request and response handling |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to upload to Snowflake |
| invalid |
FlowFiles that Snowflake identified as containing one or more invalid rows resulting in partial transmission |
| success |
FlowFiles successfully uploaded to Snowflake |
---
title: PutSNS 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsns.md
section: Loading & Unloading Data
---
# PutSNS 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service
## Tags
amazon, aws, publish, pubsub, put, sns, topic
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| ARN Type |
The type of Amazon Resource Name that is being used. |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Amazon Resource Name (ARN) |
The name of the resource to which notifications should be published |
| Character Set |
The character set in which the FlowFile's content is encoded |
| Communications Timeout |
|
| Deduplication Message ID |
The token used for deduplication of sent messages |
| E-mail Subject |
The optional subject to use for any subscribers that are subscribed via E-mail |
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Message Group ID |
If using FIFO, the message group to which the flowFile belongs |
| Region |
|
| SSL Context Service |
Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| Use JSON Structure |
If true, the contents of the FlowFile must be JSON with a top-level element named 'default'. Additional elements can be used to send different messages to different protocols. See the Amazon SNS Documentation for more information. |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to failure relationship |
| success |
FlowFiles are routed to success relationship |
## See also
- [org.apache.nifi.processors.aws.sqs.GetSQS](/user-guide/data-integration/openflow/processors/getsqs)
- [org.apache.nifi.processors.aws.sqs.PutSQS](/user-guide/data-integration/openflow/processors/putsqs)
---
title: PutSplunk 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsplunk.md
section: Loading & Unloading Data
---
# PutSplunk 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-splunk-nar
## Description
Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter, and send each message to Splunk. If a Message Delimiter is not provided then the content of the FlowFile will be sent directly to Splunk as if it were a single message.
## Tags
logs, splunk, tcp, udp
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
Specifies the character set of the data being sent. |
| Hostname |
Destination hostname or IP address |
| Idle Connection Expiration |
The amount of time a connection should be held open without being used before closing the connection. A value of 0 seconds will disable this feature. |
| Max Size of Socket Send Buffer |
The maximum size of the socket send buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped. |
| Message Delimiter |
Specifies the delimiter to use for splitting apart multiple messages within a single FlowFile. If not specified, the entire content of the FlowFile will be used as a single message. If specified, the contents of the FlowFile will be split on this delimiter and each section sent as a separate message. Note that if messages are delimited and some messages for a given FlowFile are transferred successfully while others are not, the messages will be split into individual FlowFiles, such that those messages that were successfully sent are routed to the 'success' relationship while other messages are sent to the 'failure' relationship. |
| Port |
Destination port number |
| Protocol |
The protocol for communication. |
| SSL Context Service |
Specifies the SSL Context Service to enable TLS socket communication |
| Timeout |
The timeout for connecting to and communicating with the destination. Does not apply to UDP |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to the destination are sent out this relationship. |
| success |
FlowFiles that are sent successfully to the destination are sent out this relationship. |
---
title: PutSplunkHTTP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsplunkhttp.md
section: Loading & Unloading Data
---
# PutSplunkHTTP 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-splunk-nar
## Description
Sends flow file content to the specified Splunk server over HTTP or HTTPS. Supports HEC Index Acknowledgement.
## Tags
http, logs, splunk
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Hostname |
The ip address or hostname of the Splunk server. |
| Owner |
The owner to pass to Splunk. |
| Password |
The password to authenticate to Splunk. |
| Port |
The HTTP Event Collector HTTP Port Number. |
| Scheme |
The scheme for connecting to Splunk. |
| Security Protocol |
The security protocol to use for communicating with Splunk. |
| Token |
HTTP Event Collector token starting with the string Splunk. For example 'Splunk 1234578-abcd-1234-abcd-1234abcd' |
| Username |
The username to authenticate to Splunk. |
| character-set |
The name of the character set. |
| content-type |
The media type of the event sent to Splunk. If not set, "mime.type" flow file attribute will be used. In case of neither of them is specified, this information will not be sent to the server. |
| host |
Specify with the host query string parameter. Sets a default for all events when unspecified. |
| index |
Index name. Specify with the index query string parameter. Sets a default for all events when unspecified. |
| request-channel |
Identifier of the used request channel. |
| source |
User-defined event source. Sets a default for all events when unspecified. |
| source-type |
User-defined event sourcetype. Sets a default for all events when unspecified. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to the destination are sent to this relationship. |
| success |
FlowFiles that are sent successfully to the destination are sent to this relationship. |
## Writes attributes
| Name |
Description |
| splunk.acknowledgement.id |
The indexing acknowledgement id provided by Splunk. |
| splunk.responded.at |
The time of the response of put request for Splunk. |
## See also
- [org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus](/user-guide/data-integration/openflow/processors/querysplunkindexingstatus)
---
title: PutSQL 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsql.md
section: Loading & Unloading Data
---
# PutSQL 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args. N.type and sql.args. N.value, where N is a positive integer. The sql.args. N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.
## Tags
database, insert, put, rdbms, relational, sql, update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Batch Size |
The preferred number of FlowFiles to put to the database in a single transaction |
| JDBC Connection Pool |
Specifies the JDBC Connection Pool to use in order to convert the JSON message to a SQL statement. The Connection Pool is necessary in order to determine the appropriate database column types. |
| Obtain Generated Keys |
If true, any key that is automatically generated by the database will be added to the FlowFile that generated it using the sql.generate.key attribute. This may result in slightly slower performance and is not supported by all databases. |
| Rollback On Failure |
Specify how to handle error. By default (false), if an error occurs while processing a FlowFile, the FlowFile will be routed to 'failure' or 'retry' relationship based on error type, and processor can continue with next FlowFile. Instead, you may want to rollback currently processed FlowFiles and stop further processing immediately. In that case, you can do so by enabling this 'Rollback On Failure' property. If enabled, failed FlowFiles will stay in the input relationship without penalizing it and being processed repeatedly until it gets processed successfully or removed by other means. It is important to set adequate 'Yield Duration' to avoid retrying too frequently. |
| Support Fragmented Transactions |
If true, when a FlowFile is consumed by this Processor, the Processor will first check the fragment.identifier and fragment.count attributes of that FlowFile. If the fragment.count value is greater than 1, the Processor will not process any FlowFile with that fragment.identifier until all are available; at that point, it will process all FlowFiles with that fragment.identifier as a single transaction, in the order specified by the FlowFiles 'fragment.index attributes. This Provides atomicity of those SQL statements. Once any statement of this transaction throws exception when executing, this transaction will be rolled back. When transaction rollback happened, none of these FlowFiles would be routed to'success '. If the <Rollback On Failure> is set true, these FlowFiles will stay in the input relationship. When the <Rollback On Failure> is set false,, if any of these FlowFiles will be routed to' retry ', all of these FlowFiles will be routed to' retry '.Otherwise, they will be routed to' failure'. If this value is false, these attributes will be ignored and the updates will occur independent of one another. |
| Transaction Timeout |
If the <Support Fragmented Transactions> property is set to true, specifies how long to wait for all FlowFiles for a particular fragment.identifier attribute to arrive before just transferring all of the FlowFiles with that identifier to the 'failure' relationship |
| database-session-autocommit |
The autocommit mode to set on the database connection being used. If set to false, the operation(s) will be explicitly committed or rolled back (based on success or failure respectively), if set to true the driver/database handles the commit/rollback. |
| putsql-sql-statement |
The SQL statement to execute. The statement can be empty, a constant value, or built from attributes using Expression Language. If this property is specified, it will be used regardless of the content of incoming FlowFiles. If this property is empty, the content of the incoming FlowFile is expected to contain a valid SQL statement, to be issued by the processor to the database. |
## Relationships
| Name |
Description |
| failure |
A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, such as an invalid query or an integrity constraint violation |
| retry |
A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed |
| success |
A FlowFile is routed to this relationship after the database is successfully updated |
## Writes attributes
| Name |
Description |
| sql.generated.key |
If the database generated a key for an INSERT statement and the Obtain Generated Keys property is set to true, this attribute will be added to indicate the generated key, if possible. This feature is not supported by all database vendors. |
---
title: PutSQS 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsqs.md
section: Loading & Unloading Data
---
# PutSQS 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-aws-nar
## Description
Publishes a message to an Amazon Simple Queuing Service Queue
## Tags
AWS, Amazon, Publish, Put, Queue, SQS
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| AWS Credentials Provider service |
The Controller Service that is used to obtain AWS credentials provider |
| Communications Timeout |
|
| Deduplication Message ID |
The token used for deduplication of sent messages |
| Delay |
The amount of time to delay the message before it becomes available to consumers |
| Endpoint Override URL |
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. |
| Message Group ID |
If using FIFO, the message group to which the FlowFile belongs |
| Queue URL |
The URL of the queue to act upon |
| Region |
|
| SSL Context Service |
Specifies an optional SSL Context Service that, if provided, will be used to create connections |
| proxy-configuration-service |
Specifies the Proxy Configuration Controller Service to proxy network requests. |
## Relationships
| Name |
Description |
| failure |
FlowFiles are routed to failure relationship |
| success |
FlowFiles are routed to success relationship |
## See also
- [org.apache.nifi.processors.aws.sqs.DeleteSQS](/user-guide/data-integration/openflow/processors/deletesqs)
- [org.apache.nifi.processors.aws.sqs.GetSQS](/user-guide/data-integration/openflow/processors/getsqs)
---
title: PutSyslog 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putsyslog.md
section: Loading & Unloading Data
---
# PutSyslog 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Sends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the "Message ___" properties of the processor which can use expression language to generate messages from incoming FlowFiles. The properties are used to construct messages of the form: (<PRIORITY>)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The constructed messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd 'T'HH:mm:ss. S 'Z'" or "yyyy-MM-dd 'T'HH:mm:ss. S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If a message is constructed that does not form a valid Syslog message according to the above description, then it is routed to the invalid relationship. Valid messages are sent to the Syslog server and successes are routed to the success relationship, failures routed to the failure relationship.
## Tags
logs, put, syslog, tcp, udp
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Batch Size |
The number of incoming FlowFiles to process in a single execution of this processor. |
| Character Set |
Specifies the character set of the Syslog messages. Note that Expression language is not evaluated per FlowFile. |
| Hostname |
The IP address or hostname of the Syslog server. |
| Idle Connection Expiration |
The amount of time a connection should be held open without being used before closing the connection. |
| Max Size of Socket Send Buffer |
The maximum size of the socket send buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped. |
| Message Body |
The body for the Syslog messages. |
| Message Hostname |
The hostname for the Syslog messages. |
| Message Priority |
The priority for the Syslog messages, excluding < >. |
| Message Timestamp |
The timestamp for the Syslog messages. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd 'T'HH:mm:ss. S 'Z'" or "yyyy-MM-dd 'T'HH:mm:ss. S+hh:mm", " or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". |
| Message Version |
The version for the Syslog messages. |
| Port |
The port for Syslog communication. Note that Expression language is not evaluated per FlowFile. |
| Protocol |
The protocol for Syslog communication. |
| SSL Context Service |
The Controller Service to use in order to obtain an SSL Context. If this property is set, syslog messages will be sent over a secure connection. |
| Timeout |
The timeout for connecting to and communicating with the syslog server. Does not apply to UDP. Note that Expression language is not evaluated per FlowFile. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to Syslog are sent out this relationship. |
| invalid |
FlowFiles that do not form a valid Syslog message are sent out this relationship. |
| success |
FlowFiles that are sent successfully to Syslog are sent out this relationship. |
## See also
- [org.apache.nifi.processors.standard.ListenSyslog](/user-guide/data-integration/openflow/processors/listensyslog)
- [org.apache.nifi.processors.standard.ParseSyslog](/user-guide/data-integration/openflow/processors/parsesyslog)
---
title: PutTCP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/puttcp.md
section: Loading & Unloading Data
---
# PutTCP 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Sends serialized FlowFiles or Records over TCP to a configurable destination with optional support for TLS
## Tags
egress, put, remote, tcp
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
Specifies the character set of the data being sent. |
| Connection Per FlowFile |
Specifies whether to send each FlowFile's content on an individual connection. |
| Hostname |
Destination hostname or IP address |
| Idle Connection Expiration |
The amount of time a connection should be held open without being used before closing the connection. A value of 0 seconds will disable this feature. |
| Max Size of Socket Send Buffer |
The maximum size of the socket send buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped. |
| Outgoing Message Delimiter |
Specifies the delimiter to use when sending messages out over the same TCP stream. The delimiter is appended to each FlowFile message that is transmitted over the stream so that the receiver can determine when one message ends and the next message begins. Users should ensure that the FlowFile content does not contain the delimiter character to avoid errors. In order to use a new line character you can enter 'n'. For a tab character use 't'. Finally for a carriage return use 'r'. |
| Port |
Destination port number |
| Record Reader |
Specifies the Controller Service to use for reading Records from input FlowFiles |
| Record Writer |
Specifies the Controller Service to use for writing Records to the configured socket address |
| SSL Context Service |
Specifies the SSL Context Service to enable TLS socket communication |
| Timeout |
The timeout for connecting to and communicating with the destination. Does not apply to UDP |
| Transmission Strategy |
Specifies the strategy used for reading input FlowFiles and transmitting messages to the destination socket address |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to the destination are sent out this relationship. |
| success |
FlowFiles that are sent successfully to the destination are sent out this relationship. |
## Writes attributes
| Name |
Description |
| record.count.transmitted |
Count of records transmitted to configured destination address |
## See also
- [org.apache.nifi.processors.standard.ListenTCP](/user-guide/data-integration/openflow/processors/listentcp)
- [org.apache.nifi.processors.standard.PutUDP](/user-guide/data-integration/openflow/processors/putudp)
---
title: PutUDP 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putudp.md
section: Loading & Unloading Data
---
# PutUDP 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server. The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size.
## Tags
egress, put, remote, udp
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Hostname |
Destination hostname or IP address |
| Idle Connection Expiration |
The amount of time a connection should be held open without being used before closing the connection. A value of 0 seconds will disable this feature. |
| Max Size of Socket Send Buffer |
The maximum size of the socket send buffer that should be used. This is a suggestion to the Operating System to indicate how big the socket buffer should be. If this value is set too low, the buffer may fill up before the data can be read, and incoming data will be dropped. |
| Port |
Destination port number |
| Timeout |
The timeout for connecting to and communicating with the destination. Does not apply to UDP |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to the destination are sent out this relationship. |
| success |
FlowFiles that are sent successfully to the destination are sent out this relationship. |
## See also
- [org.apache.nifi.processors.standard.ListenUDP](/user-guide/data-integration/openflow/processors/listenudp)
- [org.apache.nifi.processors.standard.PutTCP](/user-guide/data-integration/openflow/processors/puttcp)
---
title: PutUnityCatalogFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putunitycatalogfile.md
section: Loading & Unloading Data
---
# PutUnityCatalogFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-databricks-processors-nar
## Description
Write FlowFile content with max size of 5 GiB to Unity Catalog.
## Tags
databricks, openflow, unity catalog
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Databricks Client |
Databricks Client Service. |
| Unity Catalog File Path |
Unity Catalog file path e.g. /Volumes/catalog/schema/volume_name/file.txt |
## Relationships
| Name |
Description |
| failure |
Databricks failure relationship |
| success |
Databricks success relationship |
## Writes attributes
| Name |
Description |
| error.code |
The error code for the SQL statement if an error occurred. |
| error.message |
The error message for the SQL statement if an error occurred. |
---
title: PutVectaraDocument 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putvectaradocument.md
section: Loading & Unloading Data
---
# PutVectaraDocument 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-vectara-processors-nar
## Description
Generate and upload a JSON document to Vectara's upload endpoint. The input text can be JSON Object, JSON Array, or JSONL format.
## Tags
ai, llm, openflow, rag, vectara
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Corpus ID |
Identifier of the Vectara corpus |
| Document Attributes |
A comma delimited list of NiFi attributes fields, which if present will be included in the document metadata. |
| Document Author |
Author of the document |
| Document Creation Time |
Timestamp in epoch seconds when the document was created |
| Document Date |
Date of document creation |
| Document Description |
Description of the document |
| Document ID |
A unique identifier for the document constructed either from the source path of the document or a hash of the document's content. |
| Document Source URL |
Source URL for document |
| Document Title |
Document Title |
| Index Input Format |
Input format for indexing service. JSON Object: Load FlowFile content directly as JSON payload. JSON Lines: Create a new section for each line of JSON. JSON Array: Load FlowFile content as a JSON array and create a new section for each element in the JSON array. |
| Section Custom Dimensions |
A comma delimited list of metadata fields, which if present in the metadata path will be included as a section's custom dimension. The values for custom dimensions must be valid numbers. |
| Section Filter Attributes |
A comma delimited list of metadata fields, which if present in the metadata path will be included as a section metadata filter. |
| Section ID Attribute |
The field for setting section id, which is populated if present in the metadata path. |
| Section Metadata Attributes |
A comma delimited list of metadata fields, which if present in the metadata path will be included will be included in the section metadata. |
| Section Metadata JSON Path |
A JSON Path expression to a metadata JSON Object. The JSON Object needs to contain the list of metadata fields. These fields will be included in Section metadata. |
| Section Text JSON Path |
A JSON Path expression to the text field. |
| Section Title Attribute |
The field for setting the section title, which is populated if present in the metadata path. |
| Vectara Client |
Vectara Client Service. |
## Relationships
| Name |
Description |
| failure |
Vectara failure relationship |
| original |
Original relationship |
| success |
Vectara success relationship |
## Use Cases Involving Other Components
| Publish a PDF file to a Vectara corpus. |
| --------------------------------------- |
## See also
- [com.snowflake.openflow.runtime.processors.vectara.PutVectaraFile](/user-guide/data-integration/openflow/processors/putvectarafile)
---
title: PutVectaraFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putvectarafile.md
section: Loading & Unloading Data
---
# PutVectaraFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-vectara-processors-nar
## Description
Upload a FlowFile content to Vectara's index endpoint. Document filter attributes and metadata attributes can be set by referencing FlowFile attributes.
## Tags
ai, llm, openflow, rag, vectara
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Corpus ID |
Identifier of the Vectara corpus |
| Document Filter Attributes |
A comma delimited list of metadata fields, which if present in the FlowFile attributes will be included in as a document metadata filter. |
| Document ID |
A unique identifier for the document constructed either from the source path of the document or a hash of the document's content. |
| Document Metadata Attributes |
A comma delimited list of metadata fields, which if present in the FlowFile attributes will be included will be included in the document metadata. |
| Vectara Client |
Vectara Client Service. |
## Relationships
| Name |
Description |
| failure |
Vectara failure relationship |
| original |
Original relationship |
| success |
Vectara success relationship |
## See also
- [com.snowflake.openflow.runtime.processors.vectara.PutVectaraDocument](/user-guide/data-integration/openflow/processors/putvectaradocument)
---
title: PutWebSocket 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putwebsocket.md
section: Loading & Unloading Data
---
# PutWebSocket 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-websocket-processors-nar
## Description
Sends messages to a WebSocket remote endpoint using a WebSocket session that is established by either ListenWebSocket or ConnectWebSocket.
## Tags
WebSocket, publish, send
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| websocket-controller-service-id |
A NiFi Expression to retrieve the id of a WebSocket ControllerService. |
| websocket-endpoint-id |
A NiFi Expression to retrieve the endpoint id of a WebSocket ControllerService. |
| websocket-message-type |
The type of message content: TEXT or BINARY |
| websocket-session-id |
A NiFi Expression to retrieve the session id. If not specified, a message will be sent to all connected WebSocket peers for the WebSocket controller service endpoint. |
## Relationships
| Name |
Description |
| failure |
FlowFiles that failed to send to the destination are transferred to this relationship. |
| success |
FlowFiles that are sent successfully to the destination are transferred to this relationship. |
## Writes attributes
| Name |
Description |
| websocket.controller.service.id |
WebSocket Controller Service id. |
| websocket.session.id |
Established WebSocket session id. |
| websocket.endpoint.id |
WebSocket endpoint id. |
| websocket.message.type |
TEXT or BINARY. |
| websocket.local.address |
WebSocket server address. |
| websocket.remote.address |
WebSocket client address. |
| websocket.failure.detail |
Detail of the failure. |
---
title: PutZendeskTicket 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/putzendeskticket.md
section: Loading & Unloading Data
---
# PutZendeskTicket 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-zendesk-nar
## Description
Create Zendesk tickets using the Zendesk API.
## Tags
zendesk, ticket
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| web-client-service-provider |
Controller service for HTTP client operations. |
| zendesk-authentication-type-name |
Type of authentication to Zendesk API. |
| zendesk-authentication-value-name |
Password or authentication token for Zendesk login user. |
| zendesk-comment-body |
The content or the path to the comment body in the incoming record. |
| zendesk-priority |
The content or the path to the priority in the incoming record. |
| zendesk-record-reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema. |
| zendesk-subdomain |
Name of the Zendesk subdomain. |
| zendesk-subject |
The content or the path to the subject in the incoming record. |
| zendesk-type |
The content or the path to the type in the incoming record. |
| zendesk-user |
Login user to Zendesk subdomain. |
## Relationships
| Name |
Description |
| failure |
A FlowFile is routed to this relationship if the operation failed and retrying the operation will also fail, such as an invalid data or schema. |
| success |
For FlowFiles created as a result of a successful HTTP request. |
## Writes attributes
| Name |
Description |
| record.count |
The number of records processed. |
| error.code |
The error code of from the response. |
| error.message |
The error message of from the response. |
---
title: QueryAzureDataExplorer 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/queryazuredataexplorer.md
section: Loading & Unloading Data
---
# QueryAzureDataExplorer 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-azure-nar
## Description
Query Azure Data Explorer and stream JSON results to output FlowFiles
## Tags
ADX, Azure, Data, Explorer, Kusto
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Database Name |
Azure Data Explorer Database Name for querying |
| Kusto Query Service |
Azure Data Explorer Kusto Query Service |
| Query |
Query to be run against Azure Data Explorer |
## Relationships
| Name |
Description |
| failure |
FlowFiles containing original input associated with a failed Query |
| success |
FlowFiles containing results of a successful Query |
## Writes attributes
| Name |
Description |
| query.error.message |
Azure Data Explorer query error message on failures |
| query.executed |
Azure Data Explorer query executed |
| mime.type |
Content Type set to application/json |
---
title: QueryDatabaseTable 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querydatabasetable.md
section: Loading & Unloading Data
---
# QueryDatabaseTable 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. Query result will be converted to Avro format. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage flow file attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected.
## Tags
database, jdbc, query, select, sql
## Input Requirement
FORBIDDEN
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Columns to Return |
A comma-separated list of column names to be used in the query. If your database requires special treatment of the names (quoting, e.g.), each name should include such treatment. If no column names are supplied, all columns in the specified table will be returned. NOTE: It is important to use consistent column names for a given table for incremental fetch to work properly. |
| Database Connection Pooling Service |
The Controller Service that is used to obtain a connection to the database. |
| Database Dialect Service |
Database Dialect Service for generating statements specific to a particular service or vendor. |
| Default Decimal Precision |
When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting number of available digits is required. Generally, precision is defined by column data type definition or database engines default. However undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers. |
| Default Decimal Scale |
When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting number of available decimal digits is required. Generally, scale is defined by column data type definition or database engines default. However when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimals than specified scale, then the value will be rounded-up, e.g. 1.53 becomes 2 with scale 0, and 1.5 with scale 1. |
| Fetch Size |
The number of result rows to be fetched from the result set at a time. This is a hint to the database driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored. If using PostgreSQL, then 'Set Auto Commit' must be equal to 'false' to cause 'Fetch Size' to take effect. |
| Max Wait Time |
The maximum amount of time allowed for a running SQL select query , zero means there is no limit. Max time less than 1 second will be equal to zero. |
| Maximum-value Columns |
A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column 's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly. |
| Normalize Table and Column Names |
Whether to change non-Avro-compatible characters in column names to Avro-compatible characters. For example, colons and periods will be changed to underscores in order to build a valid Avro record. |
| Set Auto Commit |
Allows enabling or disabling the auto commit functionality of the DB connection. Default value is 'No value set'. 'No value set' will leave the db connection 's auto commit mode unchanged. For some JDBC drivers such as PostgreSQL driver, it is required to disable the auto commit functionality to get the'Fetch Size 'setting to take effect. When auto commit is enabled, PostgreSQL driver ignores'Fetch Size'setting and loads all rows of the result set to memory at once. This could lead for a large amount of memory usage when executing queries which fetch large data sets. More Details of this behaviour in PostgreSQL driver can be found in [https://jdbc.postgresql.org//documentation/head/query.html](https://jdbc.postgresql.org//documentation/head/query.html). |
| Table Name |
The name of the database table to be queried. When a custom query is used, this property is used to alias the query and appears as an attribute on the FlowFile. |
| Use Avro Logical Types |
Whether to use Avro Logical Types for DECIMAL/NUMBER, DATE, TIME and TIMESTAMP columns. If disabled, written as string. If enabled, Logical types are used and written as its underlying type, specifically, DECIMAL/NUMBER as logical 'decimal': written as bytes with additional precision and scale meta data, DATE as logical 'date-millis': written as int denoting days since Unix epoch (1970-01-01), TIME as logical 'time-millis': written as int denoting milliseconds since Unix epoch, and TIMESTAMP as logical 'timestamp-millis': written as long denoting milliseconds since Unix epoch. If a reader of written Avro records also knows these logical types, then these values can be deserialized with more context depending on reader implementation. |
| db-fetch-db-type |
Database Type for generating statements specific to a particular service or vendor. The Generic Type supports most cases but selecting a specific type enables optimal processing or additional features. |
| db-fetch-sql-query |
A custom SQL query used to retrieve data. Instead of building a SQL query from other properties, this query will be wrapped as a sub-query. Query must have no ORDER BY statement. |
| db-fetch-where-clause |
A custom clause to be added in the WHERE condition when building SQL queries. |
| initial-load-strategy |
How to handle existing rows in the database table when the processor is started for the first time (or its state has been cleared). The property will be ignored, if any 'initial.maxvalue.*' dynamic property has also been configured. |
| qdbt-max-frags |
The maximum number of fragments. If the value specified is zero, then all fragments are returned. This prevents OutOfMemoryError when this processor ingests huge table. NOTE: Setting this property can result in data loss, as the incoming results are not ordered, and fragments may end at arbitrary boundaries where rows are not included in the result set. |
| qdbt-max-rows |
The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile. |
| qdbt-output-batch-size |
The number of output FlowFiles to queue before committing the process session. When set to zero, the session will be committed when all result set rows have been processed and the output FlowFiles are ready for transfer to the downstream relationship. For large result sets, this can cause a large burst of FlowFiles to be transferred at the end of processor execution. If this property is set, then when the specified number of FlowFiles are ready for transfer, then the session will be committed, thus releasing the FlowFiles to the downstream relationship. NOTE: The maxvalue.* and fragment.count attributes will not be set on FlowFiles when this property is set. |
| transaction-isolation-level |
This setting will set the transaction isolation level for the database connection for drivers that support this setting |
## State management
| Scopes |
Description |
| CLUSTER |
After performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records that have max values greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation |
## Relationships
| Name |
Description |
| success |
Successfully created FlowFile from SQL query result set. |
## Writes attributes
| Name |
Description |
| tablename |
Name of the table being queried |
| querydbtable.row.count |
The number of rows selected by the query |
| fragment.identifier |
If 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results. |
| fragment.count |
If 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. If Output Batch Size is set, then this attribute will not be populated. |
| fragment.index |
If 'Max Rows Per Flow File' is set then the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set FlowFile. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order FlowFiles were produced |
| maxvalue.* |
Each attribute contains the observed maximum value of a specified 'Maximum-value Column'. The suffix of the attribute is the name of the column. If Output Batch Size is set, then this attribute will not be populated. |
## See also
- [org.apache.nifi.processors.standard.ExecuteSQL](/user-guide/data-integration/openflow/processors/executesql)
- [org.apache.nifi.processors.standard.GenerateTableFetch](/user-guide/data-integration/openflow/processors/generatetablefetch)
---
title: QueryDatabaseTableRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querydatabasetablerecord.md
section: Loading & Unloading Data
---
# QueryDatabaseTableRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. Query result will be converted to the format specified by the record writer. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage flow file attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected.
## Tags
database, jdbc, query, record, select, sql
## Input Requirement
FORBIDDEN
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Columns to Return |
A comma-separated list of column names to be used in the query. If your database requires special treatment of the names (quoting, e.g.), each name should include such treatment. If no column names are supplied, all columns in the specified table will be returned. NOTE: It is important to use consistent column names for a given table for incremental fetch to work properly. |
| Database Connection Pooling Service |
The Controller Service that is used to obtain a connection to the database. |
| Database Dialect Service |
Database Dialect Service for generating statements specific to a particular service or vendor. |
| Default Decimal Precision |
When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting number of available digits is required. Generally, precision is defined by column data type definition or database engines default. However undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers. |
| Default Decimal Scale |
When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting number of available decimal digits is required. Generally, scale is defined by column data type definition or database engines default. However when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimals than specified scale, then the value will be rounded-up, e.g. 1.53 becomes 2 with scale 0, and 1.5 with scale 1. |
| Fetch Size |
The number of result rows to be fetched from the result set at a time. This is a hint to the database driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored. If using PostgreSQL, then 'Set Auto Commit' must be equal to 'false' to cause 'Fetch Size' to take effect. |
| Max Wait Time |
The maximum amount of time allowed for a running SQL select query , zero means there is no limit. Max time less than 1 second will be equal to zero. |
| Maximum-value Columns |
A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column 's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly. |
| Set Auto Commit |
Allows enabling or disabling the auto commit functionality of the DB connection. Default value is 'No value set'. 'No value set' will leave the db connection 's auto commit mode unchanged. For some JDBC drivers such as PostgreSQL driver, it is required to disable the auto commit functionality to get the'Fetch Size 'setting to take effect. When auto commit is enabled, PostgreSQL driver ignores'Fetch Size'setting and loads all rows of the result set to memory at once. This could lead for a large amount of memory usage when executing queries which fetch large data sets. More Details of this behaviour in PostgreSQL driver can be found in [https://jdbc.postgresql.org//documentation/head/query.html](https://jdbc.postgresql.org//documentation/head/query.html). |
| Table Name |
The name of the database table to be queried. When a custom query is used, this property is used to alias the query and appears as an attribute on the FlowFile. |
| Use Avro Logical Types |
Whether to use Avro Logical Types for DECIMAL/NUMBER, DATE, TIME and TIMESTAMP columns. If disabled, written as string. If enabled, Logical types are used and written as its underlying type, specifically, DECIMAL/NUMBER as logical 'decimal': written as bytes with additional precision and scale meta data, DATE as logical 'date-millis': written as int denoting days since Unix epoch (1970-01-01), TIME as logical 'time-millis': written as int denoting milliseconds since Unix epoch, and TIMESTAMP as logical 'timestamp-millis': written as long denoting milliseconds since Unix epoch. If a reader of written Avro records also knows these logical types, then these values can be deserialized with more context depending on reader implementation. |
| db-fetch-db-type |
Database Type for generating statements specific to a particular service or vendor. The Generic Type supports most cases but selecting a specific type enables optimal processing or additional features. |
| db-fetch-sql-query |
A custom SQL query used to retrieve data. Instead of building a SQL query from other properties, this query will be wrapped as a sub-query. Query must have no ORDER BY statement. |
| db-fetch-where-clause |
A custom clause to be added in the WHERE condition when building SQL queries. |
| initial-load-strategy |
How to handle existing rows in the database table when the processor is started for the first time (or its state has been cleared). The property will be ignored, if any 'initial.maxvalue.*' dynamic property has also been configured. |
| qdbt-max-frags |
The maximum number of fragments. If the value specified is zero, then all fragments are returned. This prevents OutOfMemoryError when this processor ingests huge table. NOTE: Setting this property can result in data loss, as the incoming results are not ordered, and fragments may end at arbitrary boundaries where rows are not included in the result set. |
| qdbt-max-rows |
The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile. |
| qdbt-output-batch-size |
The number of output FlowFiles to queue before committing the process session. When set to zero, the session will be committed when all result set rows have been processed and the output FlowFiles are ready for transfer to the downstream relationship. For large result sets, this can cause a large burst of FlowFiles to be transferred at the end of processor execution. If this property is set, then when the specified number of FlowFiles are ready for transfer, then the session will be committed, thus releasing the FlowFiles to the downstream relationship. NOTE: The maxvalue.* and fragment.count attributes will not be set on FlowFiles when this property is set. |
| qdbtr-normalize |
Whether to change characters in column names when creating the output schema. For example, colons and periods will be changed to underscores. |
| qdbtr-record-writer |
Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer may use Inherit Schema to emulate the inferred schema behavior, i.e. an explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types. |
## State management
| Scopes |
Description |
| CLUSTER |
After performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records that have max values greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation |
## Relationships
| Name |
Description |
| success |
Successfully created FlowFile from SQL query result set. |
## Writes attributes
| Name |
Description |
| tablename |
Name of the table being queried |
| querydbtable.row.count |
The number of rows selected by the query |
| fragment.identifier |
If 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results. |
| fragment.count |
If 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. If Output Batch Size is set, then this attribute will not be populated. |
| fragment.index |
If 'Max Rows Per Flow File' is set then the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set FlowFile. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order FlowFiles were produced |
| maxvalue.* |
Each attribute contains the observed maximum value of a specified 'Maximum-value Column'. The suffix of the attribute is the name of the column. If Output Batch Size is set, then this attribute will not be populated. |
| mime.type |
Sets the mime.type attribute to the MIME Type specified by the Record Writer. |
| record.count |
The number of records output by the Record Writer. |
## Use cases
| Retrieve all rows from a database table. |
| -------------------------------------------------------------------------------------------------------------- |
| Perform an incremental load of a single database table, fetching only new rows as they are added to the table. |
## Use Cases Involving Other Components
| Perform an incremental load of multiple database tables, fetching only new rows as they are added to the tables. |
| ---------------------------------------------------------------------------------------------------------------- |
## See also
- [org.apache.nifi.processors.standard.ExecuteSQL](/user-guide/data-integration/openflow/processors/executesql)
- [org.apache.nifi.processors.standard.GenerateTableFetch](/user-guide/data-integration/openflow/processors/generatetablefetch)
---
title: QueryMilvus 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querymilvus.md
section: Loading & Unloading Data
---
# QueryMilvus 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-milvus-processors-nar
## Description
Queries a given collection in a Milvus database using vectors. Results of query are added to current record under the results record path for each vector searched.
## Tags
chatbot, embeddings, gen ai, genai, generative ai, llm, metadata, milvus, openflow, publish, query, search, text, vector
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Collection Name |
The name of the Milvus collection name to use |
| Max Query Batch Size |
This is the number of vectors that are contained in a single request to Milvus during a query. Milvus is unable to support batch queries of more then 10 vectors at a time. |
| Maximum Results |
The maximum number of results to return (i.e., Top K) |
| Milvus Connection Service |
Connection Service for accessing Milvus Database |
| Output Search Fields |
Comma separated list of additional fields to return from a search against the Milvus database. Milvus will return the score and id fields by default. |
| Partition |
Partition of the vector database that you want to perform operations in. If the database has only one partition leave empty. |
| Record Reader |
The Record Reader to use for reading the FlowFile |
| Record Writer |
The Record Writer to use for writing the results |
| Reranking Smoothing Parameter |
Smoothing Parameter of the Reciprocal Rank Fusion (RRFRanker) during Hybrid Search |
| Results Record Path |
Specifies where in the record to place the results. |
| Sparse Vector Field Name |
The name of the field to use for storing the sparse vectors. |
| Sparse Vector Indices Path |
If, Sparse Vectors are to be provided, this RecordPath points to the indices of the sparse data to use. |
| Sparse Vector Values Path |
If, Sparse Vectors are to be provided, this RecordPath points to the values of the sparse data to use. |
| Vector Field Name |
The name of the field in Milvus to use for storing the vectors. |
| Vector Record Path |
The path to the vector field in the record |
## Relationships
| Name |
Description |
| failure |
FlowFiles that cannot be sent to Milvus, and for which a retry is not expected to be successful, are routed to this relationship |
| retry |
FlowFiles that fail to be sent to Milvus, but for which a retry may help, are routed to this relationship |
| success |
FlowFiles that are successfully sent to Milvus are routed to this relationship |
## See also
- [com.snowflake.openflow.runtime.processors.milvus.UpsertMilvus](/user-guide/data-integration/openflow/processors/upsertmilvus)
---
title: QueryPinecone 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querypinecone.md
section: Loading & Unloading Data
---
# QueryPinecone 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-pinecone-nar
## Description
Queries Pinecone for vectors that are similar to the input vector, or retrieves a vector by ID.
## Tags
chatbot, gen ai, generative ai, llm, openflow, pinecone, query, similarity, vector
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| ID Record Path |
The path to the ID field in the record |
| Include Metadata |
Specifies whether to include metadata in the results |
| Include Vectors |
Specifies whether to include vectors in the results |
| Number of Results |
The number of results to return (i.e., Top K) |
| Pinecone API Key |
The API key for the Pinecone service |
| Pinecone Index |
The name of the Pinecone index to use |
| Pinecone Namespace |
The name of the Pinecone namespace to use |
| Query Filter |
A JSON representation of the query filter to use |
| Query Strategy |
The strategy to use for querying Pinecone |
| Record Reader |
The Record Reader to use for reading the FlowFile |
| Record Writer |
The Record Writer to use for writing the results |
| Results Record Path |
Specifies where in the record to place the results. |
| Sparse Dense Vector Weighting |
Ranges from 0.0 to 1.0. Weight to apply on dense and sparse vectors when doing an hybrid search. (1 - weight) will be applied to the values of the sparse vector and (weight) will be applied to the dense vector. |
| Sparse Vector Indices Path |
If, Sparse Vectors are to be provided, this RecordPath points to the indices of the sparse data to use. |
| Sparse Vector Values Path |
If, Sparse Vectors are to be provided, this RecordPath points to the values of the sparse data to use. |
| Vector Record Path |
The path to the vector field in the record |
| Web Client Service |
The Web Client Service to use for communicating with Pinecone |
## Relationships
| Name |
Description |
| failure |
FlowFiles that cannot be sent to Pinecone, and for which a retry is not expected to be successful, are routed to this relationship |
| retry |
FlowFiles that fail to be sent to Pinecone, but for which a retry may help, are routed to this relationship |
| success |
FlowFiles that are successfully sent to Pinecone are routed to this relationship |
## Use Cases Involving Other Components
| Query Pinecone for vectors that are similar to some input text |
| -------------------------------------------------------------- |
---
title: QueryRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/queryrecord.md
section: Loading & Unloading Data
---
# QueryRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Evaluates one or more SQL queries against the contents of a FlowFile. The result of the SQL query then becomes the content of the output FlowFile. This can be used, for example, for field-specific filtering, transformation, and row-level filtering. Columns can be renamed, simple calculations and aggregations performed, etc. The Processor is configured with a Record Reader Controller Service and a Record Writer service so as to allow flexibility in incoming and outgoing data formats. The Processor must be configured with at least one user-defined property. The name of the Property is the Relationship to route data to, and the value of the Property is a SQL SELECT statement that is used to specify how input data should be transformed/filtered. The SQL statement must be valid ANSI SQL and is powered by Apache Calcite. If the transformation fails, the original FlowFile is routed to the 'failure' relationship. Otherwise, the data selected will be routed to the associated relationship. If the Record Writer chooses to inherit the schema from the Record, it is important to note that the schema that is inherited will be from the ResultSet, rather than the input Record. This allows a single instance of the QueryRecord processor to have multiple queries, each of which returns a different set of columns and aggregations. As a result, though, the schema that is derived will have no schema name, so it is important that the configured Record Writer not attempt to write the Schema Name as an attribute if inheriting the Schema from the Record. See the Processor Usage documentation for more information.
## Tags
aggregate, avro, calcite, csv, etl, filter, json, logs, modify, query, record, route, select, sql, text, transform, update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Default Decimal Precision |
When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting number of available digits is required. Generally, precision is defined by column data type definition or database engines default. However undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers. |
| Default Decimal Scale |
When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting number of available decimal digits is required. Generally, scale is defined by column data type definition or database engines default. However when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimals than specified scale, then the value will be rounded-up, e.g. 1.53 becomes 2 with scale 0, and 1.5 with scale 1. |
| include-zero-record-flowfiles |
When running the SQL statement against an incoming FlowFile, if the result has no data, this property specifies whether or not a FlowFile will be sent to the corresponding relationship |
| record-reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema |
| record-writer |
Specifies the Controller Service to use for writing results to a FlowFile |
## Relationships
| Name |
Description |
| failure |
If a FlowFile fails processing for any reason (for example, the SQL statement contains columns not present in input data), the original FlowFile it will be routed to this relationship |
| original |
The original FlowFile is routed to this relationship |
## Writes attributes
| Name |
Description |
| mime.type |
Sets the mime.type attribute to the MIME Type specified by the Record Writer |
| record.count |
The number of records selected by the query |
| QueryRecord.Route |
The relation to which the FlowFile was routed |
## Use cases
| Filter out records based on the values of the records' fields |
| ---------------------------------------------------------------------------------------- |
| Keep only specific records |
| Keep only specific fields in a a Record, where the names of the fields to keep are known |
| Route record-oriented data for processing based on its contents |
---
title: QuerySalesforceObject 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querysalesforceobject.md
section: Loading & Unloading Data
---
# QuerySalesforceObject 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-salesforce-nar
## Description
Retrieves records from a Salesforce sObject. Users can add arbitrary filter conditions by setting the 'Custom WHERE Condition' property. The processor can also run a custom query, although record processing is not supported in that case. Supports incremental retrieval: users can define a field in the 'Age Field' property that will be used to determine when the record was created. When this property is set the processor will retrieve new records. Incremental loading and record-based processing are only supported in property-based queries. It 's also possible to define an initial cutoff value for the age, filtering out all older records even for the first run. In case of'Property Based Query 'this processor should run on the Primary Node only. FlowFile attribute' record.count 'indicates how many records were retrieved and written to the output. The processor can accept an optional input FlowFile and reference the FlowFile attributes in the query. When'Include Deleted Records 'is true, the processor will include deleted records (soft-deletes) in the results by using the' queryAll 'API. The'IsDeleted' field will be automatically included in the results when querying deleted records.
## Tags
query, salesforce, sobject, soql
## Input Requirement
ALLOWED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| age-delay |
The ending timestamp of the time window will be adjusted earlier by the amount configured in this property. For example, with a property value of 10 seconds, an ending timestamp of 12:30:45 would be changed to 12:30:35. |
| age-field |
The name of a TIMESTAMP field that will be used to filter records using a bounded time window. The processor will return only those records with a timestamp value newer than the timestamp recorded after the last processor run. |
| create-zero-record-files |
Specifies whether or not to create a FlowFile when the Salesforce REST API does not return any records |
| custom-soql-query |
Specify the SOQL query to run. |
| custom-where-condition |
A custom expression to be added in the WHERE clause of the query |
| field-names |
Comma-separated list of field names requested from the sObject to be queried. When this field is left empty, all fields are queried. |
| include-deleted-records |
If true, the processor will include deleted records (IsDeleted = true) in the query results. When enabled, the processor will use the 'queryAll' API. |
| initial-age-filter |
This property specifies the start time that the processor applies when running the first query. |
| oauth2-access-token-provider |
Service providing OAuth2 Access Tokens for authenticating using the HTTP Authorization Header |
| query-type |
Choose to provide the query by parameters or a full custom query. |
| read-timeout |
Maximum time allowed for reading a response from the Salesforce REST API |
| record-writer |
Service used for writing records returned from the Salesforce REST API |
| salesforce-api-version |
The version number of the Salesforce REST API appended to the URL after the services/data path. See Salesforce documentation for supported versions |
| salesforce-url |
The URL of the Salesforce instance including the domain without additional path information, such as [https://MyDomainName.my.salesforce.com](https://MyDomainName.my.salesforce.com) |
| sobject-name |
The Salesforce sObject to be queried |
## State management
| Scopes |
Description |
| CLUSTER |
When 'Age Field' is set, after performing a query the time of execution is stored. Subsequent queries will be augmented with an additional condition so that only records that are newer than the stored execution time (adjusted with the optional value of 'Age Delay') will be retrieved. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data. |
## Relationships
| Name |
Description |
| failure |
The input flowfile gets sent to this relationship when the query fails. |
| original |
The input flowfile gets sent to this relationship when the query succeeds. |
| success |
For FlowFiles created as a result of a successful query. |
## Writes attributes
| Name |
Description |
| mime.type |
Sets the mime.type attribute to the MIME Type specified by the Record Writer. |
| record.count |
Sets the number of records in the FlowFile. |
| total.record.count |
Sets the total number of records in the FlowFile. |
## See also
- [org.apache.nifi.processors.salesforce.PutSalesforceObject](/user-guide/data-integration/openflow/processors/putsalesforceobject)
---
title: QuerySplunkIndexingStatus 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/querysplunkindexingstatus.md
section: Loading & Unloading Data
---
# QuerySplunkIndexingStatus 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-splunk-nar
## Description
Queries Splunk server in order to acquire the status of indexing acknowledgement.
## Tags
acknowledgement, http, logs, splunk
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Hostname |
The ip address or hostname of the Splunk server. |
| Owner |
The owner to pass to Splunk. |
| Password |
The password to authenticate to Splunk. |
| Port |
The HTTP Event Collector HTTP Port Number. |
| Scheme |
The scheme for connecting to Splunk. |
| Security Protocol |
The security protocol to use for communicating with Splunk. |
| Token |
HTTP Event Collector token starting with the string Splunk. For example 'Splunk 1234578-abcd-1234-abcd-1234abcd' |
| Username |
The username to authenticate to Splunk. |
| max-query-size |
The maximum number of acknowledgement identifiers the outgoing query contains in one batch. It is recommended not to set it too low in order to reduce network communication. |
| request-channel |
Identifier of the used request channel. |
| ttl |
The maximum time the processor tries to acquire acknowledgement confirmation for an index, from the point of registration. After the given amount of time, the processor considers the index as not acknowledged and transfers the FlowFile to the "unacknowledged" relationship. |
## Relationships
| Name |
Description |
| failure |
A FlowFile is transferred to this relationship when the acknowledgement was not successful due to errors during the communication. FlowFiles are timing out or unknown by the Splunk server will transferred to "undetermined" relationship. |
| success |
A FlowFile is transferred to this relationship when the acknowledgement was successful. |
| unacknowledged |
A FlowFile is transferred to this relationship when the acknowledgement was not successful. This can happen when the acknowledgement did not happened within the time period set for Maximum Waiting Time. FlowFiles with acknowledgement id unknown for the Splunk server will be transferred to this relationship after the Maximum Waiting Time is reached. |
| undetermined |
A FlowFile is transferred to this relationship when the acknowledgement state is not determined. FlowFiles transferred to this relationship might be penalized. This happens when Splunk returns with HTTP 200 but with false response for the acknowledgement id in the flow file attribute. |
## See also
- [org.apache.nifi.processors.splunk.PutSplunkHTTP](/user-guide/data-integration/openflow/processors/putsplunkhttp)
---
title: ReaderLookup
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/readerlookup.md
section: Loading & Unloading Data
---
# ReaderLookup
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Provides a RecordReaderFactory that can be used to dynamically select another RecordReaderFactory. This will allow multiple RecordReaderFactories to be defined and registered, and then selected dynamically at runtime by referencing a FlowFile attribute in the Service to Use property.
## Tags
lookup, parse, reader, record, row
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Service to Use * |
Service to Use |
$\{recordreader.name\} |
|
Specifies the name of the user-defined property whose associated Controller Service should be used. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: RecordSetWriterLookup
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/recordsetwriterlookup.md
section: Loading & Unloading Data
---
# RecordSetWriterLookup
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Provides a RecordSetWriterFactory that can be used to dynamically select another RecordSetWriterFactory. This will allow multiple RecordSetWriterFactory's to be defined and registered, and then selected dynamically at runtime by tagging FlowFiles with the attributes and referencing those attributes in the Service to Use property.
## Tags
lookup, record, recordset, result, row, serializer, set, writer
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Service to Use * |
Service to Use |
$\{recordsetwriter.name\} |
|
Specifies the name of the user-defined property whose associated Controller Service should be used. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: RecordSinkServiceLookup
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/recordsinkservicelookup.md
section: Loading & Unloading Data
---
# RecordSinkServiceLookup
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Provides a RecordSinkService that can be used to dynamically select another RecordSinkService. This service requires an attribute named 'record.sink.name' to be passed in when asking for a connection, and will throw an exception if the attribute is missing. The value of 'record.sink.name' will be used to select the RecordSinkService that has been registered with that name. This will allow multiple RecordSinkServices to be defined and registered, and then selected dynamically at runtime by tagging flow files with the appropriate 'record.sink.name' attribute. Note that this controller service is not intended for use in reporting tasks that employ RecordSinkService instances, such as QueryNiFiReportingTask.
## Tags
lookup, record, sink
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: RedisConnectionPoolService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/redisconnectionpoolservice.md
section: Loading & Unloading Data
---
# RedisConnectionPoolService
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
A service that provides connections to Redis.
## Tags
cache, redis
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Cluster Max Redirects * |
Cluster Max Redirects |
5 |
|
The maximum number of redirects that can be performed when clustered. |
| Communication Timeout * |
Communication Timeout |
10 seconds |
|
The timeout to use when attempting to communicate with Redis. |
| Connection String * |
Connection String |
|
|
The connection string for Redis. In a standalone instance this value will be of the form hostname:port. In a sentinel instance this value will be the comma-separated list of sentinels, such as host1:port1,host2:port2,host3:port3. In a clustered instance this value will be the comma-separated list of cluster masters, such as host1:port,host2:port,host3:port. |
| Database Index * |
Database Index |
0 |
|
The database index to be used by connections created from this connection pool. See the databases property in redis.conf, by default databases 0-15 will be available. |
| Password |
Password |
|
|
The password used to authenticate to the Redis server. See the 'requirepass' property in redis.conf. |
| Pool - Block When Exhausted * |
Pool - Block When Exhausted |
true |
- true
- false
|
Whether or not clients should block and wait when trying to obtain a connection from the pool when the pool has no available connections. Setting this to false means an error will occur immediately when a client requests a connection and none are available. |
| Pool - Max Idle * |
Pool - Max Idle |
8 |
|
The maximum number of idle connections that can be held in the pool, or a negative value if there is no limit. |
| Pool - Max Total * |
Pool - Max Total |
8 |
|
The maximum number of connections that can be allocated by the pool (checked out to clients, or idle awaiting checkout). A negative value indicates that there is no limit. |
| Pool - Max Wait Time * |
Pool - Max Wait Time |
10 seconds |
|
The amount of time to wait for an available connection when Block When Exhausted is set to true. |
| Pool - Min Evictable Idle Time * |
Pool - Min Evictable Idle Time |
60 seconds |
|
The minimum amount of time an object may sit idle in the pool before it is eligible for eviction. |
| Pool - Min Idle * |
Pool - Min Idle |
0 |
|
The target for the minimum number of idle connections to maintain in the pool. If the configured value of Min Idle is greater than the configured value for Max Idle, then the value of Max Idle will be used instead. |
| Pool - Num Tests Per Eviction Run * |
Pool - Num Tests Per Eviction Run |
-1 |
|
The number of connections to tests per eviction attempt. A negative value indicates to test all connections. |
| Pool - Test On Borrow * |
Pool - Test On Borrow |
false |
- true
- false
|
Whether or not connections should be tested upon borrowing from the pool. |
| Pool - Test On Create * |
Pool - Test On Create |
false |
- true
- false
|
Whether or not connections should be tested upon creation. |
| Pool - Test On Return * |
Pool - Test On Return |
false |
- true
- false
|
Whether or not connections should be tested upon returning to the pool. |
| Pool - Test While Idle * |
Pool - Test While Idle |
true |
- true
- false
|
Whether or not connections should be tested while idle. |
| Pool - Time Between Eviction Runs * |
Pool - Time Between Eviction Runs |
30 seconds |
|
The amount of time between attempting to evict idle connections from the pool. |
| Redis Mode * |
Redis Mode |
Standalone |
- Standalone
- Sentinel
- Cluster
|
The type of Redis being communicated with - standalone, sentinel, or clustered. |
| SSL Context Service |
SSL Context Service |
|
|
If specified, this service will be used to create an SSL Context that will be used to secure communications; if not specified, communications will not be secure |
| Sentinel Master |
Sentinel Master |
|
|
The name of the sentinel master, require when Mode is set to Sentinel |
| Sentinel Password |
Sentinel Password |
|
|
The password used to authenticate to the Redis Sentinel server. See the 'requirepass' and 'sentinel sentinel-pass' properties in sentinel.conf. |
| Sentinel Username |
Sentinel Username |
|
|
The username used to authenticate to the Redis sentinel server. |
| Username |
Username |
|
|
The username used to authenticate to the Redis server. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: RedisDistributedMapCacheClientService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/redisdistributedmapcacheclientservice.md
section: Loading & Unloading Data
---
# RedisDistributedMapCacheClientService
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
An implementation of DistributedMapCacheClient that uses Redis as the backing cache. This service relies on the WATCH, MULTI, and EXEC commands in Redis, which are not fully supported when Redis is clustered. As a result, this service can only be used with a Redis Connection Pool that is configured for standalone or sentinel mode. Sentinel mode can be used to provide high-availability configurations.
## Tags
cache, distributed, map, redis
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| TTL * |
redis-cache-ttl |
0 secs |
|
Indicates how long the data should exist in Redis. Setting '0 secs' would mean the data would exist forever |
| Redis Connection Pool * |
redis-connection-pool |
|
|
|
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: RemoveFieldRecordReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/removefieldrecordreader.md
section: Loading & Unloading Data
---
# RemoveFieldRecordReader
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
A wrapper for a RecordReaderFactory that supports filtering out specified fields from NiFi Records. It allows users to specify a list of field names that should be ignored when reading records from the record reader returned from the wrapped RecordReaderFactory. The ignored record fields are specified as dynamic properties. At least one dynamic property must be set. The dynamic property name is used as a description of the field to remove, and the dynamic property value is a RecordPath that identifies the field to be removed. Nested paths are supported. Record paths targeting the root path ("/") are not allowed and will result in a validation error. This service should be used when all of the following criteria are met: - your delegate RecordReaderFactory is configured to infer the schema from the data - you do not have or do not want to define a static schema for the data you 're reading - the fields you set to be ignored should not be serialized to the NiFi content repository for security or performance reasons If any of the above criteria are not met, consider using the RecordFieldRemover processor instead. NOTE: The RecordReader returned by this implementation is hardcoded to drop unknown fields rather than ignoring them. Even when the RecordReader's nextRecord(coerceTypes, dropUnknownFields) method is called with dropUnknownFields set to false, the RecordReader will still drop unknown fields.
## Tags
delete, field, filter, reader, record, remove
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Record Reader * |
Record Reader |
|
|
The underlying RecordReaderFactory service that will be used to read records before filtering is applied. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: RemoveRecordField 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/removerecordfield.md
section: Loading & Unloading Data
---
# RemoveRecordField 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Modifies the contents of a FlowFile that contains Record-oriented data (i.e. data that can be read via a RecordReader and written by a RecordWriter) by removing selected fields. This Processor requires that at least one user-defined Property be added. The name of the property is ignored by the processor, but could be a meaningful identifier for the user. The value of the property should indicate a RecordPath that determines the field to be removed. The processor executes the removal in the order in which these properties are added to the processor. Set the "Record Writer" to "Inherit Record Schema" in order to use the updated Record Schema modified when removing Fields.
## Tags
avro, csv, delete, freeform, generic, json, record, remove, schema, text, update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Record Reader |
Specifies the Controller Service to use for reading incoming data |
| Record Writer |
Specifies the Controller Service to use for writing out the records |
## Relationships
| Name |
Description |
| failure |
If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship |
| success |
FlowFiles that are successfully transformed will be routed to this relationship |
## Writes attributes
| Name |
Description |
| record.error.message |
This attribute provides on failure the error message encountered by the Reader or Writer. |
## Use cases
| Remove one or more fields from a Record, where the names of the fields to remove are known. |
| ------------------------------------------------------------------------------------------- |
## See also
- [org.apache.nifi.processors.standard.UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord)
---
title: RenameRecordField 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/renamerecordfield.md
section: Loading & Unloading Data
---
# RenameRecordField 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Renames one or more fields in each Record of a FlowFile. This Processor requires that at least one user-defined Property be added. The name of the Property should indicate a RecordPath that determines the field that should be updated. The value of the Property is the new name to assign to the Record Field that matches the RecordPath. The property value may use Expression Language to reference FlowFile attributes as well as the variables *field.name*, *field.value*, *field.type*, and *record.index*
## Tags
avro, csv, field, generic, json, log, logs, record, rename, schema, update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Record Reader |
Specifies the Controller Service to use for reading incoming data |
| Record Writer |
Specifies the Controller Service to use for writing out the records |
## Relationships
| Name |
Description |
| failure |
If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship |
| success |
FlowFiles that are successfully transformed will be routed to this relationship |
## Writes attributes
| Name |
Description |
| record.index |
This attribute provides the current row index and is only available inside the literal value expression. |
## Use cases
| Rename a field in each Record to a specific, known name. |
| ---------------------------------------------------------------------------------------- |
| Rename a field in each Record to a name that is derived from a FlowFile attribute. |
| Rename a field in each Record to a new name that is derived from the current field name. |
## See also
- [org.apache.nifi.processors.standard.RemoveRecordField](/user-guide/data-integration/openflow/processors/removerecordfield)
- [org.apache.nifi.processors.standard.UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord)
---
title: ReplaceText 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/replacetext.md
section: Loading & Unloading Data
---
# ReplaceText 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Updates the content of a FlowFile by searching for some textual value in the FlowFile content (via Regular Expression/regex, or literal value) and replacing the section of the content that matches with some alternate value. It can also be used to append or prepend text to the contents of a FlowFile.
## Tags
Change, Modify, Regex, Regular Expression, Replace, Text, Update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
The Character Set in which the file is encoded |
| Evaluation Mode |
Run the 'Replacement Strategy' against each line separately (Line-by-Line) or buffer the entire file into memory (Entire Text) and run against that. |
| Line-by-Line Evaluation Mode |
Run the 'Replacement Strategy' against each line separately (Line-by-Line) for all lines in the FlowFile, First Line (Header) alone, Last Line (Footer) alone, Except the First Line (Header) or Except the Last Line (Footer). |
| Maximum Buffer Size |
Specifies the maximum amount of data to buffer (per file or per line, depending on the Evaluation Mode) in order to apply the replacement. If 'Entire Text' (in Evaluation Mode) is selected and the FlowFile is larger than this value, the FlowFile will be routed to 'failure'. In 'Line-by-Line' Mode, if a single line is larger than this value, the FlowFile will be routed to 'failure'. A default value of 1 MB is provided, primarily for 'Entire Text' mode. In 'Line-by-Line' Mode, a value such as 8 KB or 16 KB is suggested. This value is ignored if the <Replacement Strategy> property is set to one of: Append, Prepend, Always Replace |
| Regular Expression |
The Search Value to search for in the FlowFile content. Only used for 'Literal Replace' and 'Regex Replace' matching strategies |
| Replacement Strategy |
The strategy for how and what to replace within the FlowFile's text content. |
| Replacement Value |
The value to insert using the 'Replacement Strategy'. Using "Regex Replace" back-references to Regular Expression capturing groups are supported, but back-references that reference capturing groups that do not exist in the regular expression will be treated as literal value. Back References may also be referenced using the Expression Language, as '$1', '$2', etc. The single-tick marks MUST be included, as these variables are not "Standard" attribute names (attribute names must be quoted unless they contain only numbers, letters, and _). |
| Text to Append |
The text to append to the end of the FlowFile, or each line, depending on the configured value of the Evaluation Mode property |
| Text to Prepend |
The text to prepend to the start of the FlowFile, or each line, depending on the configured value of the Evaluation Mode property |
## Relationships
| Name |
Description |
| failure |
FlowFiles that could not be updated are routed to this relationship |
| success |
FlowFiles that have been successfully processed are routed to this relationship. This includes both FlowFiles that had text replaced and those that did not. |
## Use cases
| Append text to the end of every line in a FlowFile |
| ----------------------------------------------------------------------------------- |
| Prepend text to the beginning of every line in a FlowFile |
| Replace every occurrence of a literal string in the FlowFile with a different value |
| Transform every occurrence of a literal string in a FlowFile |
| Completely replace the contents of a FlowFile to a specific text |
---
title: ReplaceTextWithMapping 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/replacetextwithmapping.md
section: Loading & Unloading Data
---
# ReplaceTextWithMapping 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file.
## Tags
Change, Mapping, Modify, Regex, Regular Expression, Replace, Text, Update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
The Character Set in which the file is encoded |
| Mapping File |
The name of the file (including the full path) containing the Mappings. |
| Mapping File Refresh Interval |
The polling interval to check for updates to the mapping file. The default is 60s. |
| Matching Group |
The number of the matching group of the provided regex to replace with the corresponding value from the mapping file (if it exists). |
| Maximum Buffer Size |
Specifies the maximum amount of data to buffer (per file) in order to apply the regular expressions. If a FlowFile is larger than this value, the FlowFile will be routed to 'failure' |
| Regular Expression |
The Regular Expression to search for in the FlowFile content |
## Relationships
| Name |
Description |
| failure |
FlowFiles that could not be updated are routed to this relationship |
| success |
FlowFiles that have been successfully updated are routed to this relationship, as well as FlowFiles whose content does not match the given Regular Expression |
---
title: RestLookupService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/restlookupservice.md
section: Loading & Unloading Data
---
# RestLookupService
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Use a REST service to look up values.
## Tags
http, json, lookup, rest, xml
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Proxy Configuration Service |
proxy-configuration-service |
|
|
Specifies the Proxy Configuration Controller Service to proxy network requests. In case of SOCKS, it is not guaranteed that the selected SOCKS Version will be used by the processor. |
| Authentication Strategy * |
rest-lookup-authentication-strategy |
NONE |
- None
- Basic
- OAuth2
|
Authentication strategy to use with REST service. |
| Basic Authentication Password |
rest-lookup-basic-auth-password |
|
|
The password to be used by the client to authenticate against the Remote URL. |
| Basic Authentication Username |
rest-lookup-basic-auth-username |
|
|
The username to be used by the client to authenticate against the Remote URL. Cannot include control characters (0-31), ':', or DEL (127). |
| Connection Timeout * |
rest-lookup-connection-timeout |
5 secs |
|
Max wait time for connection to remote service. |
| Use Digest Authentication |
rest-lookup-digest-auth |
false |
- true
- false
|
Whether to communicate with the website using Digest Authentication. 'Basic Authentication Username' and 'Basic Authentication Password' are used for authentication. |
| OAuth2 Access Token Provider * |
rest-lookup-oauth2-access-token-provider |
|
|
Enables managed retrieval of OAuth2 Bearer Token applied to HTTP requests using the Authorization Header. |
| Read Timeout * |
rest-lookup-read-timeout |
15 secs |
|
Max wait time for response from remote service. |
| Record Path |
rest-lookup-record-path |
|
|
An optional record path that can be used to define where in a record to get the real data to merge into the record set to be enriched. See documentation for examples of when this might be useful. |
| Record Reader * |
rest-lookup-record-reader |
|
|
The record reader to use for loading the payload and handling it as a record set. |
| Response Handling Strategy * |
rest-lookup-response-handling-strategy |
RETURNED |
- Returned
- Evaluated
|
Whether to return all responses or throw errors for unsuccessful HTTP status codes. |
| SSL Context Service |
rest-lookup-ssl-context-service |
|
|
The SSL Context Service used to provide client certificate information for TLS/SSL connections. |
| URL * |
rest-lookup-url |
|
|
The URL for the REST endpoint. Expression language is evaluated against the lookup key/value pairs, not flowfile attributes. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: RetryFlowFile 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/retryflowfile.md
section: Loading & Unloading Data
---
# RetryFlowFile 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
FlowFiles passed to this Processor have a 'Retry Attribute' value checked against a configured 'Maximum Retries' value. If the current attribute value is below the configured maximum, the FlowFile is passed to a retry relationship. The FlowFile may or may not be penalized in that condition. If the FlowFile 's attribute value exceeds the configured maximum, the FlowFile will be passed to a' retries_exceeded 'relationship. WARNING: If the incoming FlowFile has a non-numeric value in the configured'Retry Attribute 'attribute, it will be reset to'1 '. You may choose to fail the FlowFile instead of performing the reset. Additional dynamic properties can be defined for any attributes you wish to add to the FlowFiles transferred to' retries_exceeded'. These attributes support attribute expression language.
## Tags
FlowFile, Retry
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Fail on Non-numerical Overwrite |
If the FlowFile already has the attribute defined in 'Retry Attribute' that is *not* a number, fail the FlowFile instead of resetting that value to '1' |
| maximum-retries |
The maximum number of times a FlowFile can be retried before being passed to the 'retries_exceeded' relationship |
| penalize-retries |
If set to 'true', this Processor will penalize input FlowFiles before passing them to the 'retry' relationship. This does not apply to the 'retries_exceeded' relationship. |
| retry-attribute |
The name of the attribute that contains the current retry count for the FlowFile. WARNING: If the name matches an attribute already on the FlowFile that does not contain a numerical value, the processor will either overwrite that attribute with '1' or fail based on configuration. |
| reuse-mode |
Defines how the Processor behaves if the retry FlowFile has a different retry UUID than the instance that received the FlowFile. This generally means that the attribute was not reset after being successfully retried by a previous instance of this processor. |
## Relationships
| Name |
Description |
| failure |
The processor is configured such that a non-numerical value on 'Retry Attribute' results in a failure instead of resetting that value to '1'. This will immediately terminate the limited feedback loop. Might also include when 'Maximum Retries' contains attribute expression language that does not resolve to an Integer. |
| retries_exceeded |
Input FlowFile has exceeded the configured maximum retry count, do not pass this relationship back to the input Processor to terminate the limited feedback loop. |
| retry |
Input FlowFile has not exceeded the configured maximum retry count, pass this relationship back to the input Processor to create a limited feedback loop. |
## Writes attributes
| Name |
Description |
| Retry Attribute |
User defined retry attribute is updated with the current retry count |
| Retry Attribute .uuid |
User defined retry attribute with .uuid that determines what processor retried the FlowFile last |
---
title: RouteOnAttribute 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/routeonattribute.md
section: Loading & Unloading Data
---
# RouteOnAttribute 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Routes FlowFiles based on their Attributes using the Attribute Expression Language
## Tags
Attribute Expression Language, Expression Language, Regular Expression, attributes, detect, filter, find, regex, regexp, routing, search, string, text
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Routing Strategy |
Specifies how to determine which relationship to use when evaluating the Expression Language |
## Relationships
| Name |
Description |
| unmatched |
FlowFiles that do not match any user-define expression will be routed here |
## Writes attributes
| Name |
Description |
| RouteOnAttribute.Route |
The relation to which the FlowFile was routed |
## Use cases
| Route data to one or more relationships based on its attributes using the NiFi Expression Language. |
| --------------------------------------------------------------------------------------------------- |
| Keep data only if its attributes meet some criteria, such as its filename ends with .txt. |
| Discard or drop a file based on attributes, such as filename. |
## Use Cases Involving Other Components
| Route record-oriented data based on whether or not the record's values meet some criteria |
| ----------------------------------------------------------------------------------------- |
---
title: RouteOnContent 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/routeoncontent.md
section: Loading & Unloading Data
---
# RouteOnContent 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Applies Regular Expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose Regular Expression matches. Regular Expressions are added as User-Defined Properties where the name of the property is the name of the relationship and the value is a Regular Expression to match against the FlowFile content. User-Defined properties do support the Attribute Expression Language, but the results are interpreted as literal values, not Regular Expressions
## Tags
content, detect, filter, find, regex, regexp, regular expression, route, search, string, text
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
The Character Set in which the file is encoded |
| Content Buffer Size |
Specifies the maximum amount of data to buffer in order to apply the regular expressions. If the size of the FlowFile exceeds this value, any amount of this value will be ignored |
| Match Requirement |
Specifies whether the entire content of the file must match the regular expression exactly, or if any part of the file (up to Content Buffer Size) can contain the regular expression in order to be considered a match |
## Relationships
| Name |
Description |
| unmatched |
FlowFiles that do not match any of the user-supplied regular expressions will be routed to this relationship |
---
title: RouteText 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/routetext.md
section: Loading & Unloading Data
---
# RouteText 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Routes textual data based on a set of user-defined rules. Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the 'Matching Strategy'. The data is then routed according to these rules, routing each line of the text individually.
## Tags
Expression Language, Regular Expression, attributes, csv, delimited, detect, filter, find, logs, regex, regexp, routing, search, string, text
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Character Set |
The Character Set in which the incoming text is encoded |
| Grouping Regular Expression |
Specifies a Regular Expression to evaluate against each line to determine which Group the line should be placed in. The Regular Expression must have at least one Capturing Group that defines the line's Group. If multiple Capturing Groups exist in the Regular Expression, the values from all Capturing Groups will be concatenated together. Two lines will not be placed into the same FlowFile unless they both have the same value for the Group (or neither line matches the Regular Expression). For example, to group together all lines in a CSV File by the first column, we can set this value to "(.*?),.*". Two lines that have the same Group but different Relationships will never be placed into the same FlowFile. |
| Ignore Case |
If true, capitalization will not be taken into account when comparing values. E.g., matching against 'HELLO' or 'hello' will have the same result. This property is ignored if the 'Matching Strategy' is set to 'Satisfies Expression'. |
| Ignore Leading/Trailing Whitespace |
Indicates whether or not the whitespace at the beginning and end of the lines should be ignored when evaluating the line. |
| Matching Strategy |
Specifies how to evaluate each line of incoming text against the user-defined properties. |
| Routing Strategy |
Specifies how to determine which Relationship(s) to use when evaluating the lines of incoming text against the 'Matching Strategy' and user-defined properties. |
## Relationships
| Name |
Description |
| original |
The original input file will be routed to this destination when the lines have been successfully routed to 1 or more relationships |
| unmatched |
Data that does not satisfy the required user-defined rules will be routed to this Relationship |
## Writes attributes
| Name |
Description |
| RouteText.Route |
The name of the relationship to which the FlowFile was routed. |
| RouteText.Group |
The value captured by all capturing groups in the 'Grouping Regular Expression' property. If this property is not set or contains no capturing groups, this attribute will not be added. |
## Use cases
| Drop blank or empty lines from the FlowFile's content. |
| -------------------------------------------------------------------------------------------------------------------------------- |
| Remove specific lines of text from a file, such as those containing a specific word or having a line length over some threshold. |
---
title: RunDatabricksJob 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/rundatabricksjob.md
section: Loading & Unloading Data
---
# RunDatabricksJob 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
com.snowflake.openflow.runtime | runtime-databricks-processors-nar
## Description
Triggers a pre-defined Databricks job to run with custom parameters. Job parameters can be set using dynamic properties
## Tags
databricks, jobs, openflow
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Databricks Client |
Databricks Client Service. |
| Job ID |
Databricks Job ID |
| Job Name |
Databricks Job Name |
| Wait for Job Completion |
Wait for the Databricks job to complete before transferring the FlowFile to success |
## Relationships
| Name |
Description |
| failure |
Databricks failure relationship |
| success |
Databricks success relationship |
## Writes attributes
| Name |
Description |
| job.run.id |
The run id assigned to the invoked job |
| job.result.state |
The result state for the invoked job |
| error.code |
The error code for the SQL statement if an error occurred. |
| error.message |
The error message for the SQL statement if an error occurred. |
---
title: RunMongoAggregation 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/runmongoaggregation.md
section: Loading & Unloading Data
---
# RunMongoAggregation 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-mongodb-nar
## Description
A processor that runs an aggregation query whenever a flowfile is received.
## Tags
aggregate, aggregation, mongo
## Input Requirement
ALLOWED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Batch Size |
The number of elements returned from the server in one batch. |
| Mongo Collection Name |
The name of the collection to use |
| Mongo Database Name |
The name of the database to use |
| allow-disk-use |
Set this to true to enable writing data to temporary files to prevent exceeding the maximum memory use limit during aggregation pipeline staged when handling large datasets. |
| json-type |
By default, MongoDB's Java driver returns "extended JSON". Some of the features of this variant of JSON may cause problems for other JSON parsers that expect only standard JSON types and conventions. This configuration setting controls whether to use extended JSON or provide a clean view that conforms to standard JSON. |
| mongo-agg-query |
The aggregation query to be executed. |
| mongo-charset |
Specifies the character set of the document data. |
| mongo-client-service |
If configured, this property will use the assigned client service for connection pooling. |
| mongo-date-format |
The date format string to use for formatting Date fields that are returned from Mongo. It is only applied when the JSON output format is set to Standard JSON. |
| mongo-query-attribute |
If set, the query will be written to a specified attribute on the output flowfiles. |
| results-per-flowfile |
How many results to put into a flowfile at once. The whole body will be treated as a JSON array of results. |
## Relationships
| Name |
Description |
| failure |
The input flowfile gets sent to this relationship when the query fails. |
| original |
The input flowfile gets sent to this relationship when the query succeeds. |
| results |
The result set of the aggregation will be sent to this relationship. |
---
title: S3FileResourceService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/s3fileresourceservice.md
section: Loading & Unloading Data
---
# S3FileResourceService
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Provides an Amazon Web Services (AWS) S3 file resource for other components.
## Tags
AWS, Amazon, S3, file, resource
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| AWS Credentials Provider service * |
AWS Credentials Provider service |
|
|
The Controller Service that is used to obtain AWS credentials provider |
| Bucket * |
Bucket |
$\{s3.bucket\} |
|
The S3 Bucket to interact with |
| Object Key * |
Object Key |
$\{filename\} |
|
The S3 Object Key to use. This is analogous to a filename for traditional file systems. |
| Region * |
Region |
us-west-2 |
- AWS GovCloud (US)
- AWS GovCloud (US-East)
- US East (N. Virginia)
- US East (Ohio)
- US West (N. California)
- US West (Oregon)
- EU (Ireland)
- EU (London)
- EU (Paris)
- EU (Frankfurt)
- EU (Zurich)
- EU (Stockholm)
- EU (Milan)
- EU (Spain)
- Asia Pacific (Hong Kong)
- Asia Pacific (Taipei)
- Asia Pacific (Mumbai)
- Asia Pacific (Hyderabad)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Asia Pacific (Jakarta)
- Asia Pacific (Melbourne)
- Asia Pacific (Malaysia)
- Asia Pacific (Thailand)
- Asia Pacific (Tokyo)
- Asia Pacific (Seoul)
- Asia Pacific (Osaka)
- South America (Sao Paulo)
- China (Beijing)
- China (Ningxia)
- Canada (Central)
- Canada West (Calgary)
- Middle East (UAE)
- Middle East (Bahrain)
- Africa (Cape Town)
- US ISO East
- US ISOB East (Ohio)
- US ISO West
- US ISOF East1 (California)
- US ISOF South1 (Alpine)
- Israel (Tel Aviv)
- Mexico (Central)
- EU ISOE West
- Use 's3.region' Attribute
|
The AWS Region to connect to. |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: SalesforceDataCloudOAuthTokenProvider
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/salesforcedatacloudoauthtokenprovider.md
section: Loading & Unloading Data
---
# SalesforceDataCloudOAuthTokenProvider
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Retrieves an OAuth2 access token from Salesforce using the configured OAuth2 Access Token Provider and exchanges the token for a Data Cloud API token. The token is then used to authenticate with Salesforce Data Cloud APIs.
## Tags
preview, salesforce
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| OAuth2 Access Token Provider * |
OAuth2 Access Token Provider |
|
|
JWT Token Provider to use in order to retrieve an access token from Salesforce that will be exchanged for a Data Cloud API token. |
| Refresh Window * |
Refresh Window |
0 s |
|
The service will attempt to refresh tokens expiring within the refresh window, subtracting the configured duration from the token expiration. |
| Salesforce Instance * |
Salesforce Instance |
|
|
The hostname of the Salesforce instance including the domain such as MyDomainName.my.salesforce.com |
| Web Client Service * |
Web Client Service |
|
|
The Web Client Service to use for communicating with Salesforce |
## State management
This component does not store state.
## Restricted
This component is not restricted.
## System Resource Considerations
This component does not specify system resource considerations.
---
title: SampleRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/samplerecord.md
section: Loading & Unloading Data
---
# SampleRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Samples the records of a FlowFile based on a specified sampling strategy (such as Reservoir Sampling). The resulting FlowFile may be of a fixed number of records (in the case of reservoir-based algorithms) or some subset of the total number of records (in the case of probabilistic sampling), or a deterministic number of records (in the case of interval sampling).
## Tags
interval, range, record, reservoir, sample
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| record-reader |
Specifies the Controller Service to use for parsing incoming data and determining the data's schema |
| record-writer |
Specifies the Controller Service to use for writing results to a FlowFile |
| sample-record-interval |
Specifies the number of records to skip before writing a record to the outgoing FlowFile. This property is only used if Sampling Strategy is set to Interval Sampling. A value of zero (0) will cause no records to be included in theoutgoing FlowFile, a value of one (1) will cause all records to be included, and a value of two (2) will cause half the records to be included, and so on. |
| sample-record-probability |
Specifies the probability (as a percent from 0-100) of a record being included in the outgoing FlowFile. This property is only used if Sampling Strategy is set to Probabilistic Sampling. A value of zero (0) will cause no records to be included in theoutgoing FlowFile, and a value of 100 will cause all records to be included in the outgoing FlowFile.. |
| sample-record-random-seed |
Specifies a particular number to use as the seed for the random number generator (used by probabilistic strategies). Setting this property will ensure the same records are selected even when using probabilistic strategies. |
| sample-record-range |
Specifies the range of records to include in the sample, from 1 to the total number of records. An example is '3,6-8,20-' which includes the third record, the sixth, seventh and eighth records, and all records from the twentieth record on. Commas separate intervals that don't overlap, and an interval can be between two numbers (i.e. 6-8) or up to a given number (i.e. -5), or from a number to the number of the last record (i.e. 20-). If this property is unset, all records will be included. |
| sample-record-reservoir |
Specifies the number of records to write to the outgoing FlowFile. This property is only used if Sampling Strategy is set to reservoir-based strategies such as Reservoir Sampling. |
| sample-record-sampling-strategy |
Specifies which method to use for sampling records from the incoming FlowFile |
## Relationships
| Name |
Description |
| failure |
If a FlowFile fails processing for any reason (for example, any record is not valid), the original FlowFile will be routed to this relationship |
| original |
The original FlowFile is routed to this relationship if sampling is successful |
| success |
The FlowFile is routed to this relationship if the sampling completed successfully |
## Writes attributes
| Name |
Description |
| mime.type |
The MIME type indicated by the record writer |
| record.count |
The number of records in the resulting flow file |
---
title: SAP® BDC Connect for Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc.md
section: Loading & Unloading Data
---
# SAP® BDC Connect for Snowflake
- [](/user-guide/data-integration/zero-copy/about-sap-snowflake)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-tasks)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake)
This topic describes the steps to set up an SAP® Business Data Cloud connection for use with an existing Snowflake account.
The Snowflake account must be Standard, Enterprise, or Business Critical edition and must be on
AWS commercial in a supported region as described in
[Supported Cloud Regions](/user-guide/intro-regions).
For more information see, [Provisioning SAP Business Data Cloud Connect](https://help.sap.com/docs/business-data-cloud/administering-sap-business-data-cloud/provision-sap-business-data-cloud-connector-for-supported-external-systems).
As an SAP® administrator, perform the following steps:
1. Obtain your Snowflake account URL and ensure it follows the format
https://orgName-accountName.snowflakecomputing.com.
Which should be all lower-case and replace _ (underscore) with - (dash) for RFC compliance.
2. Provision SAP Business Data Cloud Connect as documented here: [Provisioning SAP Business Data Cloud Connect](https://help.sap.com/docs/business-data-cloud/administering-sap-business-data-cloud/provision-sap-business-data-cloud-connector-for-supported-external-systems).
3. Follow steps 1-5 in the wizard
4. In wizard step 6: Configure Parameters:
- **External System Instance Identifier**: Enter your Snowflake account URL:
https://orgName-accountName.snowflakecomputing.com
- **Region**: Select the same region that you used for enabling SAP Business Data Cloud Core.
5. Complete wizard steps 7 and 8.
6. In step 9: Hover over the **View Tenant Notifications** button.
A pop-up window opens with an **Invitation Link** that can be used to complete the configuration in Snowflake.
7. Copy the Invitation Link
8. Log into your Snowflake account to complete the remainder of the configuration
to create a Zerocopy Connector as described in [](/user-guide/data-integration/zero-copy/sap-sql/setup).
## Next steps
In your [SAP for Me](https://me.sap.com/) environment, choose the Customer Landscape tab and, under the Formations tab, choose Include Systems to add the SAP BDC Connect instance to an existing formation.
Customers can create additional Zerocopy Connectors in the same Snowflake account and enroll them with the same or
different SAP® Business Data Cloud tenant. Each Zerocopy Connector requires a
new **Invitation Link** that can be obtained from [SAP for Me](https://me.sap.com/).
Each **Invitation Link** can be enrolled only once with SAP® Business Data Cloud.
To create a new formation, see [Creating SAP Business Data Cloud Formations](https://help.sap.com/docs/business-data-cloud/administering-sap-business-data-cloud/integrate-sap-business-data-cloud-provisioned-systems?locale=en-US&state=PRODUCTION&version=SHIP).
---
title: SAP® BDC Connect for Snowflake Zerocopy Connector — Security and Privileges
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/security.md
section: Loading & Unloading Data
---
# %sapbdc% Zerocopy Connector — Security and Privileges
- [](/user-guide/data-integration/zero-copy/sap-sql/setup)
- [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products)
This topic describes the privileges required to create and manage a Zerocopy
Connector and the catalog-linked databases created from it.
## Access Control Requirements
A [role](#label-access-control-overview-roles) used to execute this operation must have the following
[privileges](#label-access-control-overview-privileges) at a minimum:
| Privilege |
Object |
Notes |
| `CREATE ZEROCOPY CONNECTOR` |
Schema |
Required to create a Zerocopy Connector. By default, the schema owner has this privilege. |
| `OPERATE` |
Zerocopy Connector |
Required to connect or disconnect (`ALTER ... CONNECT` / `ALTER ... DISCONNECT`) and to publish a data product (`SYSTEM$SAP_PUBLISH_DATA_PRODUCT`). |
| `USAGE` |
Zerocopy Connector |
Required to create a catalog-linked database from the connector (also requires `CREATE DATABASE` on the account) and to add or remove a share from the connector (also requires `OWNERSHIP` on the share). |
| `MODIFY` |
Zerocopy Connector |
Required to set or unset properties (comment, share_back, etc.). |
| `MONITOR` |
Zerocopy Connector |
Any privilege on the connector (e.g. `MONITOR`) is sufficient to describe the connector, show connectors, or list shares. |
| `OWNERSHIP` |
Zerocopy Connector |
Required to rename or drop the connector. |
| `CREATE DATABASE` |
Account |
Required to create a catalog-linked database from a Zerocopy Connector (also requires `USAGE` on the connector). |
For instructions on creating a custom role with a specified set of privileges, see [](#label-security-custom-role).
For general information about roles and privilege grants for performing SQL actions on
[securable objects](#label-access-control-securable-objects), see [Overview of Access Control](/user-guide/security-access-control-overview).
## Connector States
A Zerocopy Connector transitions through the following states. Understanding
the state is important because some operations are only permitted in specific
states.
| State |
Description |
| `NEW` |
Initial state after the connector is created. No connection has been attempted yet. |
| `CONNECTING` |
A connection attempt is in progress. The connector enters this state immediately after `ALTER ... CONNECT` is issued. |
| `CONNECTED` |
The connection is established. Catalog-linked databases can only be created when the connector is in this state. Sharing data between Snowflake and SAP® BDC is only allowed when the connector is in this state. |
| `CONNECT_ERROR` |
The connection attempt failed. The error message is persisted on the connector. You can retry the connection from this state. |
| `DISCONNECTING` |
A disconnection is in progress. The connector enters this state immediately after `ALTER ... DISCONNECT` is issued. |
| `DISCONNECTED` |
The connection has been dropped. You can reconnect from this state. |
| `DISCONNECT_ERROR` |
The disconnection attempt failed. The error message is persisted on the connector. |
| `DELETED` |
The connector has been dropped. This state is permanent — Zerocopy Connectors do not support `UNDROP`. |
### State Transition Rules
- `ALTER ... CONNECT` is permitted when the connector is in `NEW`,
`CONNECT_ERROR`, or `DISCONNECTED` state.
- `ALTER ... DISCONNECT` is permitted when the connector is in `CONNECTED`
or `DISCONNECT_ERROR` state.
- Share-back must be disabled before disconnecting.
- All catalog-linked databases created from the connector must be dropped before disconnecting.
- `DROP ZEROCOPY CONNECTOR` is permitted when the connector is in `NEW`,
`CONNECT_ERROR`, `DISCONNECT_ERROR`, or `DISCONNECTED` state.
- Catalog-linked databases do not support `UNDROP`.
---
title: SAP® Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake.md
section: Loading & Unloading Data
---
# SAP® Snowflake
- [](/user-guide/data-integration/zero-copy/about-sap-snowflake)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-tasks)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc)
This topic describes the steps to configure an instance for %sapsnowflake% for SAP customers without an existing Snowflake account.
The SAP® Snowflake account provisioned is the Business Critical edition.
As an SAP® administrator, perform the following steps:
1. Sign in to [SAP for Me](https://me.sap.com/) with an S-user ID or login name.
2. From the sidebar menu, choose **Portfolio & products**.
3. In the **My Product Packages** tab, select the **SAP Business Data Cloud** product.
4. Select the **Applications** tab and in the **SAP Snowflake** card, click **Start Provisioning**.
The **Provision SAP® Snowflake** wizard dialog displays and guides you through the provisioning process.
5. In the Provision SAP® Snowflake dialog, configure the following parameters and click **Next**:
- **Entitlement System**: Displays the ID of the SAP® Business Data Cloud Entitlement set. Cannot be changed.
- **Name**: Enter an appropriate name for the SAP solution.
- **Path**: Select or create a resource group under which to group the solution
components provisioned for SAP® Business Data Cloud.
Create it in the same location selected for the SAP® Business Data Cloud cockpit system.
- **Business Type**: Preset to Production.
6. In the **Select Application** step, SAP Snowflake is pre-selected.
The **Configure Parameters** step displays.
7. In the **Configure Parameters** step, configure the following parameters and click **Next**:
- **Region**: Choose an available region in the [SAP for Me](https://me.sap.com/) portal.
Snowflake recommends choosing the same region as the SAP® Business Data Cloud core for optimal performance.
- **Admin email**: Provide the email address of the user to be defined as the administrator of your SAP Snowflake system.
This user is responsible for adding additional users and for further configuration.
- **Admin First Name**: The first name of the administrator of your SAP Snowflake system.
- **Admin Last Name**: The last name of the administrator of your SAP Snowflake system.
Provisioning begins and SAP® notifies you that a provisioning request was sent to the specified owner's e-mail address.
8. Click **View in Resources** to view the tenant within the indicated resource group.
The **Resources** tab shows the current solution status, which should be `Processing`.
9. Select the tenant below the new solution and click **Details** to view the details of the tenant.
10. On top of the **details** view of the tenant, choose the **View Details** link.
A pop-up window opens that provides an activation link to the SAP Snowflake account.
If you are the SAP Snowflake system owner, select this link and complete the activation flow
in SAP Snowflake (see [Activating the SAP Snowflake Account](https://help.sap.com/docs/business-data-cloud/introducing-sap-snowflake/introducing-sap-snowflake)).
If not, share the activation link with the SAP Snowflake owner and ask them to complete the activation flow.
11. After the account has been activated in SAP for Me, the status for your SAP
Snowflake solution and tenant changes to `Ready`. In the details view of the SAP
Snowflake tenant, in the Path field, select the URL to open SAP Snowflake and log in.
## Next steps
The SAP® BDC admin may provision as many SAP® Snowflake accounts as they need with unique account names to help distinguish them.
Every SAP® Snowflake account will need to be activated as described in the note below.
After activation, the SAP® Snowflake is ready for you to share Data Products from SAP® BDC to
SAP® Snowflake. As part of the provisioning process, a Zerocopy Connector called
`DEFAULT_SAP_BDC_CONNECTOR` is automatically created under the `CONNECTORS.ZEROCOPY` schema
and enrolled with SAP® Business Data Cloud in the SAP® Snowflake account. You are ready to share
data products from SAP® BDC and consume them in SAP® Snowflake. For more information,
see [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products).
Customers can create additional Zerocopy Connectors in the same SAP® Snowflake account and enroll
them with the same or different SAP® Business Data Cloud tenant. Each Zerocopy Connector requires a
new Invitation Link that can be obtained from [SAP for Me](https://me.sap.com/). Each Invitation
Link can be enrolled only once with SAP® Business Data Cloud.
Customers can view the status of provisioning in the **Details** view.
After provisioning is complete, the customer can click the Snowflake activation link available in the Details view to activate their SAP® Snowflake account, login, change their username and reset their password, setup MFA, and perform other operations.
---
title: ScanAttribute 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scanattribute.md
section: Loading & Unloading Data
---
# ScanAttribute 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms
## Tags
attributes, find, lookup, scan, search, text
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Attribute Pattern |
Regular Expression that specifies the names of attributes whose values will be matched against the terms in the dictionary |
| Dictionary File |
A new-line-delimited text file that includes the terms that should trigger a match. Empty lines are ignored. The contents of the text file are loaded into memory when the processor is scheduled and reloaded when the contents are modified. |
| Dictionary Filter Pattern |
A Regular Expression that will be applied to each line in the dictionary file. If the regular expression does not match the line, the line will not be included in the list of terms to search for. If a Matching Group is specified, only the portion of the term that matches that Matching Group will be used instead of the entire term. If not specified, all terms in the dictionary will be used and each term will consist of the text of the entire line in the file |
| Match Criteria |
If set to All Must Match, then FlowFiles will be routed to 'matched' only if all specified attributes 'values are found in the dictionary. If set to At Least 1 Must Match, FlowFiles will be routed to' matched' if any attribute specified is found in the dictionary |
## Relationships
| Name |
Description |
| matched |
FlowFiles whose attributes are found in the dictionary will be routed to this relationship |
| unmatched |
FlowFiles whose attributes are not found in the dictionary will be routed to this relationship |
---
title: ScanContent 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scancontent.md
section: Loading & Unloading Data
---
# ScanContent 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Scans the content of FlowFiles for terms that are found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term will be added to the FlowFile using the 'matching.term' attribute
## Tags
aho-corasick, byte sequence, content, dictionary, find, scan, search
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Dictionary Encoding |
Indicates how the dictionary is encoded. If 'text', dictionary terms are new-line delimited and UTF-8 encoded; if 'binary', dictionary terms are denoted by a 4-byte integer indicating the term length followed by the term itself |
| Dictionary File |
The filename of the terms dictionary |
## Relationships
| Name |
Description |
| matched |
FlowFiles that match at least one term in the dictionary are routed to this relationship |
| unmatched |
FlowFiles that do not match any term in the dictionary are routed to this relationship |
## Writes attributes
| Name |
Description |
| matching.term |
The term that caused the Processor to route the FlowFile to the 'matched' relationship; if FlowFile is routed to the 'unmatched' relationship, this attribute is not added |
---
title: ScriptedFilterRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scriptedfilterrecord.md
section: Loading & Unloading Data
---
# ScriptedFilterRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-scripting-nar
## Description
This processor provides the ability to filter records out from FlowFiles using the user-provided script. Every record will be evaluated by the script which must return with a boolean value. Records with "true" result will be routed to the "matching" relationship in a batch. Other records will be filtered out.
## Tags
filter, groovy, record, script
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Module Directory |
Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Record Reader |
The Record Reader to use parsing the incoming FlowFile into Records |
| Record Writer |
The Record Writer to use for serializing Records after they have been transformed |
| Script Body |
Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine |
The Language to use for the script |
| Script File |
Path to script file to execute. Only one of Script File or Script Body may be used |
## Restrictions
| Required Permission |
Explanation |
| execute code |
Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## Relationships
| Name |
Description |
| failure |
In case of any issue during processing the incoming FlowFile, the incoming FlowFile will be routed to this relationship. |
| original |
After successful procession, the incoming FlowFile will be transferred to this relationship. This happens regardless the number of filtered or remaining records. |
| success |
Matching records of the original FlowFile will be routed to this relationship. If there are no matching records, no FlowFile will be routed here. |
## Writes attributes
| Name |
Description |
| mime.type |
Sets the mime.type attribute to the MIME Type specified by the Record Writer |
| record.count |
The number of records within the flow file. |
| record.error.message |
This attribute provides on failure the error message encountered by the Reader or Writer. |
## See also
- [org.apache.nifi.processors.script.ScriptedPartitionRecord](/user-guide/data-integration/openflow/processors/scriptedpartitionrecord)
- [org.apache.nifi.processors.script.ScriptedTransformRecord](/user-guide/data-integration/openflow/processors/scriptedtransformrecord)
- [org.apache.nifi.processors.script.ScriptedValidateRecord](/user-guide/data-integration/openflow/processors/scriptedvalidaterecord)
---
title: ScriptedLookupService
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/scriptedlookupservice.md
section: Loading & Unloading Data
---
# ScriptedLookupService
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Allows the user to provide a scripted LookupService instance in order to enrich records from an incoming flow file.
## Tags
groovy, invoke, lookup, record, script
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Module Directory |
Module Directory |
|
|
Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Script Body |
Script Body |
|
|
Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine * |
Script Engine |
Groovy |
- Groovy |
Language Engine for executing scripts |
| Script File |
Script File |
|
|
Path to script file to execute. Only one of Script File or Script Body may be used |
## State management
This component does not store state.
## Restricted
## Restrictions
| Required Permission |
Explanation |
| execute code |
Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## System Resource Considerations
This component does not specify system resource considerations.
---
title: ScriptedPartitionRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scriptedpartitionrecord.md
section: Loading & Unloading Data
---
# ScriptedPartitionRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-scripting-nar
## Description
Receives Record-oriented data (i.e., data that can be read by the configured Record Reader) and evaluates the user provided script against each record in the incoming flow file. Each record is then grouped with other records sharing the same partition and a FlowFile is created for each groups of records. Two records shares the same partition if the evaluation of the script results the same return value for both. Those will be considered as part of the same partition.
## Tags
groovy, group, organize, partition, record, script, segment, split
## Input Requirement
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Module Directory |
Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Record Reader |
The Record Reader to use parsing the incoming FlowFile into Records |
| Record Writer |
The Record Writer to use for serializing Records after they have been transformed |
| Script Body |
Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine |
The Language to use for the script |
| Script File |
Path to script file to execute. Only one of Script File or Script Body may be used |
## Restrictions
| Required Permission |
Explanation |
| execute code |
Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## Relationships
| Name |
Description |
| failure |
If a FlowFile cannot be partitioned from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship |
| original |
Once all records in an incoming FlowFile have been partitioned, the original FlowFile is routed to this relationship. |
| success |
FlowFiles that are successfully partitioned will be routed to this relationship |
## Writes attributes
| Name |
Description |
| partition |
The partition of the outgoing flow file. If the script indicates that the partition has a null value, the attribute will be set to the literal string "<null partition>" (without quotes). Otherwise, the attribute is set to the String representation of whatever value is returned by the script. |
| mime.type |
Sets the mime.type attribute to the MIME Type specified by the Record Writer |
| record.count |
The number of records within the flow file. |
| record.error.message |
This attribute provides on failure the error message encountered by the Reader or Writer. |
| fragment.index |
A one-up number that indicates the ordering of the partitioned FlowFiles that were created from a single parent FlowFile |
| fragment.count |
The number of partitioned FlowFiles generated from the parent FlowFile |
## See also
- [org.apache.nifi.processors.script.ScriptedFilterRecord](/user-guide/data-integration/openflow/processors/scriptedfilterrecord)
- [org.apache.nifi.processors.script.ScriptedTransformRecord](/user-guide/data-integration/openflow/processors/scriptedtransformrecord)
- [org.apache.nifi.processors.script.ScriptedValidateRecord](/user-guide/data-integration/openflow/processors/scriptedvalidaterecord)
---
title: ScriptedReader
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/scriptedreader.md
section: Loading & Unloading Data
---
# ScriptedReader
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Allows the user to provide a scripted RecordReaderFactory instance in order to read/parse/generate records from an incoming flow file.
## Tags
groovy, invoke, record, recordFactory, script
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Module Directory |
Module Directory |
|
|
Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Script Body |
Script Body |
|
|
Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine * |
Script Engine |
Groovy |
- Groovy |
Language Engine for executing scripts |
| Script File |
Script File |
|
|
Path to script file to execute. Only one of Script File or Script Body may be used |
## State management
This component does not store state.
## Restricted
## Restrictions
| Required Permission |
Explanation |
| execute code |
Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## System Resource Considerations
This component does not specify system resource considerations.
---
title: ScriptedRecordSetWriter
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/scriptedrecordsetwriter.md
section: Loading & Unloading Data
---
# ScriptedRecordSetWriter
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Allows the user to provide a scripted RecordSetWriterFactory instance in order to write records to an outgoing flow file.
## Tags
groovy, invoke, record, script, writer
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Module Directory |
Module Directory |
|
|
Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Script Body |
Script Body |
|
|
Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine * |
Script Engine |
Groovy |
- Groovy |
Language Engine for executing scripts |
| Script File |
Script File |
|
|
Path to script file to execute. Only one of Script File or Script Body may be used |
## State management
This component does not store state.
## Restricted
## Restrictions
| Required Permission |
Explanation |
| execute code |
Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## System Resource Considerations
This component does not specify system resource considerations.
---
title: ScriptedRecordSink
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/controllers/scriptedrecordsink.md
section: Loading & Unloading Data
---
# ScriptedRecordSink
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/controllers/index)
## Description
Allows the user to provide a scripted RecordSinkService instance in order to transmit records to the desired target. The script must set a variable 'recordSink' to an implementation of RecordSinkService.
## Tags
groovy, invoke, record, record sink, script
## Properties
In the list below required Properties are shown with an asterisk (*).
Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
| Display Name |
API Name |
Default Value |
Allowable Values |
Description |
| Module Directory |
Module Directory |
|
|
Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Script Body |
Script Body |
|
|
Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine * |
Script Engine |
Groovy |
- Groovy |
Language Engine for executing scripts |
| Script File |
Script File |
|
|
Path to script file to execute. Only one of Script File or Script Body may be used |
## State management
This component does not store state.
## Restricted
## Restrictions
| Required Permission |
Explanation |
| execute code |
Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## System Resource Considerations
This component does not specify system resource considerations.
---
title: ScriptedTransformRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scriptedtransformrecord.md
section: Loading & Unloading Data
---
# ScriptedTransformRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-scripting-nar
## Description
Provides the ability to evaluate a simple script against each record in an incoming FlowFile. The script may transform the record in some way, filter the record, or fork additional records. See Processor's Additional Details for more information.
## Tags
filter, groovy, modify, record, script, transform, update
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Module Directory |
Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Record Reader |
The Record Reader to use parsing the incoming FlowFile into Records |
| Record Writer |
The Record Writer to use for serializing Records after they have been transformed |
| Script Body |
Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine |
The Language to use for the script |
| Script File |
Path to script file to execute. Only one of Script File or Script Body may be used |
## Restrictions
| Required Permission |
Explanation |
| execute code |
Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## Relationships
| Name |
Description |
| failure |
Any FlowFile that cannot be transformed will be routed to this Relationship |
| success |
Each FlowFile that were successfully transformed will be routed to this Relationship |
## Writes attributes
| Name |
Description |
| mime.type |
Sets the mime.type attribute to the MIME Type specified by the Record Writer |
| record.count |
The number of records in the FlowFile |
| record.error.message |
This attribute provides on failure the error message encountered by the Reader or Writer. |
## See also
- [org.apache.nifi.processors.jolt.JoltTransformRecord](/user-guide/data-integration/openflow/processors/jolttransformrecord)
- [org.apache.nifi.processors.script.ExecuteScript](/user-guide/data-integration/openflow/processors/executescript)
- [org.apache.nifi.processors.standard.LookupRecord](/user-guide/data-integration/openflow/processors/lookuprecord)
- [org.apache.nifi.processors.standard.QueryRecord](/user-guide/data-integration/openflow/processors/queryrecord)
- [org.apache.nifi.processors.standard.UpdateRecord](/user-guide/data-integration/openflow/processors/updaterecord)
---
title: ScriptedValidateRecord 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/scriptedvalidaterecord.md
section: Loading & Unloading Data
---
# ScriptedValidateRecord 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-scripting-nar
## Description
This processor provides the ability to validate records in FlowFiles using the user-provided script. The script is expected to have a record as incoming argument and return with a boolean value. Based on this result, the processor categorizes the records as "valid" or "invalid" and routes them to the respective relationship in batch. Additionally the original FlowFile will be routed to the "original" relationship or in case of unsuccessful processing, to the "failed" relationship.
## Tags
groovy, record, script, validate
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Module Directory |
Comma-separated list of paths to files and/or directories which contain modules required by the script. |
| Record Reader |
The Record Reader to use parsing the incoming FlowFile into Records |
| Record Writer |
The Record Writer to use for serializing Records after they have been transformed |
| Script Body |
Body of script to execute. Only one of Script File or Script Body may be used |
| Script Engine |
The Language to use for the script |
| Script File |
Path to script file to execute. Only one of Script File or Script Body may be used |
## Restrictions
| Required Permission |
Explanation |
| execute code |
Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
## Relationships
| Name |
Description |
| failure |
In case of any issue during processing the incoming flow file, the incoming FlowFile will be routed to this relationship. |
| invalid |
FlowFile containing the invalid records from the incoming FlowFile will be routed to this relationship. If there are no invalid records, no FlowFile will be routed to this Relationship. |
| original |
After successful procession, the incoming FlowFile will be transferred to this relationship. This happens regardless the FlowFiles might routed to "valid" and "invalid" relationships. |
| valid |
FlowFile containing the valid records from the incoming FlowFile will be routed to this relationship. If there are no valid records, no FlowFile will be routed to this Relationship. |
## Writes attributes
| Name |
Description |
| mime.type |
Sets the mime.type attribute to the MIME Type specified by the Record Writer |
| record.count |
The number of records within the flow file. |
| record.error.message |
This attribute provides on failure the error message encountered by the Reader or Writer. |
## See also
- [org.apache.nifi.processors.script.ScriptedFilterRecord](/user-guide/data-integration/openflow/processors/scriptedfilterrecord)
- [org.apache.nifi.processors.script.ScriptedPartitionRecord](/user-guide/data-integration/openflow/processors/scriptedpartitionrecord)
- [org.apache.nifi.processors.script.ScriptedTransformRecord](/user-guide/data-integration/openflow/processors/scriptedtransformrecord)
---
title: SearchElasticsearch 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/searchelasticsearch.md
section: Loading & Unloading Data
---
# SearchElasticsearch 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-elasticsearch-restapi-nar
## Description
A processor that allows the user to repeatedly run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. Search After/Point in Time queries must include a valid "sort" field. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the query will restart with the first page of results being retrieved.
## Tags
elasticsearch, elasticsearch7, elasticsearch8, elasticsearch9, json, page, query, scroll, search
## Input Requirement
FORBIDDEN
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Aggregation Results Format |
Format of Aggregation output. |
| Aggregation Results Split |
Output a flowfile containing all aggregations or one flowfile for each individual aggregation. |
| Aggregations |
One or more query aggregations (or "aggs"), in JSON syntax. Ex: \{"items": \{"terms": \{"field": "product", "size": 10\}\}\} |
| Client Service |
An Elasticsearch client service to use for running queries. |
| Fields |
Fields of indexed documents to be retrieved, in JSON syntax. Ex: ["user.id", "http.response.*", \{"field": "@timestamp", "format": "epoch_millis"\}] |
| Index |
The name of the index to use. |
| Max JSON Field String Length |
The maximum allowed length of a string value when parsing a JSON document or attribute. |
| Output No Hits |
Output a "hits" flowfile even if no hits found for query. If true, an empty "hits" flowfile will be output even if "aggregations" are output. |
| Pagination Keep Alive |
Pagination "keep_alive" period. Period Elasticsearch will keep the scroll/pit cursor alive in between requests (this is not the time expected for all pages to be returned, but the maximum allowed time for requests between page retrievals). |
| Pagination Type |
Pagination method to use. Not all types are available for all Elasticsearch versions, check the Elasticsearch docs to confirm which are applicable and recommended for your service. |
| Query |
A query in JSON syntax, not Lucene syntax. Ex: \{"query":\{"match":\{"somefield":"somevalue"\}\}\}. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch. |
| Query Attribute |
If set, the executed query will be set on each result flowfile in the specified attribute. |
| Query Clause |
A "query" clause in JSON syntax, not Lucene syntax. Ex: \{"match":\{"somefield":"somevalue"\}\}. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch. |
| Query Definition Style |
How the JSON Query will be defined for use by the processor. |
| Restart On Finish |
Whether the processor should start another search with the same query once a paginated search has completed. |
| Script Fields |
Fields to created using script evaluation at query runtime, in JSON syntax. Ex: \{"test1": \{"script": \{"lang": "painless", "source": "doc[ 'price'].value * 2"\}\}, "test2": \{"script": \{"lang": "painless", "source": "doc[ 'price'].value * params.factor", "params": \{"factor": 2.0\}\}\}\} |
| Search Results Format |
Format of Hits output. |
| Search Results Split |
Output a flowfile containing all hits or one flowfile for each individual hit or one flowfile containing all hits from all paged responses. |
| Size |
The maximum number of documents to retrieve in the query. If the query is paginated, this "size" applies to each page of the query, not the "size" of the entire result set. |
| Sort |
Sort results by one or more fields, in JSON syntax. Ex: [\{"price" : \{"order" : "asc", "mode" : "avg"\}\}, \{"post_date" : \{"format": "strict_date_optional_time_nanos"\}\}] |
| Type |
The type of this document (used by Elasticsearch for indexing and searching). |
## State management
| Scopes |
Description |
| LOCAL |
The pagination state (scrollId, searchAfter, pitId, hitCount, pageCount, pageExpirationTimestamp) is retained in between invocations of this processor until the Scroll/PiT has expired (when the current time is later than the last query execution plus the Pagination Keep Alive interval). |
## Relationships
| Name |
Description |
| aggregations |
Aggregations are routed to this relationship. |
| failure |
All flowfiles that fail for reasons unrelated to server availability go to this relationship. |
| hits |
Search hits are routed to this relationship. |
| retry |
All flowfiles that fail due to server/cluster availability go to this relationship. |
## Writes attributes
| Name |
Description |
| mime.type |
application/json |
| aggregation.name |
The name of the aggregation whose results are in the output flowfile |
| aggregation.number |
The number of the aggregation whose results are in the output flowfile |
| page.number |
The number of the page (request), starting from 1, in which the results were returned that are in the output flowfile |
| hit.count |
The number of hits that are in the output flowfile |
| elasticsearch.query.error |
The error message provided by Elasticsearch if there is an error querying the index. |
## See also
- [org.apache.nifi.processors.elasticsearch.ConsumeElasticsearch](/user-guide/data-integration/openflow/processors/consumeelasticsearch)
- [org.apache.nifi.processors.elasticsearch.PaginatedJsonQueryElasticsearch](/user-guide/data-integration/openflow/processors/paginatedjsonqueryelasticsearch)
---
title: SegmentContent 2025.10.9.21
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/processors/segmentcontent.md
section: Loading & Unloading Data
---
# SegmentContent 2025.10.9.21
This feature is not available in the People's Republic of China.
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
**Related Topics**
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/processors/index)
## Bundle
org.apache.nifi | nifi-standard-nar
## Description
Segments a FlowFile into multiple smaller segments on byte boundaries. Each segment is given the following attributes: fragment.identifier, fragment.index, fragment.count, segment.original.filename; these attributes can then be used by the MergeContent processor in order to reconstitute the original FlowFile
## Tags
segment, split
## Input Requirement
REQUIRED
## Supports Sensitive Dynamic Properties
false
## Properties
| Property |
Description |
| Segment Size |
The maximum data size in bytes for each segment |
## Relationships
| Name |
Description |
| original |
The original FlowFile will be sent to this relationship |
| segments |
All segments will be sent to this relationship. If the file was small enough that it was not segmented, a copy of the original is sent to this relationship as well as original |
## Writes attributes
| Name |
Description |
| fragment.identifier |
All segments produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute |
| fragment.index |
A one-up number that indicates the ordering of the segments that were created from a single parent FlowFile |
| fragment.count |
The number of segments generated from the parent FlowFile |
| segment.original.filename |
The filename of the parent FlowFile |
| segment.original.filename |
The filename will be updated to include the parent's filename, the segment index, and the segment count |
## See also
- [org.apache.nifi.processors.standard.MergeContent](/user-guide/data-integration/openflow/processors/mergecontent)
---
title: Set up and access Openflow
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-roles-login.md
section: Loading & Unloading Data
---
# Set up and access Openflow
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/processors/index)
- [](/user-guide/data-integration/openflow/controllers/index)
To use Openflow, you must configure roles and permissions in your Snowflake account, and set up a database. This topic describes how to set up the necessary roles and permissions.
## Set up the Openflow admin roles
The **Openflow Admin role** is used by a deployment engineer to set up Openflow workflows. A Snowflake administrator adds this role by performing the following steps:
1. Sign in to %sf-web-interface-link%.
2. Open a SQL worksheet.
3. Create a role for the Openflow admin, allowing it the required permissions to manage integrations and compute pools required for deployments. In the SQL below, OPENFLOW_ADMIN is the default name for the Openflow admin, but you can choose any name.
```sql
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS OPENFLOW_ADMIN;
GRANT CREATE ROLE ON ACCOUNT TO ROLE OPENFLOW_ADMIN;
GRANT CREATE OPENFLOW DATA PLANE INTEGRATION ON ACCOUNT
TO ROLE OPENFLOW_ADMIN;
GRANT CREATE OPENFLOW RUNTIME INTEGRATION ON ACCOUNT
TO ROLE OPENFLOW_ADMIN;
```
4. Grant the admin role and secondary roles to a user.
To prevent issues with login, when you create an Openflow user, Snowflake recommends that you also assign and set default secondary roles to that user. This is helpful because Openflow doesn't allow users with the following roles to log in: ACCOUNTADMIN, ORGADMIN, GLOBALORGADMIN, or SECURITYADMIN. While logged in, Openflow actions can be authorized by any of the authenticated user's roles, not just the default role.
Substitute <OPENFLOW_USER> with the appropriate username:
```sql
USE ROLE ACCOUNTADMIN;
GRANT ROLE OPENFLOW_ADMIN TO USER ;
ALTER USER SET DEFAULT_ROLE = OPENFLOW_ADMIN;
ALTER USER SET DEFAULT_SECONDARY_ROLES = ('ALL');
```
## Accept the Openflow terms of service
This step is only required once for your organization.
1. Sign in to Snowflake as a user with the ORGADMIN role.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Review the agreement and select **Accept**.
## Start Openflow
Log in to Openflow by performing the following steps:
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Select **Launch Openflow**.
### Troubleshooting login issues
- If you can log into Snowflake but can't log into Openflow, try the following:
- Try changing your role to something other than ACCOUNTADMIN, ORGADMIN, GLOBALORGADMIN, or SECURITYADMIN.
- Try adding default secondary roles to the account:
```sql
USE ROLE ACCOUNTADMIN;
ALTER USER SET DEFAULT_SECONDARY_ROLES = ('ALL');
```
---
title: Set up Openflow - BYOC
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-byoc.md
section: Loading & Unloading Data
---
# Set up Openflow - BYOC
This feature is not available in the People's Republic of China.
Openflow BYOC deployments are available to all accounts in AWS [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-byoc)
- [](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress)
- [](/user-guide/data-integration/openflow/setup-openflow-byoc-encrypted-volumes)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic describes the steps to set up Openflow.
Setting up Openflow involves the following steps:
- [Create a deployment in your cloud](#create-a-deployment-in-your-cloud)
- [Create a Runtime environment in your cloud](#create-a-runtime-environment-in-your-cloud)
## Prerequisites
The prerequisites to be completed on your Snowflake and AWS accounts are as follows:
### Snowflake account
You'll need to first define privileges at the Snowflake account level.
1. Run the following SQL commands to grant the required privileges to the Openflow admin role:
```sql
USE ROLE ACCOUNTADMIN;
GRANT CREATE OPENFLOW DATA PLANE INTEGRATION ON ACCOUNT TO ROLE $openflow_admin_role;
GRANT CREATE OPENFLOW RUNTIME INTEGRATION ON ACCOUNT TO ROLE $openflow_admin_role;
```
The new privileges are assigned to the ACCOUNTADMIN role as part of the default set of privileges, and that role can grant the privileges to a role of their choosing for the Openflow admin role, denoted as $openflow_admin_role in the code.
2. Next, set `default_secondary_roles` to `ALL` for all Openflow users:
1. Sign in to Snowflake with a role that your ACCOUNTADMIN assigned for using Openflow.
This may not be any of the following roles: ACCOUNTADMIN, ORGADMIN, GLOBALORGADMIN, or SECURITYADMIN.
If you see a blank screen or the error "message: Invalid consent request" when logging into Openflow, change your role to a role that is not one of these listed roles.
For more information, see [Prerequisites](#prerequisites).
2. Run the following code, replacing $openflow_user for each Openflow user:
```sql
USE ROLE ACCOUNTADMIN;
ALTER USER $openflow_user SET DEFAULT_SECONDARY_ROLES = ('ALL');
```
This setting is required because Openflow actions are authorized by using any of the authenticated user's roles, and not just the default role.
#### Deployment integration privileges
The deployment integration object represents a set of resources provisioned to deploy one or more Snowflake Openflow runtimes. For organizations bringing
their own cloud resources, the deployment integration object represents a managed Kubernetes cluster along with its associated nodes.
Users with the CREATE DATA PLANE INTEGRATION privilege on the Snowflake account can create and delete the deployment integration objects.
Additional privileges can be defined on deployment integration objects directly to support differentiation of access.
You can grant the following privileges on a deployment integration object:
- OWNERSHIP: Enables full control over deployment actions objects, including deletion of the deployment.
- USAGE: Enables creation of runtime child objects.
#### Runtime privileges
The runtime object represents a cluster of one or more Snowflake Openflow runtime servers, provisioned to run flow definitions. For Kubernetes deployments, the runtime object represents a stateful set of Snowflake Openflow runtime containers deployed in a namespace, along with supporting components.
Users with the OWNERSHIP privilege on the parent deployment integration object and the CREATE RUNTIME INTEGRATION account-level privilege can create runtime integration objects. Additional privileges can be defined on runtime integration objects directly to support differentiation of access.
You can grant the following privileges on a runtime integration object:
- OWNERSHIP: Enables full control over runtime actions, including deletion of the associated runtime and modification of runtime flow definitions.
- USAGE: Enables read access to the deployed runtime for observing health and status, without making any changes.
#### Snowflake role
A Snowflake role is a Snowflake role that is associated with a specific Openflow runtime and used for the following tasks:
- Grant access to Snowflake resources.
- Grant access to connector-specific resources
Snowflake roles are linked to Openflow Snowflake Managed Token, avoiding the need for customers to create separate service users and key pairs for authentication to Snowflake.
<RUNTIMENAME> denotes the name of the associated runtime.
To create a Snowflake role:
1. Create the required Snowflake role.
```sql
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS OPENFLOW_RUNTIME_ROLE_
```
2. Grant the Snowflake role access to a warehouse.
Snowflake recommends using a dedicated warehouse for data ingestion.
This warehouse should be used when configuring your connectors for runtimes where you will be using this Snowflake role.
```sql
GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE OPENFLOW_RUNTIME_ROLE_;
```
3. Allow the Snowflake role to use, create or otherwise access Snowflake objects.
Depending on the Openflow connector being created the required underlying objects will vary.
The example below is for illustration purposes only.
```sql
GRANT USAGE ON DATABASE TO ROLE OPENFLOW_RUNTIME_ROLE_;
GRANT USAGE ON SCHEMA TO ROLE OPENFLOW_RUNTIME_ROLE_;
```
4. Allow the user to use the Snowflake role
```sql
GRANT ROLE OPENFLOW_RUNTIME_ROLE_ TO USER ;
```
#### Example for role setup
Consider a scenario where the following roles should be set up:
- **accountadmin:** Out-of-the box role from Snowflake, which has these two CREATE privileges:
- CREATE OPENFLOW DATA PLANE INTEGRATION
- CREATE OPENFLOW RUNTIME INTEGRATION
- **deployment_manager:** Can create, manage, and delete deployments.
- **deployment1_runtime_manager_1:** Can create a runtime only within deployment 1. It can modify and delete a runtime that it created within deployment 1, but not a runtime created by deployment1_runtime_manager_2.
- **deployment1_runtime_manager_2:** Can create a runtime only within deployment 1. It can modify and delete a runtime that it created within deployment 1, but not a runtime created by deployment1_runtime_manager_1.
- **deployment1_runtime_viewer_1:** Can view a runtime canvas within deployment 1 that was created by deployment1_runtime_manager_1.
- **deployment1_runtime_viewer_2:** Can view a runtime canvas within deployment 1 that was created by deployment1_runtime_manager_2.
- **deployment2_runtime_manager:** Can create a runtime only within deployment 2.
- **deployment2_runtime_viewer:** Can view a runtime canvas within deployment 2.
To set up Openflow with these roles, follow these steps:
1. Create new roles and assign the relevant privileges:
```sql
use role ACCOUNTADMIN;
create role if not exists deployment_manager;
create role if not exists deployment1_runtime_manager_1;
create role if not exists deployment1_runtime_manager_2;
create role if not exists deployment1_runtime_viewer_1;
create role if not exists deployment1_runtime_viewer_2;
create role if not exists deployment2_runtime_manager;
create role if not exists deployment2_runtime_viewer;
-- Assign create deployment privilege to roles. (This privilege cannot be granted in Openflow UI.)
grant create openflow data plane integration on account to role deployment_manager;
-- Assign create runtime privilege to roles. (This privilege cannot be granted in the Control Pane UI.)
grant create openflow runtime integration on account to role deployment1_runtime_manager_1;
grant create openflow runtime integration on account to role deployment1_runtime_manager_2;
grant create openflow runtime integration on account to role deployment2_runtime_manager;
-- Grant roles to users. (Repeat this step for each user.)
grant role to user ;
```
2. To create a deployment, follow these steps:
1. Sign in to Snowsight as deployment_manager.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. To create deployment 1, select **Create a deployment**, and grant the USAGE privilege to deployment1_runtime_manager_1 and deployment1_runtime_manager_2.
4. To create deployment 2, select **Create a deployment**, and grant the USAGE privilege to deployment2_runtime_manager.
3. To create a runtime in deployment 1, follow these steps:
1. Log in as deployment1_runtime_manager_1.
2. Create a runtime as described in the following sections. deployment1_runtime_manager_1 should be able to create runtimes and manage any runtimes it created within this deployment.
3. In the Openflow UI, select deployment1_runtime_viewer_1 and grant it the USAGE privilege.
### AWS account
Ensure the following on your AWS account:
- You have an AWS account with permissions required to create a CloudFormation stack.
- An AWS administrator in your organization can execute CloudFormation script to set up Amazon Elastic Kubernetes Service (EKS) inside a new VPC (created by
CloudFormation) or an existing VPC. See [Prerequisites for BYO-VPC (existing VPC)](#prerequisites-for-byo-vpc-existing-vpc).
To learn about how the Openflow installation happens in your AWS account and the permissions that are configured by the CloudFormation template, see [Installation process](#installation-process).
#### Prerequisites for BYO-VPC (existing VPC)
If you want to use an existing VPC and your own subnets, ensure that you have the following:
- For Snowflake managed ingress, two public subnets with:
- Different availability zones
- At least /27 CIDR ranges with 32 available IPs.
- Routes for destination 0.0.0.0/0 and target internet gateway or some other egress routing to the internet.
- A tag that allows Openflow to create a load balancer:
- Key: `kubernetes.io/role/elb`
- Value: `1`
- If your public subnets are used by other EKS clusters, a tag that allows Openflow to create a load balancer alongside other load balancers:
- Key: `kubernetes.io/cluster/{deployment-key}`
- Value: `1`
Managing your own ingress eliminates the need for public subnets, but requires additional configuration in your AWS account.
For more information, see [](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress).
- Two private subnets with:
- Different availability zones
- At least /24 CIDR ranges with 255 available IPs. This limits the number and
scale of runtimes you can create, so it may be more appropriate to use a larger range for the deployment.
- Connectivity to Snowflake and AWS services from Private Subnet 1 where the Openflow deployment runs.
- Among many options, you can connect using route tables with a NAT Gateway, a Transit Gateway, or PrivateLink VPC Endpoints.
- Without this connectivity, the Openflow deployment will not initialize or set up properly and no infrastructure will be provisioned.
- For Snowflake managed ingress, egress connectivity to [LetsEncrypt.org](https://letsencrypt.org), which will provision a TLS certificate.
## Accept the Openflow terms of service
This step is only required once for your organization.
1. Sign in to Snowflake as a user with the ORGADMIN role.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Accept Openflow terms of services.
## Create a deployment in your cloud
### Configure the deployment in your Snowflake account
Sign in to Snowflake with a role that your ACCOUNTADMIN assigned for using Openflow.
This may not be any of the following roles: ACCOUNTADMIN, ORGADMIN, GLOBALORGADMIN, or SECURITYADMIN.
If you see a blank screen, or the error: "message: Invalid consent request", when logging into Openflow, change your role to a role that is not one of these listed roles.
For more information, see [Prerequisites](#prerequisites).
1. In the navigation menu, select **Ingestion** %raa% **Openflow**.
2. Select **Launch Openflow**.
3. In the Openflow UI, select **Create a deployment**.
4. On the **Deployments** tab, select **Create a deployment**.
The **Creating a deployment** wizard opens.
5. In the **Prerequisites** step, ensure that you meet all the requirements, and then select **Next**.
6. In the **Deployment location** step, select **Amazon Web Services** as the deployment location, enter a name for your deployment, and then select **Next**.
7. In the **Configuration** step, select one of the following options:
- **Managed VPC**: Choose this option if you want your VPC to be managed by Snowflake.
- **Bring your own VPC**: Choose this option if you want to use an existing VPC.
1. In the **PrivateLink** step, you can select if you want to establish communication with Snowflake over the private link.
Enabling this option requires additional setup in your AWS account. For more information, see [](/user-guide/admin-security-privatelink).
- If the **PrivateLink** option is enabled, the **End user authentication over PrivateLink** step displays.
- If enabled, browser-based authentication redirects use PrivateLink endpoints.
- If disabled, end-user authentication uses public Snowflake URLs.
Regardless of this setting, Deployment communications to Snowflake will use PrivateLink.
If you access %sf-web-interface% through a PrivateLink URL, ensure it is enabled.
If you access %sf-web-interface% through a non-PrivateLink URL, leave it disabled.
2. In the **Custom Ingress** step, you can choose to manage your own ingress configuration for the Openflow deployment, such as specifying custom security groups, load balancer settings, or other network controls.
Enabling this option requires additional setup in your AWS account. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress).
3. Select **Create Deployment**.
4. Once your deployment is configured, a dialog box appears that lets you download the CloudFormation template to complete the setup process in your AWS account. Download this template. Note that Openflow doesn't support modifying the CloudFormation template. Don't modify any values after downloading the template, other than choosing drop-down options.
5. (Optional) To encrypt EBS volumes for your Openflow BYOC deployment, see [](/user-guide/data-integration/openflow/setup-openflow-byoc-encrypted-volumes).
### Apply the CloudFormation template in your AWS account
1. In your AWS account, create a new CloudFormation Stack using the template. After the Openflow deployment agent's Amazon Elastic Compute Cloud (EC2) instance is created, it completes the rest of the [Installation process](#installation-process) using infrastructure as code scripts.
You can track the installation progress as described in [Track the installation progress](#track-the-installation-progress).
If you're using an existing VPC, upon uploading the CloudFormation template, select the respective values in the drop-down lists for the two private subnets and your VPC.
### Create a network rule for Openflow in your Snowflake account
This step is required only if you're using network policies to control access to Snowflake. A network policy is a set of rules that control which IP addresses can access your Snowflake account.
1. Navigate to your Snowflake account.
2. Identify the NAT gateway public IP address that was created as part of the CloudFormation stack. You can find this either by searching for NAT Gateway on AWS console or checking the output of the CloudFormation stack.
The NAT gateway is responsible for Openflow egress for both the Data Plane Agent (DPA) and EKS. Both DPA and EKS run in the Private Subnet 1 of the installation.
3. Create a network rule for Openflow and add it to your existing network policy. Replace \{$NAT_GATEWAY_PUBLIC_IP\} in the following code snippet with the NAT gateway public IP address that was created as part of the CloudFormation stack.
```sql
USE ROLE ACCOUNTADMIN;
USE DATABASE {REPLACE_WITH_YOUR_DB_NAME};
CREATE NETWORK RULE allow_openflow_deployment
MODE = INGRESS
TYPE = IPV4
VALUE_LIST = ('{$NAT_GATEWAY_PUBLIC_IP}/32');
```
4. Find your currently active network policy.
```sql
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN ACCOUNT;
```
5. Copy the value column from the output, and use it to create a network rule:
```sql
ALTER NETWORK POLICY {ENTER_YOUR_ACTIVE_NETWORK_POLICY_NAME} ADD ALLOWED_NETWORK_RULE_LIST = (allow_openflow_deployment);
```
### Set up an event table to log Openflow events (required)
Use one of the following options to set up an event table:
- Create a new Openflow-specific event table (recommended):
```sql
USE ROLE ACCOUNTADMIN;
CREATE DATABASE IF NOT EXISTS openflow;
USE openflow;
CREATE SCHEMA IF NOT EXISTS openflow;
USE SCHEMA openflow;
GRANT CREATE EVENT TABLE
ON SCHEMA openflow.openflow
TO ROLE $role_of_deployment_owner;
USE ROLE $role_of_deployment_owner;
CREATE EVENT TABLE IF NOT EXISTS openflow.openflow.openflow_events;
-- Find the Data Plane Integrations
SHOW OPENFLOW DATA PLANE INTEGRATIONS;
ALTER OPENFLOW DATA PLANE INTEGRATION
$openflow_dataplane_name
SET EVENT_TABLE = 'openflow.openflow.openflow_events';
```
- Create an account-specific event table:
```sql
USE DATABASE openflow;
CREATE SCHEMA IF NOT EXISTS openflow.telemetry;
CREATE EVENT TABLE IF NOT EXISTS openflow.telemetry.events;
ALTER ACCOUNT SET EVENT_TABLE = openflow.telemetry.events;
```
- Use an existing account-specific event table:
```sql
USE ROLE ACCOUNTADMIN;
ALTER ACCOUNT SET EVENT_TABLE = 'existing_database.existing_schema.existing_event_table';
```
### Verify the deployment
1. In the navigation menu, select **Ingestion** %raa% **Openflow**. Creating a deployment takes about 45 minutes on AWS. Once it's created, you can view your deployment in the Deployments tab of Openflow UI with its state marked as **Active**.
## Create a runtime environment in your cloud
1. In **Openflow Control Plane**, select **Create a runtime**. The **Create Runtime** dialog box appears.
2. From the **Deployment** drop-down list, choose the deployment in which you want to create a runtime.
3. Enter a name for your runtime.
4. Choose a node type from the **Node type** drop-down list. This specifies the size of your nodes.
5. In the **Min/Max node** range selector, select a range. The minimum value specifies the
number of nodes that the runtime starts with when idle and the maximum value specifies the
number of nodes that the runtime can scale up to, in the event of high data volume or CPU load.
6. Select **Create**. The runtime takes a couple of minutes to get created.
Once created, you can view your runtime by navigating to the **Runtimes** tab of the Openflow control plane. Click the runtime to open the Openflow canvas.
## Next step
Deploy a connector in a runtime. For a list of connectors available in Openflow, see [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors).
## Networking considerations: Openflow EKS to source systems
For BYOC deployments, take note of the following considerations:
- Openflow CloudFormation stack creates one VPC with two public subnets and two private subnets.
- Public subnets host the AWS Network Load Balancer, which is created later. Private subnets host the EKS Cluster and all of the EC2 instances backing the node groups. Openflow runtimes run within Private subnet 1.
- NAT Gateway is currently the egress for both DPA and EKS. Both DPA and EKS run in the Private subnet 1 of the installation.
For BYO-VPC deployments, take note of the following considerations:
- Openflow requires you to enter the two private subnets that will run Openflow and two public subnets for the AWS Load Balancer.
- You have to provide your own egress routing to the Internet from those private subnets, which can be the central NAT Gateway.
- No Internet Gateway is created by Openflow. You have to provide appropriate public internet egress routing.
The network connectivity generally is as follows:
**An Openflow EC2 Instance** (Agent or EKS) runs in a **private subnet** that requires **Route Table entries** to send egress traffic to a **Transit Gateway**, a **PrivateLink VPC Endpoint**, or a **NAT Gateway** connected to an **Internet Gateway**.
### Example: BYOC deployment with a new VPC to communicate with RDS in a different VPC of the same account
To enable communication between the Openflow EKS cluster and the RDS instance, you need to create a new
security group, with the EKS cluster security group as the source for the inbound rule for RDS connectivity, and attach the group in RDS.
1. Find the EKS cluster security group, navigate to EKS and find your deployment key.
You can also find it on the Openflow UI by performing the following steps:
1. Sign in to Openflow.
2. Go to the **Deployments** tab.
3. Select the More options icon next to your deployment.
4. Select **View details**. The value in the field **Key** is your deployment key.
2. After finding the deployment key, you can use it to filter your AWS resources by the key value.
3. Create a new security group that allows access from the Openflow EKS cluster using the relevant database
port. For PostgreSQL the default port is 5432.
4. Attach it in RDS as a new security group.
If you need to troubleshoot, the [Reachability Analyzer](https://docs.aws.amazon.com/vpc/latest/reachability/getting-started.html) can be useful.
It will give you detailed information about what may be blocking connectivity by using tracing capabilities within the AWS platform.
See the following AWS docs for accessing DB instances using VPC peering and the associated security group configuration:
- [Scenarios for accessing a DB instance in a VPC - Amazon Relational Database Service](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario3)
- [Update your security groups to reference peer security groups - Amazon Virtual Private Cloud](https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-security-groups.html)
## Configuring PrivateLink in AWS
This section explains how to access and configure Openflow using private connectivity.
### Access Openflow over PrivateLink
Before starting with the private link configuration, enable PrivateLink for your account as described in [](/user-guide/admin-security-privatelink).
1. Using the `ACCOUNTADMIN` role, call the `SYSTEM$GET_PRIVATELINK_CONFIG` function in your Snowflake account and identify the value for `openflow-privatelink-url`. This is the URL for accessing Openflow over PrivateLink.
2. Create a `CNAME` record in your DNS to resolve the URL value to your VPC endpoint.
3. Confirm that your DNS settings can resolve the value.
4. Confirm that you can connect to Openflow UI using this URL from your browser.
### Configure a new deployment using PrivateLink
Snowflake recommends that you use the **Bring your own VPC** version of Openflow deployment and create a VPC endpoint in your VPC before applying the CloudFormation template.
Before starting with the PrivateLink configuration, make sure that PrivateLink is enabled for your account as described in [](/user-guide/admin-security-privatelink).
Perform the following steps:
1. Retrieve Snowflake's VPC endpoint service ID and Openflow PrivateLink URLs:
1. Run the following SQL command using the `ACCOUNTADMIN` role:
```sql
SELECT SYSTEM$GET_PRIVATELINK_CONFIG()
```
2. From the output, identify and save the values for the following keys:
- `privatelink-vpce-id`
- `openflow-privatelink-url`
- `external-telemetry-privatelink-url`
2. Create a VPC endpoint with parameters:
- Type: **PrivateLink Ready partner services**
- Service: `privatelink-vpce-id` value obtained in the previous step.
- VPC: The VPC where your Openflow deployment will be running.
- Subnets: Select two availability zones and private subnets where your Openflow deployment will be running.
3. Set up Route 53 private hosted zone with the following parameters:
1. Domain: `privatelink.snowflakecomputing.com`
2. Type: **Private hosted zone**
3. Select the region and VPC where your Openflow deployment will be running.
4. Add two `CNAME` records for the URLs identified in the first step:
1. For `openflow-privatelink-url`
- Record name: `openflow-privatelink-url` value obtained in the first step
- Record type: `CNAME`
- Value: DNS name of your VPC endpoint
2. For `external-telemetry-privatelink-url`
- Record name: `external-telemetry-privatelink-url` value obtained in the first step
- Record type: `CNAME`
- Value: DNS name of your VPC endpoint
5. Create a dedicated security group for the deployment and enable traffic from the security group to the VPC endpoint:
1. Open the security group associated with your VPC endpoint.
2. Add an inbound rule to the security group that allows **All traffic** from the security group created for your deployment.
6. Create a new deployment and apply the CloudFormation Stack following the instructions in the [Create a deployment in your cloud](#create-a-deployment-in-your-cloud) section and ensure that:
- The **PrivateLink** option is enabled. The **End user authentication over PrivateLink** option can be either enabled or disabled.
- The security group created for the deployment is used when creating the CloudFormation stack.
7. Wait until the EKS cluster for your deployment is created. To confirm successful creation, navigate to AWS Console under **Elastic Kubernetes Service**. Verify that a cluster identified as `` displays status **ACTIVE**.
8. Allow for traffic from your EKS to the VPC endpoint:
1. Open the security group associated with your VPC endpoint.
2. Add an inbound rule to the security group that allows **All traffic** from the security group assigned to your EKS cluster. The EKS cluster's security group starts with `eks-cluster-sg--`.
### Configuring VPC Gateway Endpoints for S3 in AWS
Configuring an AWS VPC Gateway Endpoint for S3 is the primary method to allow an Agent EC2 instance in a private subnet to access the Amazon Linux 2023 repository privately,
without requiring an Internet Gateway, a NAT Gateway, or a public IP address on the instance. The Agent EC2 instance uses this repository to install its dependencies, for instance Docker.
To configure a VPC Gateway Endpoint for S3:
1. Open a browser to the AWS VPC dashboard.
2. In the navigation pane, select **Endpoints**.
3. Click **Create endpoint** and create a new VPC endpoint with parameters:
- Type: **AWS services**
- Service: `com.amazonaws..s3` of type `Gateway`
- VPC: Select the VPC of your deployment
- Route tables: Select the route table(s) that are associated with your private subnet(s)
- Policy: Choose **Full access**
## Configuring private deployments
Private deployments are a feature that allows you to deploy Openflow in a VPC without the need for public internet ingress or egress.
To configure private deployments, you need to choose the following options when creating a new deployment:
1. In the **Deployment location** step, select **Amazon Web Services** as the deployment location.
2. In the **VPC Configuration** step, select **Bring your own VPC** to use an existing VPC.
3. In the **PrivateLink** step, enable the PrivateLink feature. Enabling this option requires additional setup in your AWS account, see [Configuring PrivateLink in AWS](#configuring-privatelink-in-aws). The **End user authentication over PrivateLink** option can be either enabled or disabled.
4. In the **Custom ingress** step, enable the custom ingress feature. Enabling this option requires additional setup in your AWS account. For more information, see [](/user-guide/data-integration/openflow/setup-openflow-byoc-custom-ingress).
Private deployments require that your existing VPC is able to access the following domains:
- `*.amazonaws.com`, a detailed list of services being accessed includes:
- `com.amazonaws.iam`
- `com.amazonaws..s3`
- `com.amazonaws..ec2`
- `com.amazonaws..ecr.api`
- `com.amazonaws..ecr.dkr`
- `com.amazonaws..secretsmanager`
- `com.amazonaws..sts`
- `com.amazonaws..eks`
- `com.amazonaws..autoscaling`
- `*.privatelink.snowflakecomputing.com`
- `oidc-eks..api.aws`
- `shield.us-east-1.amazonaws.com`
## Installation process
Between the CloudFormation stack and the Openflow Agent, there are
several coordinated steps that the BYOC deployment installation process
manages. The goal is to separate responsibilities between a cold-start
that gives organizations an easy way to provide inputs to their BYOC
deployment (solved via CloudFormation), and the configuration of the
deployment and its core software components that will need to change
over time (solved by the Openflow Agent).
The deployment Agent facilitates the creation of the Openflow deployment infrastructure and
installation of the deployment software components including the deployment service. The deployment agent authenticates
with Snowflake System Image Registry to obtain Openflow container images.
The steps are as follows:
When using BYO-VPC, you will choose a VPC ID and two private subnet IDs from the template, and
the CloudFormation stack will use the selected ones rather than creating the resources mentioned in steps 1a, 1b, and 1c.
1. The CloudFormation template creates the following and configures with the AWS permissions mentioned in [Configured AWS permissions](#configured-aws-permissions):
1. One VPC with two public subnets and two private subnets. Public
subnets host the AWS Network Load Balancer (created later).
Private Subnets host the EKS cluster and all of the EC2 instances
backing the NodeGroups. Openflow runtimes run within a private
subnet.
2. Internet Gateway for egress from the VPC
3. NAT Gateway for egress from the private subnets
4. AWS Secrets Manager entry for the OIDC configuration input by the user
5. IAM role and instance profile for the Openflow Agent to use from its EC2 instance
6. An EC2 instance for Openflow deployment agent, complete with a UserData
script to automatically run the initialization process. This
script sets environment variables for the Openflow deployment agent to use,
derived from the input CloudFormation parameters.
7. EC2 Instance Connect endpoint for the Openflow deployment agent to upgrade
the deployment when needed.
- When using BYO-VPC, by default the CloudFormation stack will create an EC2 Instance Connect endpoint. However, this default behavior can be modified. When using the managed VPC option, the CloudFormation stack will always create an EC2 Instance Connect endpoint.
- The Instance Connect endpoint can be shared across many VPCs.
- If a deployment is deleted, along with deleting the CloudFormation stack, it will also remove the endpoint. This would block access to other BYO-VPC agents if the endpoint is shared.
- To add an EC2 Instance Connect endpoint, perform the following steps in your AWS account:
1. In the left navigation, navigate to **VPC** %raa% **Endpoints**.
2. Select **Create Endpoint**.
3. Choose the endpoint type as EC2 Instance Connect Endpoint.
4. Select a VPC. Leave all the security groups clear (not selected) to use the default VPC security group.
5. When selecting a subnet, use the same value as Private Subnet 1 in the CloudFormation parameters.
6. Select **Create**. It takes approximately 5 minutes for the endpoint to be created.
8. S3 Bucket that stores the Terraform state, logs, and outputs for
the Openflow Agent
2. The Openflow deployment agent creates the following:
1. An EKS cluster containing:
- Node groups
- Autoscaling groups
- AWS VPC Container Network Interface (CNI) add-on
- Amazon Elastic Block Store (EBS) CSI add-on
1. Secrets manager records for PostgreSQL, OAuth credentials, and so on.
2. IAM policies and roles for various K8s service accounts to
retrieve their secrets from AWS Secrets Manager.
3. K8s components
- Namespaces
- Cluster autoscaler
- EBS CSI expandable storage
- AWS Load Balancer Controller, which creates the publicly accessible Network Load Balancer
- Let's Encrypt certificate issuer
- Nginx Ingress, configured for Let's Encrypt
- Metrics Server
- Certificate manager from [Jetstack](http://jetstack.io/)
- [External secrets operator](http://external-secrets.io/)
- Service accounts for Temporal, deployment service, and OIDC
- Secrets stores for Temporal, deployment service, and OIDC
- External secrets for Temporal and deployment service. The external secret for OIDC is created and managed by the runtime operator.
- PostgreSQL
- Temporal
- Self-signed certificate issuer and ingress configuration for communications between runtime nodes
- Openflow runtime operator
- Openflow deployment service
By default, all AWS accounts have a quota of five Elastic IP addresses
per region, because public (IPv4) internet addresses are a scarce public
resource. Snowflake strongly recommends that you use Elastic IP
addresses primarily for their ability to remap the address to another
instance in the case of instance failure, and to use DNS hostnames for
all other inter-node communication.
### Track the installation progress
After the CloudFormation stack moves into the CREATE_COMPLETE state, the Openflow agent automatically creates the rest of the infrastructure.
There are a few steps that can take 10-15 minutes each, such as:
1. Creating the EKS cluster
2. Installing the EBS CSI add-on to the EKS cluster
3. Creating the RDS PostgreSQL database
Status reporting for the Openflow agent is not available yet. In the meantime, you
can view logs on the Openflow agent to verify whether the BYOC deployment is ready for runtimes. To do this, perform the following steps:
1. In the EC2 instances list, locate the following two instances:
- openflow-agent-\{data-plane-key\}: This is the Openflow agent that you will use to manage runtimes
- \{data-plane-key\}-mgmt-group: This is a node in the BYOC deployment's EKS cluster that runs an operator and other core software
2. Right-click on the openflow-agent-\{data-plane-key\} instance and select **Connect**.
3. Switch from **EC2 Instance Connect** to **Connect using EC2 Instance Connect Endpoint**. Leave the default EC2 Instance Connect Endpoint
in place.
4. Click **Connect**. A new browser tab or window will appear with a
command-line interface.
5. Run the following command to tail the installation logs of the docker image that is configuring your deployment:
```bash
journalctl -xe -f -n 100 -u docker
```
6. Once the installation is complete, you'll see the following output:
```text
{timestamp} - app stack applied successfully
{timestamp} - All resources applied successfully
```
### Configured AWS permissions
This section lists the AWS permissions configured by Openflow BYOC stack based on the roles.
\{key\} represents the deployment key that uniquely identifies cloud resources created and managed by Openflow for a particular deployment.
**Administrative user**
`cloudformation` and all of the following permissions.
**IAM Role: openflow-agent-role-\{key\}**
This role is assumed by the Openflow deployment agent EC2 instance through the instance profile `OpenflowAgentEC2InstanceProfile-{key}`. The following Openflow-managed policies are attached to the role.
Openflow-managed policy: `openflow-agent-ec2-policy-{key}`
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateSecurityGroup",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RevokeSecurityGroupEgress",
"ec2:DeleteSecurityGroup"
],
"Condition": {
"StringLike": {
"aws:ResourceTag/Name": [
"{key}-*",
"k8s-traffic-{key}-*",
"eks-cluster-sg-{key}-*"
]
}
},
"Resource": "arn:aws:ec2:{Region}:{Account_ID}:security-group/*"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateLaunchTemplateVersion",
"ec2:ModifyLaunchTemplate"
],
"Condition": {
"StringLike": {
"aws:ResourceTag/Name": "{key}-*-group"
}
},
"Resource": [
"arn:aws:ec2:{Region}:{Account_ID}:launch-template/*"
]
}
]
}
```
Openflow-managed policy: `openflow-agent-eks-policy-{key}`
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"eks:ListTagsForResource",
"eks:TagResource",
"eks:UntagResource",
"eks:UpdateClusterVersion",
"eks:UpdateNodegroupVersion"
],
"Resource": [
"arn:aws:eks:{Region}:{Account_ID}:cluster/{key}",
"arn:aws:eks:{Region}:{Account_ID}:nodegroup/{key}/*",
"arn:aws:eks:{Region}:{Account_ID}:addon/{key}/*"
]
},
{
"Effect": "Allow",
"Action": [
"eks:DescribeAddonVersions"
],
"Resource": "*"
}
]
}
```
Openflow-managed policy: `openflow-agent-iam-policy-{key}`
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"iam:TagRole",
"iam:UntagRole"
],
"Resource": [
"arn:aws:iam::{Account_ID}:role/{key}-*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:ListOpenIDConnectProviderTags",
"iam:TagOpenIDConnectProvider",
"iam:UntagOpenIDConnectProvider"
],
"Resource": "arn:aws:iam::{Account_ID}:oidc-provider/oidc.eks.{Region}.amazonaws.com/id/*"
},
{
"Effect": "Allow",
"Action": [
"iam:CreatePolicy",
"iam:DeletePolicy",
"iam:DeletePolicyVersion",
"iam:GetPolicy",
"iam:GetPolicyVersion",
"iam:ListPolicyVersions",
"iam:CreatePolicyVersion",
"iam:TagPolicy",
"iam:UntagPolicy"
],
"Resource": [
"arn:aws:iam::{Account_ID}:policy/*-role-policy-{key}"
]
},
{
"Effect": "Allow",
"Action": [
"iam:AttachRolePolicy",
"iam:CreateRole",
"iam:UpdateRole",
"iam:DeleteRole",
"iam:DeleteRolePolicy",
"iam:DetachRolePolicy",
"iam:GetRolePolicy",
"iam:ListAttachedRolePolicies",
"iam:ListInstanceProfilesForRole",
"iam:ListRolePolicies",
"iam:PutRolePolicy",
"iam:TagRole",
"iam:UntagRole",
"iam:UpdateAssumeRolePolicy"
],
"Resource": [
"arn:aws:iam::{Account_ID}:role/*-role-{key}",
"arn:aws:iam::{Account_ID}:role/{key}-*"
]
}
]
}
```
Openflow-managed policy: `openflow-agent-misc-policy-{key}`
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:TagResource",
"secretsmanager:UntagResource"
],
"Resource": "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:*-{key}*"
},
{
"Effect": "Allow",
"Action": [
"ssm:GetParameter"
],
"Resource": [
"arn:aws:ssm:{Region}::parameter/aws/service/eks/optimized-ami/*"
]
},
{
"Effect": "Allow",
"Action": [
"elasticloadbalancing:DeleteTargetGroup"
],
"Condition": {
"StringEquals": {
"aws:ResourceTag/elbv2.k8s.aws/cluster": "{key}"
}
},
"Resource": "arn:aws:elasticloadbalancing:{Region}:{Account_ID}:targetgroup/*/*"
},
{
"Effect": "Allow",
"Action": [
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeTags",
"elasticloadbalancing:DescribeTargetGroups"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:SetSecurityGroups"
],
"Resource": "arn:aws:elasticloadbalancing:{Region}:{Account_ID}:loadbalancer/net/runtime-ingress-{key}*"
}
]
}
```
The following inline policies are also attached to the role.
Inline policy: `managed-policy-creation-permission`
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"iam:CreatePolicy",
"iam:DeletePolicy",
"iam:DeletePolicyVersion",
"iam:GetPolicy",
"iam:GetPolicyVersion",
"iam:ListPolicyVersions",
"iam:CreatePolicyVersion",
"iam:TagPolicy",
"iam:UntagPolicy"
],
"Resource": [
"arn:aws:iam::{Account_ID}:policy/openflow-agent-ec2-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/openflow-agent-iam-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/openflow-agent-eks-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/openflow-agent-misc-policy-{key}"
]
}
]
}
```
Inline policy: `OpenflowAgentPolicy`
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"autoscaling:DescribeTags",
"ec2:DescribeImages",
"ec2:DescribeInstances",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVpcs",
"ec2:DescribeVpcAttribute",
"iam:GetRole",
"iam:GetOpenIDConnectProvider",
"ec2:RunInstances",
"ec2:CreateLaunchTemplate",
"ec2:CreateSecurityGroup",
"ec2:CreateTags",
"ec2:DeleteTags"
],
"Resource": "*",
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"aws:ResourceTag/Name": [
"{key}-oidc-provider"
]
}
},
"Action": [
"iam:CreateOpenIDConnectProvider",
"iam:DeleteOpenIDConnectProvider",
"iam:TagOpenIDConnectProvider",
"iam:UpdateOpenIDConnectProviderThumbprint"
],
"Resource": "arn:aws:iam::{Account_ID}:oidc-provider/oidc.eks.{Region}.amazonaws.com/id/*",
"Effect": "Allow"
},
{
"Action": [
"iam:CreatePolicy",
"iam:DeletePolicy",
"iam:DeletePolicyVersion",
"iam:GetPolicy",
"iam:GetPolicyVersion",
"iam:ListPolicyVersions",
"iam:CreatePolicyVersion",
"iam:TagPolicy",
"iam:UntagPolicy"
],
"Resource": [
"arn:aws:iam::{Account_ID}:policy/dp-service-role-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/oauth2-role-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/temporal-service-role-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/oidc-service-role-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/dps-temporal-role-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/dps-postgres-role-policy-{key}",
"arn:aws:iam::{Account_ID}:policy/token-refresh-role-policy-{key}"
],
"Effect": "Allow"
},
{
"Action": [
"iam:AttachRolePolicy",
"iam:CreateRole",
"iam:UpdateRole",
"iam:DeleteRole",
"iam:DeleteRolePolicy",
"iam:DetachRolePolicy",
"iam:GetRolePolicy",
"iam:ListAttachedRolePolicies",
"iam:ListInstanceProfilesForRole",
"iam:ListRolePolicies",
"iam:PutRolePolicy",
"iam:TagRole",
"iam:UntagRole",
"iam:UpdateAssumeRolePolicy"
],
"Resource": [
"arn:aws:iam::{Account_ID}:role/openflow-agent-role-{key}",
"arn:aws:iam::{Account_ID}:role/{key}-*",
"arn:aws:iam::{Account_ID}:role/dps-temporal-role-{key}",
"arn:aws:iam::{Account_ID}:role/dps-postgres-role-{key}",
"arn:aws:iam::{Account_ID}:role/dp-service-role-{key}",
"arn:aws:iam::{Account_ID}:role/oauth2-role-{key}",
"arn:aws:iam::{Account_ID}:role/oidc-service-role-{key}",
"arn:aws:iam::{Account_ID}:role/token-refresh-role-{key}"
],
"Effect": "Allow"
},
{
"Action": [
"autoscaling:CreateOrUpdateTags",
"autoscaling:DeleteTags"
],
"Resource": "arn:aws:autoscaling:{Region}:{Account_ID}:autoScalingGroup:*:autoScalingGroupName/eks-{key}-*",
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"aws:ResourceTag/Name": [
"{key}-EC2SecurityGroup-*",
"k8s-traffic-{key}-*",
"eks-cluster-sg-{key}-*",
"{key}-cluster-sg",
"{key}-custom-ingress-default-sg"
]
}
},
"Action": [
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RevokeSecurityGroupEgress",
"ec2:DeleteSecurityGroup",
"ec2:CreateTags",
"ec2:DeleteTags",
"ec2:CreateNetworkInterface",
"ec2:DeleteNetworkInterface"
],
"Resource": "arn:aws:ec2:{Region}:{Account_ID}:security-group/*",
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"aws:ResourceTag/elbv2.k8s.aws/cluster": "{key}"
}
},
"Action": [
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:RevokeSecurityGroupEgress",
"ec2:DeleteSecurityGroup",
"ec2:CreateTags",
"ec2:DeleteTags",
"ec2:CreateNetworkInterface",
"ec2:DeleteNetworkInterface"
],
"Resource": "arn:aws:ec2:{Region}:{Account_ID}:security-group/*",
"Effect": "Allow"
},
{
"Action": [
"ec2:CreateSecurityGroup"
],
"Resource": "arn:aws:ec2:{Region}:{Account_ID}:vpc/{VPC_ID}",
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"ec2:ResourceTag/Name": "openflow-agent-{key}"
}
},
"Action": [
"ec2:AttachNetworkInterface"
],
"Resource": "arn:aws:ec2:{Region}:{Account_ID}:instance/*",
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"aws:ResourceTag/Name": "{key}-*-group"
}
},
"Action": [
"ec2:DeleteLaunchTemplate"
],
"Resource": "arn:aws:ec2:{Region}:{Account_ID}:launch-template/*",
"Effect": "Allow"
},
{
"Action": [
"eks:CreateCluster",
"eks:CreateAccessEntry",
"eks:CreateAddon",
"eks:CreateNodegroup",
"eks:DeleteCluster",
"eks:DescribeCluster",
"eks:ListClusters",
"eks:ListNodeGroups",
"eks:DescribeUpdate",
"eks:UpdateClusterConfig",
"eks:TagResource"
],
"Resource": "arn:aws:eks:{Region}:{Account_ID}:cluster/{key}",
"Effect": "Allow"
},
{
"Action": [
"eks:DescribeAddon",
"eks:DescribeAddonVersions",
"eks:UpdateAddon",
"eks:DeleteAddon",
"eks:DescribeUpdate"
],
"Resource": "arn:aws:eks:{Region}:{Account_ID}:addon/{key}/*",
"Effect": "Allow"
},
{
"Action": [
"eks:DeleteNodegroup",
"eks:DescribeNodegroup",
"eks:ListNodegroups",
"eks:UpdateNodegroupConfig",
"eks:TagResource",
"eks:DescribeUpdate"
],
"Resource": "arn:aws:eks:{Region}:{Account_ID}:nodegroup/{key}/*",
"Effect": "Allow"
},
{
"Action": [
"s3:CreateBucket",
"s3:ListBucket"
],
"Resource": "arn:aws:s3:::byoc-tf-state-{key}-{Region}",
"Effect": "Allow"
},
{
"Action": [
"s3:DeleteObject",
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::byoc-tf-state-{key}-{Region}/*",
"Effect": "Allow"
},
{
"Action": [
"secretsmanager:CreateSecret",
"secretsmanager:DeleteSecret",
"secretsmanager:DescribeSecret",
"secretsmanager:GetResourcePolicy",
"secretsmanager:GetSecretValue",
"secretsmanager:PutSecretValue",
"secretsmanager:UpdateSecretVersionStage",
"secretsmanager:TagResource",
"secretsmanager:UntagResource"
],
"Resource": "arn:aws:secretsmanager:{Region}:{Account_ID}:secret:*-{key}*",
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"iam:AWSServiceName": "eks.amazonaws.com"
}
},
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": "arn:aws:iam::*:role/aws-service-role/eks.amazonaws.com/AWSServiceRoleForAmazonEKS",
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"iam:AWSServiceName": "eks-nodegroup.amazonaws.com"
}
},
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": "arn:aws:iam::*:role/aws-service-role/eks-nodegroup.amazonaws.com/AWSServiceRoleForAmazonEKSNodegroup",
"Effect": "Allow"
},
{
"Action": [
"eks:AssociateAccessPolicy",
"eks:ListAssociatedAccessPolicies",
"eks:DisassociateAccessPolicy"
],
"Resource": "arn:aws:eks:{Region}:{Account_ID}:access-entry/{key}/*",
"Effect": "Allow"
},
{
"Action": "iam:PassRole",
"Resource": "*",
"Effect": "Allow"
},
{
"Action": [
"iam:TagRole",
"iam:UntagRole"
],
"Resource": "arn:aws:iam::{Account_ID}:role/{key}-*",
"Effect": "Allow"
},
{
"Action": [
"iam:UntagOpenIDConnectProvider"
],
"Resource": "arn:aws:iam::{Account_ID}:oidc-provider/oidc.eks.{Region}.amazonaws.com/id/*",
"Effect": "Allow"
},
{
"Action": [
"eks:TagResource",
"eks:UntagResource",
"eks:UpdateNodegroupVersion"
],
"Resource": [
"arn:aws:eks:{Region}:{Account_ID}:cluster/{key}",
"arn:aws:eks:{Region}:{Account_ID}:nodegroup/{key}/*",
"arn:aws:eks:{Region}:{Account_ID}:addon/{key}/*"
],
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"aws:ResourceTag/Name": "{key}-*-group"
}
},
"Action": [
"ec2:CreateLaunchTemplateVersion",
"ec2:ModifyLaunchTemplate"
],
"Resource": "arn:aws:ec2:{Region}:{Account_ID}:launch-template/*",
"Effect": "Allow"
},
{
"Action": [
"ssm:GetParameter"
],
"Resource": "arn:aws:ssm:{Region}::parameter/aws/service/eks/optimized-ami/*",
"Effect": "Allow"
}
]
}
```
**IAM Role: \{key\}-cluster-ServiceRole**
AWS-managed policies:
- AmazonEKSClusterPolicy
- AmazonEKSVPCResourceController
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"cloudwatch:PutMetricData"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ec2:DescribeAccountAttributes",
"ec2:DescribeAddresses",
"ec2:DescribeInternetGateways"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
```
**IAM Role: \{key\}-addon-vpc-cni-Role**
AWS-managed policies:
- AmazonEKS_CNI_Policy
**IAM Role: \{key\}-eks-role**
AWS-managed policies:
- AmazonEBSCSIDriverPolicy
- AmazonEC2ContainerRegistryReadOnly
- AmazonEKS_CNI_Policy
- AmazonEKSWorkerNodePolicy
- AmazonSSMManagedInstanceCore
- AutoScalingFullAccess
- ElasticLoadBalancingFullAccess
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ec2:CreateSecurityGroup",
"ec2:CreateTags"
],
"Effect": "Allow",
"Resource": [
"arn:aws:ec2:{Region}:{Account_ID}:security-group/*",
"arn:aws:ec2:{Region}:{Account_ID}:vpc/{VPC_ID}"
],
"Sid": "CreateOpenflowEKSSecurityGroupAndTags"
},
{
"Action": [
"ec2:AuthorizeSecurityGroupIngress",
"ec2:DeleteSecurityGroup"
],
"Condition": {
"StringLike": {
"aws:ResourceTag/Name": "eks-cluster-sg-{key}-*"
}
},
"Effect": "Allow",
"Resource": [
"arn:aws:ec2:{Region}:{Account_ID}:security-group/*"
],
"Sid": "OpenflowManageEKSSecurityGroup"
}
]
}
```
\{VPC_ID\} represents the identifier of the VPC that was either created by BYOC or used by BYO-VPC.
The following roles are used by Kubernetes service accounts to read their secrets from AWS Secrets Manager. Each role has a single Openflow-managed policy attached whose name matches the role name with a `-policy` suffix (for example, the `oidc-service-role-{key}` role uses the `oidc-service-role-policy-{key}` policy).
**IAM Role: oidc-service-role-\{key\}**
Openflow-managed policy: `oidc-service-role-policy-{key}`
```json
{
"Statement": [
{
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:GetResourcePolicy",
"secretsmanager:ListSecretVersionIds"
],
"Effect": "Allow",
"Resource": [
"arn:aws:secretsmanager:{Region}:{Account_ID}:secret:oidc-{key}*"
]
}
],
"Version": "2012-10-17"
}
```
**IAM Role: dps-postgres-role-\{key\}**
Openflow-managed policy: `dps-postgres-role-policy-{key}`
```json
{
"Statement": [
{
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:GetResourcePolicy",
"secretsmanager:ListSecretVersionIds"
],
"Effect": "Allow",
"Resource": [
"arn:aws:secretsmanager:{Region}:{Account_ID}:secret:postgres_creds-{key}*"
]
}
],
"Version": "2012-10-17"
}
```
**IAM Role: dps-temporal-role-\{key\}**
Openflow-managed policy: `dps-temporal-role-policy-{key}`
```json
{
"Statement": [
{
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:GetResourcePolicy",
"secretsmanager:ListSecretVersionIds"
],
"Effect": "Allow",
"Resource": [
"arn:aws:secretsmanager:{Region}:{Account_ID}:secret:temporal_creds-{key}*"
]
}
],
"Version": "2012-10-17"
}
```
**IAM Role: dp-service-role-\{key\}**
Openflow-managed policy: `dp-service-role-policy-{key}`
```json
{
"Statement": [
{
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:GetResourcePolicy",
"secretsmanager:ListSecretVersionIds"
],
"Effect": "Allow",
"Resource": [
"arn:aws:secretsmanager:{Region}:{Account_ID}:secret:dps_creds-{key}*",
"arn:aws:secretsmanager:{Region}:{Account_ID}:secret:snowflake-oauth2-{key}*"
]
}
],
"Version": "2012-10-17"
}
```
**IAM Role: oauth2-role-\{key\}**
Openflow-managed policy: `oauth2-role-policy-{key}`
```json
{
"Statement": [
{
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:GetResourcePolicy",
"secretsmanager:ListSecretVersionIds"
],
"Effect": "Allow",
"Resource": [
"arn:aws:secretsmanager:{Region}:{Account_ID}:secret:snowflake-oauth2-{key}*"
]
}
],
"Version": "2012-10-17"
}
```
**IAM Role: token-refresh-role-\{key\}**
Openflow-managed policy: `token-refresh-role-policy-{key}`
```json
{
"Statement": [
{
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:GetResourcePolicy",
"secretsmanager:ListSecretVersionIds"
],
"Effect": "Allow",
"Resource": [
"arn:aws:secretsmanager:{Region}:{Account_ID}:secret:snowflake-oauth2-{key}*"
]
}
],
"Version": "2012-10-17"
}
```
**IAM Role: \{key\}-nodegroup-NodeInstanceRole**
AWS-managed policies:
- AmazonEBSCSIDriverPolicy
- AmazonEC2ContainerRegistryReadOnly
- AmazonEKS_CNI_Policy
- AmazonEKSWorkerNodePolicy
- AmazonSSMManagedInstanceCore
- AutoScalingFullAccess
- ElasticLoadBalancingFullAccess
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"servicediscovery:CreateService",
"servicediscovery:DeleteService",
"servicediscovery:GetService",
"servicediscovery:GetInstance",
"servicediscovery:RegisterInstance",
"servicediscovery:DeregisterInstance",
"servicediscovery:ListInstances",
"servicediscovery:ListNamespaces",
"servicediscovery:ListServices",
"servicediscovery:GetInstancesHealthStatus",
"servicediscovery:UpdateInstanceCustomHealthStatus",
"servicediscovery:GetOperation",
"route53:GetHealthCheck",
"route53:CreateHealthCheck",
"route53:UpdateHealthCheck",
"route53:ChangeResourceRecordSets",
"route53:DeleteHealthCheck",
"appmesh:*"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeScalingActivities",
"autoscaling:DescribeTags",
"ec2:DescribeInstanceTypes",
"ec2:DescribeLaunchTemplateVersions"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"ec2:DescribeImages",
"ec2:GetInstanceTypesFromInstanceRequirements",
"eks:DescribeNodegroup"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"iam:CreateServiceLinkedRole"
],
"Condition": {
"StringEquals": {
"iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:DescribeAccountAttributes",
"ec2:DescribeAddresses",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInternetGateways",
"ec2:DescribeVpcs",
"ec2:DescribeVpcPeeringConnections",
"ec2:DescribeSubnets",
"ec2:DescribeSecurityGroups",
"ec2:DescribeInstances",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeTags",
"ec2:GetCoipPoolUsage",
"ec2:DescribeCoipPools",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeListenerCertificates",
"elasticloadbalancing:DescribeSSLPolicies",
"elasticloadbalancing:DescribeRules",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetGroupAttributes",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:DescribeTags"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"cognito-idp:DescribeUserPoolClient",
"acm:ListCertificates",
"acm:DescribeCertificate",
"iam:ListServerCertificates",
"iam:GetServerCertificate",
"waf-regional:GetWebACL",
"waf-regional:GetWebACLForResource",
"waf-regional:AssociateWebACL",
"waf-regional:DisassociateWebACL",
"wafv2:GetWebACL",
"wafv2:GetWebACLForResource",
"wafv2:AssociateWebACL",
"wafv2:DisassociateWebACL",
"shield:GetSubscriptionState",
"shield:DescribeProtection",
"shield:CreateProtection",
"shield:DeleteProtection"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:AuthorizeSecurityGroupIngress",
"ec2:RevokeSecurityGroupIngress"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:CreateSecurityGroup"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:CreateTags"
],
"Condition": {
"Null": {
"aws:RequestTag/elbv2.k8s.aws/cluster": "false"
},
"StringEquals": {
"ec2:CreateAction": "CreateSecurityGroup"
}
},
"Effect": "Allow",
"Resource": "arn:aws:ec2:*:*:security-group/*"
},
{
"Action": [
"ec2:CreateTags",
"ec2:DeleteTags"
],
"Condition": {
"Null": {
"aws:RequestTag/elbv2.k8s.aws/cluster": "true",
"aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
}
},
"Effect": "Allow",
"Resource": "arn:aws:ec2:*:*:security-group/*"
},
{
"Action": [
"ec2:AuthorizeSecurityGroupIngress",
"ec2:RevokeSecurityGroupIngress",
"ec2:DeleteSecurityGroup"
],
"Condition": {
"Null": {
"aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:CreateTargetGroup"
],
"Condition": {
"Null": {
"aws:RequestTag/elbv2.k8s.aws/cluster": "false"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"elasticloadbalancing:CreateListener",
"elasticloadbalancing:DeleteListener",
"elasticloadbalancing:CreateRule",
"elasticloadbalancing:DeleteRule"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"elasticloadbalancing:AddTags",
"elasticloadbalancing:RemoveTags"
],
"Condition": {
"Null": {
"aws:RequestTag/elbv2.k8s.aws/cluster": "true",
"aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
}
},
"Effect": "Allow",
"Resource": [
"arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
"arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
"arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
]
},
{
"Action": [
"elasticloadbalancing:AddTags",
"elasticloadbalancing:RemoveTags"
],
"Effect": "Allow",
"Resource": [
"arn:aws:elasticloadbalancing:*:*:listener/net/*/*/*",
"arn:aws:elasticloadbalancing:*:*:listener/app/*/*/*",
"arn:aws:elasticloadbalancing:*:*:listener-rule/net/*/*/*",
"arn:aws:elasticloadbalancing:*:*:listener-rule/app/*/*/*"
]
},
{
"Action": [
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"elasticloadbalancing:SetIpAddressType",
"elasticloadbalancing:SetSecurityGroups",
"elasticloadbalancing:SetSubnets",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:ModifyTargetGroup",
"elasticloadbalancing:ModifyTargetGroupAttributes",
"elasticloadbalancing:DeleteTargetGroup"
],
"Condition": {
"Null": {
"aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"elasticloadbalancing:AddTags"
],
"Condition": {
"Null": {
"aws:RequestTag/elbv2.k8s.aws/cluster": "false"
},
"StringEquals": {
"elasticloadbalancing:CreateAction": [
"CreateTargetGroup",
"CreateLoadBalancer"
]
}
},
"Effect": "Allow",
"Resource": [
"arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
"arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
"arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
]
},
{
"Action": [
"elasticloadbalancing:RegisterTargets",
"elasticloadbalancing:DeregisterTargets"
],
"Effect": "Allow",
"Resource": "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*"
},
{
"Action": [
"elasticloadbalancing:SetWebAcl",
"elasticloadbalancing:ModifyListener",
"elasticloadbalancing:AddListenerCertificates",
"elasticloadbalancing:RemoveListenerCertificates",
"elasticloadbalancing:ModifyRule"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"route53:ChangeResourceRecordSets"
],
"Effect": "Allow",
"Resource": "arn:aws:route53:::hostedzone/*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"route53:GetChange"
],
"Effect": "Allow",
"Resource": "arn:aws:route53:::change/*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"route53:ListResourceRecordSets",
"route53:ListHostedZonesByName"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ec2:CreateSnapshot",
"ec2:AttachVolume",
"ec2:DetachVolume",
"ec2:ModifyVolume",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstances",
"ec2:DescribeSnapshots",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVolumesModifications"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:CreateTags"
],
"Condition": {
"StringEquals": {
"ec2:CreateAction": [
"CreateVolume",
"CreateSnapshot"
]
}
},
"Effect": "Allow",
"Resource": [
"arn:aws:ec2:*:*:volume/*",
"arn:aws:ec2:*:*:snapshot/*"
]
},
{
"Action": [
"ec2:DeleteTags"
],
"Effect": "Allow",
"Resource": [
"arn:aws:ec2:*:*:volume/*",
"arn:aws:ec2:*:*:snapshot/*"
]
},
{
"Action": [
"ec2:CreateVolume"
],
"Condition": {
"StringLike": {
"aws:RequestTag/ebs.csi.aws.com/cluster": "true"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:CreateVolume"
],
"Condition": {
"StringLike": {
"aws:RequestTag/CSIVolumeName": "*"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:DeleteVolume"
],
"Condition": {
"StringLike": {
"ec2:ResourceTag/ebs.csi.aws.com/cluster": "true"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:DeleteVolume"
],
"Condition": {
"StringLike": {
"ec2:ResourceTag/CSIVolumeName": "*"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:DeleteVolume"
],
"Condition": {
"StringLike": {
"ec2:ResourceTag/kubernetes.io/created-for/pvc/name": "*"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:DeleteSnapshot"
],
"Condition": {
"StringLike": {
"ec2:ResourceTag/CSIVolumeSnapshotName": "*"
}
},
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ec2:DeleteSnapshot"
],
"Condition": {
"StringLike": {
"ec2:ResourceTag/ebs.csi.aws.com/cluster": "true"
}
},
"Effect": "Allow",
"Resource": "*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"route53:ChangeResourceRecordSets"
],
"Effect": "Allow",
"Resource": "arn:aws:route53:::hostedzone/*"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"route53:ListHostedZones",
"route53:ListResourceRecordSets",
"route53:ListTagsForResource"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
```
---
title: Set up Openflow - Snowflake Deployment - Task overview
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs.md
section: Loading & Unloading Data
---
# Set up Openflow - Snowflake Deployment - Task overview
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
To setup an %ofsfspcs%, perform the following tasks:
| Order |
Task |
Description |
Persona |
| 1 |
[Setup core Snowflake](/user-guide/data-integration/openflow/setup-openflow-spcs-sf) |
Before creating a deployment, you must configure core Snowflake which include
an Openflow admin role, required privileges, and network configuration.
|
Snowflake administrator |
| 2 |
Optionally [Set up PrivateLink UI access](/user-guide/data-integration/openflow/setup-openflow-spcs-configure-pr-ui) |
Configure PrivateLink to access the Snowflake Openflow Runtime UI using private connectivity. |
Snowflake administrator |
| 3 |
[Create deployment](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) |
After configuring core Snowflake, you then create an Openflow deployment.
Optionally, configure a Openflow-specific event table to store Openflow logs and metrics.
|
Deployment engineer, Snowflake administrator for event table configuration |
| 4 |
[Create Snowflake role](/user-guide/data-integration/openflow/setup-openflow-spcs-create-rr) |
After creating an %ofsfspcs%, you must create a Snowflake role and associated external access integrations. |
Data engineer |
| 5 |
[Create runtime](/user-guide/data-integration/openflow/setup-openflow-spcs-create-runtime) |
Create a runtime associated with the previously created Snowflake role. |
Data engineer |
| 6 |
[Configure allowed domains for Openflow connectors](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) |
Configure access to external domains for Openflow connectors. |
Data engineer |
| 7 |
[Connect your data sources using Openflow connectors](/user-guide/data-integration/openflow/connectors/about-openflow-connectors) |
Configure one or more connectors in the %ofsfspcs%. |
Data engineer |
Note that steps 3, 4 and 5 are typically repeated for each connector you want to configure in a given deployment.
## Next steps
[](/user-guide/data-integration/openflow/setup-openflow-spcs-sf)
---
title: Set up Openflow - Snowflake Deployment: Configure allowed domains for Openflow connectors
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list.md
section: Loading & Unloading Data
---
" />
# Set up %ofsfspcs%: Configure allowed domains for Openflow connectors
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
%ofsfspcs-plural% access external domain resources. Snowflake controls access to external domains
using [network rules](/user-guide/network-rules) and
[external access integrations](/developer-guide/external-network-access/creating-using-external-network-access)
to either grant or deny access to specific domains.
This topic describes the process of [creating a network rule](/sql-reference/sql/create-network-rule)
and [creating an external access integration](/sql-reference/sql/create-external-access-integration) to grant access to a specific domain.
In addition, the known domains used by Openflow connectors are provided.
Two possible workflows exist for managing access to external domains:
- [Create a new network rule and external access integration](#label-openflow-create-new-network-rule-grant-domain-access): Create a new network rule that defines a list of allowed domain/port combinations
and create a new external access integration using the newly created network rule.
- [Alter an existing network rule](#label-openflow-alter-existing-network-rule-grant-domain-access): Alter an existing network rule
to add a list of allowed domain/port combinations.
## Create a network rule granting access to one or more domains
To create a new network rule that grants access to one or more domain/port combinations,
execute an SQL statement similar to:
```sql
USE ROLE SECURITYADMIN;
CREATE NETWORK RULE MY_OPENFLOW_NETWORK_RULE
TYPE = HOST_PORT
MODE = EGRESS
VALUE_LIST = ('', '');
```
For example, to allow Snowflake to access `googleads.googleapis.com`, execute the following.
```sql
USE ROLE SECURITYADMIN;
CREATE NETWORK RULE GOOGLEADS_OPENFLOW_NETWORK_RULE
TYPE = HOST_PORT
MODE = EGRESS
VALUE_LIST = ('googleads.googleapis.com');
```
For more information, see [](/sql-reference/sql/create-network-rule).
After the network rule is created, a external access integration has to be created.
To create a new integration, execute an SQL statement similar to:
```sql
USE ROLE SECURITYADMIN;
CREATE EXTERNAL ACCESS INTEGRATION MY_OPENFLOW_EAI
ALLOWED_NETWORK_RULES = (MY_OPENFLOW_NETWORK_RULE)
ENABLED = TRUE
COMMENT = 'External Access Integration for Openflow connectivity';
```
## Alter an existing network rule granting access to one or more domains
To alter an existing network rule to grant access to one or more domain/port combinations,
execute an SQL statement similar to:
```sql
USE ROLE SECURITYADMIN;
ALTER NETWORK RULE GOOGLEADS_OPENFLOW_NETWORK_RULE SET
VALUE_LIST = ('', '', 'googleads.googleapis.com');
```
For more information, see [](/sql-reference/sql/alter-network-rule).
Use [](/sql-reference/sql/show-network-rules) to list the existing network rules.
Use [](/sql-reference/sql/desc-network-rule) to describe the properties of a specific network rule.
If the altered network rule is already associated with an external access integration, it will be updated automatically.
If you do not have an external access integration for the altered network rule,
refer to the section above for instructions on creating a new integration.
## Next steps
1. Associate an external access integration with your runtime:
1. Navigate to the Openflow canvas.
2. Select the **Runtimes** tab.
3. For the runtime which requires the new external access integration,
click the %vertical-more-icon% menu.
4. Select **External access integrations**.
5. Select all required external access integrations from the dropdown list.
Note you may select multiple external access integrations.
6. Click **Save**.
Restarting the runtime is not required and the changes are applied immediately.
2. Deploy a connector in a runtime, for a list of connectors available in Openflow, see [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors).
## Domains used by Openflow connectors
The following domains are used by Openflow connectors and require network rules to be granted access.
### Amazon Ads
The following domains are used by the Amazon Ads connector.
- `advertising-api.amazon.com`
- `advertising-api-eu.amazon.com`
- `advertising-api-fe.amazon.com`
- `api.amazon.com`
- `api.amazon.co.uk`
- `api.amazon.co.jp`
- Report location.
For example, `offline-report-storage-eu-west-1-prod.s3.eu-west-1.amazonaws.com` is used to download reports.
The exact report URL location is not always known before creating a report.
Snowflake recommends allow listing all s3 regions:
- `*.s3.eu-west-[1-3].amazonaws.com`
- `*.s3.eu-central-[1-2].amazonaws.com`
- `*.s3.eu-north-1.amazonaws.com`
- `*.s3.eu-south-[1-2].amazonaws.com`
- `*.s3.il-central-1.amazonaws.com`
- For advertising-api-fe.amazon.com (Far East / APAC):
- `*.s3.ap-northeast-[1-3].amazonaws.com`
- `*.s3.ap-south-[1-2].amazonaws.com`
- `*.s3.ap-southeast-[1-7].amazonaws.com`
- `*.s3.ap-east-[1-2].amazonaws.com`
- `*.s3.me-south-1.amazonaws.com`
- `*.s3.me-central-1.amazonaws.com`
- `*.s3.af-south-1.amazonaws.com`
The last domain is obtained from the report URL is returned after the report is ready to fetch.
This is an Amazon S3 bucket where the report is stored. Customers will need to specify their own AWS region.
for example, `us-east-1` or `eu-west-1` and a specific bucket. As it may be not possible to know the
exact region and bucket, Snowflake suggests using wildcards and listing all possible regions for a given location.
### AWS Secret Manager
The following domains are used by the AWS Secret Manager connector.
- `secretsmanager.us-west-2.amazonaws.com`
- `sts.us-west-2.amazonaws.com`
- `aws.amazon.com`
- `amazonaws.com`
### Box
The following domains are used by the Box connector.
- `api.box.com`
- `box.com`
### Confluence
The following domains are used by the Confluence connector.
- Customer-specific domain name, such as `https://company-name.atlassian.net/`.
- For OAuth, [https://atlassian.company-name.com/](https://atlassian.company-name.com/)
### Microsoft Dataverse
The following domains are used by the Dataverse connector.
- Customer-specific domain name, such as `org12345467.crm.dynamics.com`
- For OAuth, `login.microsoftonline.com`
### Google Ads
The following domains are used by the Google Ads connector.
- `googleads.googleapis.com`
### Google Drive
The following domains are used by the Google Drive connector:
- `drive.google.com`
- `www.googleapis.com`
- `oauth2.googleapis.com`
- `www.googleapis.com`
### Google Sheets
The following domains are used by the Google Sheets connector.
- `sheets.googleapis.com`
### Hubspot
The following domains are used by the HubSpot connector.
- `api.hubapi.com`
### Jira Cloud
The following domains are used by the Jira Cloud connector.
- Customer-specific domain name, for example `company-name.atlassian.net`
- `api.atlassian.com`
### Kafka
The following domains are used by the Kafka connector.
- Customer Kafka bootstrap servers and all Kafka brokers
### Kinesis
The following domains are used by the Kinesis connector.
- AWS region dependent. For example:
for us-west-2:
- `kinesis.us-west-2.amazonaws.com`
- `kinesis-fips.us-west-2.api.aws`
- `kinesis-fips.us-west-2.amazonaws.com`
- `kinesis.us-west-2.api.aws`
- `*.control-kinesis.us-west-2.amazonaws.com`
- `*.control-kinesis.us-west-2.api.aws`
- `*.data-kinesis.us-west-2.amazonaws.com`
- `*.data-kinesis.us-west-2.api.aws`
- `dynamodb.us-west-2.amazonaws.com`
### LinkedIn Ads
The following domains are used by the LinkedIn Ads connector.
- `www.linkedin.com`
- `api.linkedin.com`
### Meta Ads
The following domains are used by the Meta Ads connector.
- `graph.facebook.com`
### MySQL
The following domains are used by the MySQL connector.
- Customer-specific domain and port combination.
### PostgreSQL
The following domains are used by the PostgreSQL connector.
- Customer-specific domain and port combination.
### SharePoint
The following domains are used by the SharePoint connector.
- Customer-specific domain—for example, `company-domain.sharepoint.com` or an alias that redirects to `company-domain.sharepoint.com`
- `graph.microsoft.com:80`
- `graph.microsoft.com:443`
- `login.microsoftonline.com`
### Slack
The following domains are used by the Slack connector.
- `slack.com`
- `api.slack.com`
- `hooks.slack.com`
- `files.slack.com`
- `wss-primary.slack.com`
- `wss-backup.slack.com`
### SQL Server
The following domains are used by the SQL Server connector.
- Customer-specific domain and port combination.
### Workday
The following domains are used by the Workday connector.
- Customer-specific domain and port combination. For example, `company-domain.tenant.myworkday.com`.
To obtain the domain, you can use the report URL (base URL is always the same).
---
title: Set up Openflow - Snowflake Deployment: Core Snowflake
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-sf.md
section: Loading & Unloading Data
---
# Set up %ofsfspcs%: Core Snowflake
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/setup-openflow-spcs)
%ofsfspcs% requires the creation of the following Snowflake specific resources:
1. [Create the OPENFLOW_ADMIN role](#create-the-openflow-admin-role)
2. [Configure required privileges](#configure-required-privileges)
To complete these tasks, Sign in to %sf-web-interface-link% and open a SQL worksheet.
## Create the OPENFLOW_ADMIN role
Create the required Openflow administration role.
`` denotes the user that will be used to access Openflow.
```sql
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS OPENFLOW_ADMIN;
GRANT ROLE OPENFLOW_ADMIN TO USER ;
```
Users with a default role of ACCOUNTADMIN can't login to %ofsfspcs% runtimes and will get an error message when attempting to do so.
Snowflake recommends assigning a different default role to any user that will login to a runtime.
In addition, Snowflake recommends setting default secondary roles to `ALL` for all Openflow users.
To change the default role and enable all secondary roles, execute the following:
For example:
```sql
USE ROLE ACCOUNTADMIN;
ALTER USER SET DEFAULT_ROLE = ;
ALTER USER SET DEFAULT_SECONDARY_ROLES = ('ALL');
```
## Configure required privileges
Openflow requires defining specific Snowflake Account level privileges.
These privileges are assigned to the ACCOUNTADMIN role as part of the default set of privileges.
ACCOUNTADMIN will automatically have the following privileges and will be able to grant them
to a role of their choosing for the Openflow admin role, shown as `OPENFLOW_ADMIN` role in the following example:
```sql
USE ROLE ACCOUNTADMIN;
GRANT CREATE OPENFLOW DATA PLANE INTEGRATION ON ACCOUNT TO ROLE OPENFLOW_ADMIN;
GRANT CREATE OPENFLOW RUNTIME INTEGRATION ON ACCOUNT TO ROLE OPENFLOW_ADMIN;
GRANT CREATE COMPUTE POOL ON ACCOUNT TO ROLE OPENFLOW_ADMIN;
```
## Next steps
Optionally, [Set up PrivateLink UI access](/user-guide/data-integration/openflow/setup-openflow-spcs-configure-pr-ui) to access the Snowflake Openflow Runtime UI using private connectivity.
[Create deployment](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment)
---
title: Set up Openflow - Snowflake Deployment: Create deployment
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-deployment.md
section: Loading & Unloading Data
---
# Set up %ofsfspcs%: Create deployment
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
After configuring core Snowflake, create an Openflow deployment. A deployment is the
control plane component that manages your runtimes and connectors. Each deployment can host
multiple runtimes, and each runtime can run multiple connectors, giving you flexibility to
isolate workloads by project, team, or environment. There is no separate charge for the
deployment itself; only active runtimes consume Snowflake credits.
1. [Create a deployment](#label-openflow-spcs-create-deployment) - create the deployment itself.
2. [[Optional] Configure an Openflow-specific event table](#label-openflow-spcs-event-table) - configure an Openflow-specific event table to store Openflow logs and metrics.
## Create a deployment
To access the Openflow Runtime UI using PrivateLink as described in [Setup PrivateLink UI access](/user-guide/data-integration/openflow/setup-openflow-spcs-configure-pr-ui),
ensure the **PrivateLink** option is enabled when creating a new %ofsfspcs%.
1. Sign in to %sf-web-interface-link% with a role defined in [Configure core Snowflake requirements](/user-guide/data-integration/openflow/setup-openflow-spcs-sf).
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Select **Launch Openflow**.
4. In the Openflow UI, select **Create a deployment**. The **Deployments** tab opens.
5. Select **Create a deployment**. The Creating a deployment wizard opens.
6. In the **Prerequisites** step, ensure that you meet all the requirements. Select **Next**.
7. In the **Deployment location** step, select **Snowflake** as the deployment location.
Enter a name for your deployment. Select **Next**.
8. Select **Create Deployment**.
Your deployment will then be created.
## [Optional] Configure an Openflow-specific event table
Openflow generates logs and metrics and sends them to the Snowflake Event Table.
For helpful queries to analyze this telemetry data, see [Monitor Openflow](/user-guide/data-integration/openflow/monitor).
By default, Openflow uses the [account event table](#label-logging-event-table-default) (SNOWFLAKE.TELEMETRY.EVENTS), but you can configure an Openflow-specific event table per deployment. A dedicated event table is recommended to optimize query performance, enable granular access control, and simplify Openflow monitoring and maintenance.
1. To store the event table outside the Openflow database, grant the OPENFLOW_ADMIN role
access to the `` and `` where you want to store it:
```sql
USE ROLE ACCOUNTADMIN;
GRANT USAGE ON DATABASE TO ROLE OPENFLOW_ADMIN;
GRANT USAGE ON SCHEMA . TO ROLE OPENFLOW_ADMIN;
```
2. Create the event table:
```sql
USE ROLE OPENFLOW_ADMIN;
CREATE EVENT TABLE IF NOT EXISTS ..EVENTS;
```
3. Get your dataplane name, which you use in the next step, from the `name` column:
```sql
SHOW OPENFLOW DATA PLANE INTEGRATIONS;
```
4. Set the event table for this deployment, replacing `` with the value from the previous step:
```sql
ALTER OPENFLOW DATA PLANE INTEGRATION
SET EVENT_TABLE = '..EVENTS';
```
## [Optional] Create a monitoring role
A monitoring role lets data engineers or operations teams monitor Openflow without having the OPENFLOW_ADMIN role.
- To create a monitoring role, run the following code:
```sql
USE ROLE OPENFLOW_ADMIN;
-- Create a role for monitoring Openflow deployments and runtimes if it doesn't yet exist
CREATE ROLE IF NOT EXISTS ;
GRANT MONITOR ON INTEGRATION TO ROLE ;
-- Add to role hierarchy so administrators can manage objects owned by this role
GRANT ROLE TO ROLE ;
-- Grant the role to the appropriate Snowflake users
GRANT ROLE TO USER ;
```
### Next steps
[Create Snowflake role](/user-guide/data-integration/openflow/setup-openflow-spcs-create-rr)
---
title: Set up Openflow - Snowflake Deployment: Create runtime
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-create-runtime.md
section: Loading & Unloading Data
---
# Set up %ofsfspcs%: Create runtime
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
A runtime is a containerized Apache NiFi instance that executes your data integration flows –
connectors and custom flow definitions. Each runtime is isolated for security and resource
control, and can scale from one node up to fifty to handle varying data volumes.
To create a runtime in your Snowflake deployment:
1. Sign in to %sf-web-interface-link%.
2. In the navigation menu, select **Ingestion** %raa% **Openflow**.
3. Select **Launch Openflow**. A new tab opens for the Openflow canvas.
4. In **Openflow Control Plane**, select **Create a runtime**. The **Create Runtime** dialog box appears.
5. In the **Create Runtime** populate the following fields:
| Field |
Description |
| **Runtime Name** |
Enter a name for your runtime. |
| **Deployment** drop down |
Choose the deployment previously created in [](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment) |
| **Node Type** |
Choose a node type from the **Node type** drop-down list. This specifies the size of your nodes. |
| **Min/Max node** |
In the **Min/Max node** range selector, select a range. The minimum value specifies the number of nodes that the runtime starts with when idle and the maximum value specifies the number of nodes that the runtime can scale up to, in the event of high data volume or CPU load. |
| **Snowflake Role** |
Choose the Snowflake role previously created in [](/user-guide/data-integration/openflow/setup-openflow-spcs-create-rr). |
| **Usage Roles** |
Optionally, select the roles created to grant usage to the runtime for required databases, schema, and table access. |
| **External Access Integrations** |
Optionally, select the previously created external access integrations to grant access to external resources. |
6. Select **Create**. The runtime takes a couple of minutes to be created.
Once created, view your runtime by navigating to the **Runtimes** tab of the Openflow control plane.
Select the runtime to open the Openflow canvas.
## [Optional] Grant MONITOR privileges on the runtime
If you created a [monitoring role](#label-openflow-spcs-monitoring-role) when setting up your deployment, you can add the runtime to that role. This allows data engineers or operations teams to monitor the runtime without having the OPENFLOW_ADMIN role.
- To add the runtime to the monitoring role, run the following code, replacing `` with the name of the Openflow runtime integration:
```sql
USE ROLE OPENFLOW_ADMIN;
GRANT MONITOR ON INTEGRATION TO ROLE ;
```
## Next step
Configure allowed domains for Openflow connectors.
See [](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list).
---
title: Set up Openflow - Snowflake Deployment: Create Snowflake role
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-create-rr.md
section: Loading & Unloading Data
---
# Set up %ofsfspcs%: Create Snowflake role
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
%ofsfspcs% requires the creation of a number of resources which are specific not to a deployment
but to a specific runtime. Typically such resources include:
- Creation of Runtime specific Snowflake role
- Creation of Runtime specific network rules and External Access Integrations (EAI)
This topic describes the creation of these resources.
1. Create a Snowflake Role and associated privileges to write data to Snowflake Role for Runtimes on Snowflake Deployment Section
2. Associate Snowflake Role. See Snowflake Role for Runtimes in the Snowflake Deployment Section.
3. Create External Access Integrations and associate them to Runtimes.
See [Creating External Access Integrations](#label-create-network-rules-and-external-access-integrations)
4. When Outbound PrivateLink connectivity is required to connect to a private system using SPCS Egress.
## Create a Snowflake role
When creating and editing Openflow Runtimes, Runtime Owners will have the ability to associate a role with the Runtime.
This role will be used for flows that execute within the Runtime.
For more information about Snowflake Roles, see [](#label-openflow-spcs-what-is-runtime-role).
Creating a Snowflake role is a prerequisite for creating a Runtime and involves the following steps:
1. Create the role itself
2. Grant the role access to the warehouse used by the Runtime.
3. Grant the role access to the Snowflake objects used by the Runtime.
4. Grant the role access to the External Access Integrations used by the Runtime.
To create a Snowflake role:
1. Create the required Snowflake role.
`` denotes the name of the associated runtime.
```sql
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS OPENFLOW_RUNTIME_ROLE_;
GRANT ROLE OPENFLOW_RUNTIME_ROLE_ TO USER ;
```
2. Allow the Snowflake role to use an existing warehouse that you are planning to use for data ingestion.
Use this warehouse later when configuring your connectors for runtimes where you will be using this Snowflake role.
```sql
GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE OPENFLOW_RUNTIME_ROLE_;
```
3. Allow the Snowflake role to use, create or otherwise access Snowflake objects.
Depending on the Openflow connector being created the required underlying objects will vary.
The example below is for illustration purposes only.
```sql
GRANT USAGE ON DATABASE TO ROLE OPENFLOW_RUNTIME_ROLE_;
GRANT USAGE ON SCHEMA TO ROLE OPENFLOW_RUNTIME_ROLE_;
```
### Creating Network Rules and External Access Integrations
Snowflake's security model provides secure access to specific endpoints and systems
external to Snowflake using [network policies](/user-guide/network-policies).
Two key aspects of network policies are [](/user-guide/network-rules) and
[External Access Integrations (EAI)](/developer-guide/external-network-access/external-network-access-overview).
Each of which is used to provide secure access to external resources required by the runtime.
There are three steps that are required to create network rules and external access integrations:
1. Create the network rule, grouping the network identifiers into logical areas.
2. Create the external access integration (EAI), specifying the list of network rules and assuring the Snowflake Role has USAGE on the EAI.
3. Associate the EAI with the Runtime in the Openflow UI when creating Runtimes.
To create the required network rule and EAI, perform the following steps:
These examples use RUNTIME_NAME as a placeholder for the name of the Runtime being created.
1. Create an appropriate network rule. See [](/sql-reference/sql/create-network-rule) for more information.
`` denotes the name of the database that will contain the network rule.
Snowflake suggests creating a specific database for network rules and external access integrations related to Openflow.
```sql
USE DATABASE ;
CREATE NETWORK RULE IF NOT EXISTS OPENFLOW__NETWORK_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('comma separated list of host:port pairs');
```
2. Create an external access integration, or add the network rule to an existing one.
See [](/sql-reference/sql/create-external-access-integration) for more information.
To create a new EAI:
```sql
USE ROLE ACCOUNTADMIN;
CREATE EXTERNAL ACCESS INTEGRATION IF NOT EXISTS OPENFLOW__EAI
ALLOWED_NETWORK_RULES = (OPENFLOW__NETWORK_RULE)
ENABLED = TRUE;
```
To add the network rule to an existing EAI, first check which rules are already
associated with it, then update the EAI to include both the existing and new rules:
```sql
USE ROLE ACCOUNTADMIN;
-- Check the current rules on the EAI
DESCRIBE EXTERNAL ACCESS INTEGRATION OPENFLOW__EAI;
```
In the output, find the `ALLOWED_NETWORK_RULES` property and note the existing rules.
Then update the EAI, listing all existing rules along with the new one:
```sql
ALTER EXTERNAL ACCESS INTEGRATION OPENFLOW__EAI
SET ALLOWED_NETWORK_RULES = (
,
,
OPENFLOW__NETWORK_RULE
);
```
3. Grant access to the EAI to the previously created Snowflake role.
```sql
GRANT USAGE ON INTEGRATION OPENFLOW__EAI TO ROLE OPENFLOW_RUNTIME_ROLE_;
```
## Next steps
[Create runtime](/user-guide/data-integration/openflow/setup-openflow-spcs-create-runtime)
---
title: Set up Openflow Connector for Amazon Kinesis Data Streams
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/kinesis/setup.md
section: Loading & Unloading Data
---
# Set up %kinesis%
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/connectors/kinesis/about)
- [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance)
- [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot)
- [](/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning)
This topic describes how to set up %kinesis%.
%kinesis% is designed for JSON message ingestion from Kinesis streams to Snowflake tables, with schema evolution capabilities.
## Set up the Openflow Connector for Kinesis
### Prerequisites
1. Review [](/user-guide/data-integration/openflow/connectors/kinesis/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. If you are using Openflow - Snowflake Deployments, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list) and have granted access to the required domains for the Kinesis connector.
### Set up IAM roles and policies in AWS
As an AWS administrator, perform the following actions in your AWS account:
1. Create an AWS IAM user or role that Openflow will use to access the Kinesis data stream. For more information, see
[Creating IAM users](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html) in the AWS documentation.
2. Ensure that the AWS user has configured [Access Key credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
3. Grant the AWS user the following IAM permissions:
| Service |
Actions |
Resources (ARNs) |
Purpose |
| Amazon Kinesis Data Streams |
`kinesis:DescribeStream`, `kinesis:DescribeStreamConsumer`, `kinesis:GetRecords`, `kinesis:GetShardIterator`, `kinesis:ListShards`, `kinesis:RegisterStreamConsumer` |
`arn:aws:kinesis:${REGION}:${ACCOUNT_ID}:stream/${STREAM_NAME}` |
Discovers shards, reads records through shared-throughput polling, resolves the stream ARN, registers an Enhanced Fan-Out consumer, and polls consumer status during registration. |
| Amazon Kinesis Data Streams |
`kinesis:DeregisterStreamConsumer`, `kinesis:DescribeStreamConsumer`, `kinesis:SubscribeToShard` |
`arn:aws:kinesis:${REGION}:${ACCOUNT_ID}:stream/${STREAM_NAME}/consumer/*` |
Describes, subscribes to, and deregisters Enhanced Fan-Out consumers by consumer ARN. |
| Amazon DynamoDB |
`dynamodb:CreateTable`, `dynamodb:DeleteTable`, `dynamodb:DescribeTable`, `dynamodb:GetItem`, `dynamodb:PutItem`, `dynamodb:Query`, `dynamodb:Scan`, `dynamodb:UpdateItem` |
`arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/${APPLICATION_NAME}`, `arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/${APPLICATION_NAME}_migration` |
Creates and manages the checkpoint/lease table (shard leases, node heartbeats, checkpoints) and a temporary migration table used during one-time migration from legacy checkpoint tables. |
Example IAM policy:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "KinesisStreamAccess",
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:DescribeStreamConsumer",
"kinesis:GetRecords",
"kinesis:GetShardIterator",
"kinesis:ListShards",
"kinesis:RegisterStreamConsumer"
],
"Resource": "arn:aws:kinesis:${REGION}:${ACCOUNT_ID}:stream/${STREAM_NAME}"
},
{
"Sid": "KinesisConsumerAccess",
"Effect": "Allow",
"Action": [
"kinesis:DeregisterStreamConsumer",
"kinesis:DescribeStreamConsumer",
"kinesis:SubscribeToShard"
],
"Resource": "arn:aws:kinesis:${REGION}:${ACCOUNT_ID}:stream/${STREAM_NAME}/consumer/*"
},
{
"Sid": "DynamoDBTableAccess",
"Effect": "Allow",
"Action": [
"dynamodb:CreateTable",
"dynamodb:DeleteTable",
"dynamodb:DescribeTable",
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:Query",
"dynamodb:Scan",
"dynamodb:UpdateItem"
],
"Resource": [
"arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/${APPLICATION_NAME}",
"arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/${APPLICATION_NAME}_migration"
]
}
]
}
```
Before using the example policy, replace the following placeholders:
| Placeholder |
Description |
| `${REGION}` |
Your AWS region (for example, `us-east-1`) |
| `${ACCOUNT_ID}` |
Your AWS account ID (for example, `123456789012`) |
| `${STREAM_NAME}` |
The value of the **AWS Kinesis Stream Name** connector parameter |
| `${APPLICATION_NAME}` |
The value of the **AWS Kinesis Application Name** connector parameter. Used as the DynamoDB checkpoint table name and as the Enhanced Fan-Out registered consumer name. |
- The `${APPLICATION_NAME}_migration` table is a temporary DynamoDB table created only
during a one-time migration from legacy checkpoint tables to the new schema. It's deleted
automatically when migration completes. If your deployment has never used the legacy
KCL-based connector, you can omit the migration table ARN from the policy.
- The `dynamodb:DeleteTable` action is used during the migration process and can be removed
from the policy after migration is confirmed complete.
- The `kinesis:DeregisterStreamConsumer` action is invoked when the processor is removed
from the canvas. If the IAM principal doesn't have this permission, the consumer must be
deregistered manually through the AWS console or CLI.
### Set up Snowflake account
As a Snowflake account administrator, perform the following tasks:
1. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
2. Create a new role or use an existing role and grant the [database privileges](/sql-reference/sql/grant-privilege).
The connector requires the user to create the destination table. Make sure the user has the required privileges for managing Snowflake objects:
| Object |
Privilege |
Notes |
| Database |
USAGE |
|
| Schema |
USAGE |
|
| Table |
OWNERSHIP |
Required for the connector to ingest data into a table. |
Snowflake recommends creating a separate user and role for each Kinesis stream for better access control.
You can use the following script to create and configure a custom role (requires SECURITYADMIN or equivalent):
```sql
USE ROLE securityadmin;
CREATE ROLE openflow_kinesis_connector_role_1;
GRANT USAGE ON DATABASE kinesis_db TO ROLE openflow_kinesis_connector_role_1;
GRANT USAGE ON SCHEMA kinesis_schema TO ROLE openflow_kinesis_connector_role_1;
```
Privileges must be granted directly to the connector role and can't be inherited.
3. Configure the destination table
We highly recommend using server-side schema evolution for schema changes and
[an error table for DML error logging](#label-kinesis-dml-error-logging).
The example below shows how to create a table and add OWNERSHIP permissions.
```sql
USE ROLE openflow_kinesis_connector_role_1;
CREATE TABLE kinesis_db.kinesis_schema. (
kinesisMetadata object
)
ENABLE_SCHEMA_EVOLUTION = TRUE
ERROR_LOGGING = TRUE;
USE ROLE securityadmin;
GRANT OWNERSHIP ON TABLE TO ROLE openflow_kinesis_connector_role_1;
```
These connectors provide support for automatic schema detection and evolution. The structure of tables in Snowflake is defined and evolved automatically to support the structure of new data loaded by the connector. It will automatically map the record content's first-level keys to table columns matching by name (case-insensitive).
With Schema evolution enabled, Snowflake can automatically expand the destination table by adding new columns that are detected in the incoming stream and dropping NOT NULL constraints to accommodate new data patterns. For more information, see [Table schema evolution](/user-guide/data-load-schema-evolution).
If ENABLE_SCHEMA_EVOLUTION is not enabled, then you have to create the schema manually by extending the table definition. The connector tries to match the record content's first-level keys to the table columns by name. If keys from the JSON do not match the table columns, the connector ignores the keys.
4. (Optional) Configure a secrets manager
Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and Hashicorp, and store the public and private keys in the secret store.
1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you use the EC2 instance role associated with Openflow as this way no other secrets have to be persisted.
2. In the Openflow canvas, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values.
3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
5. Grant access to users
Any other Snowflake users who require access to the raw ingested data by the connector (for example, for custom processing in Snowflake), should be granted the role created in step 2.
### (Optional) Configure outbound AWS PrivateLink
If you're running the connector in Openflow - Snowflake Deployments and want to route the connector's Kinesis traffic over
[outbound private connectivity](/user-guide/private-connectivity-outbound) (AWS PrivateLink) instead of the public internet,
follow the steps in this section.
The connector makes outbound calls to the following AWS services:
| Service |
Purpose |
PrivateLink support |
| Amazon Kinesis Data Streams |
Reads stream records. |
Supported by this connector. |
| Amazon DynamoDB |
Stores checkpoint metadata for processed records. |
Not supported. Use the public endpoint. |
Amazon DynamoDB doesn't support Private DNS for its PrivateLink endpoint. See
[Considerations when using AWS PrivateLink for Amazon DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/privatelink-interface-endpoints.html#privatelink-considerations)
in the AWS documentation. Because Snowflake's `PRIVATE_HOST_PORT` network rule type relies on Private DNS, the connector
can't route DynamoDB traffic through a PrivateLink endpoint. Configure DynamoDB using `HOST_PORT` (public endpoint)
as shown in the example. Only checkpoint metadata flows through the public endpoint to DynamoDB. Stream records flow through the private Kinesis endpoint.
To configure outbound AWS PrivateLink, complete the following steps:
1. As ACCOUNTADMIN, provision an outbound PrivateLink endpoint for Amazon Kinesis Data Streams in the region where your stream is located. Replace `` with your AWS region (for example, `us-east-1`):
```sql
USE ROLE ACCOUNTADMIN;
SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
'com.amazonaws..kinesis-streams',
'kinesis..amazonaws.com'
);
```
For more information, see [SYSTEM$PROVISION_PRIVATELINK_ENDPOINT](/sql-reference/functions/system_provision_privatelink_endpoint) and [Managing outbound private connectivity endpoints on AWS](/user-guide/private-manage-endpoints-aws).
2. Create network rules that reach `kinesis..amazonaws.com` through the private endpoint and DynamoDB through the public endpoint. Replace `` with the schema you use to host network rules:
```sql
USE ROLE ACCOUNTADMIN;
USE SCHEMA ;
CREATE OR REPLACE NETWORK RULE openflow_kinesis_private_network_rule
MODE = EGRESS
TYPE = PRIVATE_HOST_PORT
VALUE_LIST = ('kinesis..amazonaws.com');
CREATE OR REPLACE NETWORK RULE openflow_kinesis_public_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('dynamodb..amazonaws.com:443');
```
3. Attach both network rules to an external access integration, then grant the runtime role permission to use the integration:
```sql
USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION openflow_kinesis_eai
ALLOWED_NETWORK_RULES = (
openflow_kinesis_private_network_rule,
openflow_kinesis_public_network_rule
)
ENABLED = TRUE
COMMENT = 'External access integration for the Openflow Connector for Kinesis';
GRANT USAGE ON INTEGRATION openflow_kinesis_eai TO ROLE ;
```
For the steps to associate the integration with a runtime, see
[](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list).
### Set up the connector
As a data engineer, perform the following tasks to install and configure the connector:
#### Install the connector
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the **Openflow connector for Amazon Kinesis Data Streams** and select **Add to runtime**.
3. In the Select runtime dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database, schema, and a table in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
#### Configure the connector
1. If needed, customize the connector configuration before configuring the built-in parameters.
2. Populate the process group parameters
1. Right-click on the imported process group and select **Parameters**.
2. Fill out the required parameter values.
##### Common parameters
| Parameter |
Description |
Required |
| AWS Access Key ID |
The AWS Access Key ID to connect to your Kinesis Stream and DynamoDB. |
Yes |
| AWS Kinesis Region |
The AWS Region to connect to. Use regular AWS region format, for example: `us-west-2`, `ap-southeast-1`, `eu-west-1`. See the [AWS Regions](https://docs.aws.amazon.com/general/latest/gr/rande.html#kinesis_region) page. |
Yes |
| AWS Secret Access Key |
The AWS Secret Access Key to connect to your Kinesis Stream and DynamoDB. |
Yes |
| AWS Kinesis Application Name |
The name that is used as the DynamoDB table name for tracking the application's progress on Kinesis Stream consumption. |
Yes |
| AWS Kinesis Consumer Type |
The strategy used to read records from a Kinesis Stream.
Must be one of the following values: **SHARED_THROUGHPUT**, **ENHANCED_FAN_OUT**.
For more information, see [Differences between shared throughput consumer and enhanced fan-out consumer](https://docs.aws.amazon.com/streams/latest/dev/enhanced-consumers.html).
|
Yes |
| AWS Kinesis Initial Stream Position |
The initial stream position from which the data starts replication. This takes effect only during the initial start for a given AWS Kinesis Application Name.
Possible values are:
**LATEST**: Latest stored record,
**TRIM_HORIZON**: Earliest stored record.
|
Yes |
| AWS Kinesis Stream Name |
The AWS Kinesis Stream Name to consume data from. |
Yes |
| Snowflake Destination Database |
The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. |
Yes |
| Snowflake Destination Schema |
The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
See the following examples:
`CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`.
`CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively.
|
Yes |
| Snowflake Destination Table |
The table where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. |
Yes |
#### Start the connector
1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the plane and select **Start**. The connector starts data ingestion.
## Understanding KINESISMETADATA column
The connector populates the KINESISMETADATA structure with metadata about the Kinesis record. The structure contains the following information:
| Field Name |
Field Type |
Example Value |
Description |
| stream |
String |
`stream-name` |
The name of the Kinesis stream the record came from. |
| shardId |
String |
`shardId-000000000001` |
The identifier of the shard in the stream the record came from. |
| approximateArrival |
String |
`2025-11-05T09:12:15.300` |
The approximate time that the record was inserted into the stream (ISO 8601 format). |
| partitionKey |
String |
`key-1234` |
The partition key specified by the data producer for the record. |
| sequenceNumber |
String |
`123456789` |
The unique sequence number assigned by Kinesis Data Streams to the record in the shard. |
| subSequenceNumber |
Number |
`2` |
The subsequence number for the record (used for aggregated records with the same sequence number). |
| shardedSequenceNumber |
String |
`12345678900002` |
A combination of the sequence number and the subsequence number for the record. |
## Measuring ingestion latency
For change tracking, incremental processing, and Time Travel queries based on row modification time, the ROW_TIMESTAMP feature can be used.
It can be enabled by running the following command on your destination table:
```sql
ALTER TABLE SET ROW_TIMESTAMP = TRUE;
```
After row timestamps are enabled, tables expose the `METADATA$ROW_LAST_COMMIT_TIME` column, which returns the timestamp when each row was last modified.
For more information, see [Row timestamps](/user-guide/data-engineering/row-timestamps).
Row timestamp isn't available for interactive tables. For more information, see [](#label-limitations-of-interactive-tables).
## Using the connector with Apache Iceberg™ tables
The connector can ingest data into Snowflake-managed Apache Iceberg™ tables but must meet the following requirements:
- You must have been granted the USAGE privilege on the external volume associated with your Apache Iceberg™ table.
- You must create an Apache Iceberg™ table before running the connector.
### Grant usage on an external volume
For example, if your Iceberg table uses the `kinesis_external_volume` external volume and the connector uses the role `openflow_kinesis_connector_role_1`, run the following statement:
```sql
USE ROLE ACCOUNTADMIN;
GRANT USAGE ON EXTERNAL VOLUME kinesis_external_volume TO ROLE openflow_kinesis_connector_role_1;
```
### Create an Apache Iceberg™ table for ingestion
The connector does not create Iceberg tables automatically and does not support schema evolution. Before you run the connector, you must create an Iceberg table manually.
When you create an Iceberg table, you can use Iceberg data types (including VARIANT) or
[compatible Snowflake types](/user-guide/tables-iceberg-data-types).
For example, consider the following message:
```json
{
"id": 1,
"name": "Steve",
"body_temperature": 36.6,
"approved_coffee_types": ["Espresso", "Doppio", "Ristretto", "Lungo"],
"animals_possessed": {
"dogs": true,
"cats": false
},
"options": {
"can_walk": true,
"can_talk": false
},
"date_added": "2024-10-15"
}
```
To create an Iceberg table for the example message, use one of the following statements:
```sql
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
kinesisMetadata OBJECT(
stream STRING,
shardId STRING,
approximateArrival STRING,
partitionKey STRING,
sequenceNumber STRING,
subSequenceNumber INTEGER,
shardedSequenceNumber STRING
),
id INT,
name string,
body_temperature float,
approved_coffee_types array(string),
animals_possessed variant,
date_added date,
options object(can_walk boolean, can_talk boolean)
)
EXTERNAL_VOLUME = 'my_volume'
CATALOG = 'SNOWFLAKE'
BASE_LOCATION = 'my_location/my_iceberg_table'
ICEBERG_VERSION = 3;
```
## Using the connector with Interactive Tables
Interactive tables are a special type of Snowflake table optimized for low-latency, high-concurrency queries. You can find out more about interactive tables in the [interactive tables documentation](/user-guide/interactive).
1. Create an interactive table:
```sql
CREATE INTERACTIVE TABLE REALTIME_METRICS (
metric_name VARCHAR,
metric_value NUMBER,
source_topic VARCHAR,
timestamp TIMESTAMP_NTZ
) CLUSTER BY (metric_name)
AS (SELECT
$1:M_NAME::VARCHAR,
$1:M_VALUE::NUMBER,
$1:RECORD_METADATA.topic::VARCHAR,
$1:RECORD_METADATA.timestamp::TIMESTAMP_NTZ
from TABLE(DATA_SOURCE(TYPE => 'STREAMING')));
```
Important considerations:
- Interactive tables have specific limitations and query restrictions. Review the [interactive tables documentation](/user-guide/interactive) before using them with the connector.
- For interactive tables, any required transformations must be handled in the table definition.
- Interactive warehouses are required to query interactive tables efficiently.
## Using the connector with a customer-defined schema for the destination table
The connector treats each Kinesis record as a row to be inserted into a Snowflake table. For example, if you have a Kinesis topic with the content of the message structured like the following JSON:
```json
{
"order_id": 12345,
"customer_name": "John",
"order_total": 100.00,
"isPaid": true
}
```
By default you don't have to specify all fields from the JSON. Schema evolution will take care of it. However, if you prefer a static schema, it can be created by running:
```sql
CREATE TABLE ORDERS (
kinesisMetadata OBJECT,
order_id NUMBER,
customer_name VARCHAR,
order_total FLOAT,
ispaid BOOLEAN
);
```
## Using the connector with a customer-defined PIPE
If you choose to create your own pipe, you can define the data transformation logic in the pipe's
[COPY INTO](/sql-reference/sql/copy-into-table) statement. You can rename columns as required and cast the data types as needed. For example:
```sql
CREATE TABLE ORDERS (
order_id VARCHAR,
customer_name VARCHAR,
order_total VARCHAR,
ispaid VARCHAR
);
CREATE PIPE ORDERS AS
COPY INTO ORDERS
FROM (
SELECT
$1:order_id::STRING,
$1:customer_name,
$1:order_total::STRING,
$1:isPaid::STRING
FROM TABLE(DATA_SOURCE(TYPE => 'STREAMING'))
);
```
When you define your own pipe your destination table columns do not have to match the JSON keys. You can rename the columns to your desired names and cast the data types if required.
To adjust the connector to work with a custom pipe, perform the following tasks:
1. Right-click on the PublishSnowpipeStreaming processor used in your Kinesis ingestion flow in the Openflow canvas.
2. Select **Configure** from the context menu.
3. Navigate to the **Properties** tab.
4. In the Destination type field, pick **Pipe**.
5. In the Pipe field, type the name of your pipe.
6. Select **Apply** to save the configuration.
## Customizing error handling
Error handling is split between Openflow-side failures and server-side failures within the Snowpipe Streaming service.
- **Openflow Errors (Client-Side Failures)**: Errors such as unparseable payloads or custom transformation failures occur before records reach Snowflake. By default these records are discarded. It's possible to process these errors in Openflow - use FlowFiles from the parse failure relationship in the ConsumeKinesis processor.
- **Snowpipe Streaming Errors (Server-Side Failures)**: Errors for records that successfully reach Snowflake but are incompatible with the destination table's schema (for example, type mismatches) are captured by the Snowflake infrastructure. When error logging is enabled on the destination table (`error_logging = true`), these failed rows are automatically ingested into the destination Error table.
## Next steps
- [](/user-guide/data-integration/openflow/connectors/kinesis/performance-tuning)
- [](/user-guide/data-integration/openflow/connectors/kinesis/maintenance)
- [](/user-guide/data-integration/openflow/connectors/kinesis/troubleshoot)
---
title: Set up PrivateLink UI access in Openflow - Snowflake Deployments
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/setup-openflow-spcs-configure-pr-ui.md
section: Loading & Unloading Data
---
# Set up PrivateLink UI access in %ofsfspcs-plural%
This feature is not available in the People's Republic of China.
Openflow Snowflake Deployments are available to all accounts in AWS, Azure, and GCP [](#label-na-general-regions).
- [](/user-guide/data-integration/openflow/about-spcs)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/monitor)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic explains how to configure access to the Snowflake Openflow Runtime UI using private connectivity.
This is an optional task. If you will not be accessing the Openflow Runtime UI using public connectivity,
you can skip this task.
There are two tasks to configure access to the Snowflake Openflow Runtime UI using private connectivity:
1. [](#label-openflow-spcs-configure-pr-ui-access-ui)
2. [](#label-openflow-spcs-configure-pr-ui-create-deployment)
## Prerequisites
Before configuring private link for the Openflow Runtime UI, enable PrivateLink for your account as described in [](/user-guide/admin-security-privatelink).
## Determine PrivateLink URLs
1. Using the ACCOUNTADMIN role, call the SYSTEM$GET_PRIVATELINK_CONFIG function in your Snowflake account and identify the value for `openflow-privatelink-url`. This is the URL for accessing Openflow UI over PrivateLink in the form:
- `-.openflow..privatelink.snowflakecomputing.com`
2. The URL for accessing the Runtime UI in a Snowflake deployment will be in the form:
- `of---.spcs..privatelink.snowflake.app`
3. Create CNAME records in your DNS to resolve these URL values to your VPC endpoint.
4. Confirm that your DNS settings can resolve the value.
5. Confirm that you can connect to Openflow UI using this URL from your browser.
6. Confirm that you can connect to Runtime UI using this URL from your browser.
## Configure PrivateLink for Openflow Runtime UI access
Perform the following steps:
1. Retrieve Snowflake's VPC endpoint service ID and Openflow PrivateLink URLs:
1. As a user with the ACCOUNTADMIN role, execute
```sql
SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
```
1. From the output, identify and save the values for the following keys:
- `privatelink-vpce-id`
- `openflow-privatelink-url`
- `external-telemetry-privatelink-url`
2. Construct the Runtime URL
- `of---.spcs..privatelink.snowflake.app`
2. Create a VPC endpoint with parameters:
If the Snowflake account where you plan to create your Openflow Deployment
had previously configured PrivateLink for %sf-web-interface%,
use the existing AWS VPC endpoint and add the additional OpenFlow DNS records to your Route 53.
- Type: `PrivateLink Ready partner services`
- Service: `privatelink-vpce-id` value obtained in the previous step.
- VPC: The VPC where your Openflow deployment will be running.
- Subnets: Select two availability zones and private subnets where your Openflow deployment will run.
3. Set up a Route 53 private hosted zone for Openflow UI with the following parameters:
- Domain: `privatelink.snowflakecomputing.com`
- Type: `Private hosted zone`
- Select the region and VPC where your Openflow deployment will run.
4. Set up a Route 53 private hosted zone for Openflow UI with the following parameters:
- Domain: `privatelink.snowflakecomputing.com`
- Type: `Private hosted zone`
- Select the region and VPC where your Openflow deployment will run.
5. Set up a Route 53 private hosted zone for Runtime UI with the following parameters:
- Domain: `privatelink.snowflake.app`
- Type: `Private hosted zone`
- Select the region and VPC where your Openflow deployment will run.
6. Add two CNAME records for the URLs identified in the first step:
- For `openflow-privatelink-url`
- Record name: `openflow-privatelink-url` value obtained in the first step
- Record type: `CNAME`
- Value: DNS name of your VPC endpoint
- For Runtime UI URL
- Record name: `openflow-runtime-ui-privatelink-url` value obtained in the first step
- Record type: `CNAME`
- Value: DNS name of your VPC endpoint
When creating a new %ofsfspcs%, ensure the **PrivateLink** option is enabled.
### Next steps
[Create deployment](/user-guide/data-integration/openflow/setup-openflow-spcs-deployment)
---
title: Set Up SAP® BDC Connect for Snowflake Zerocopy Connector
source: https://docs.snowflake.com/en/user-guide/data-integration/zero-copy/sap-sql/setup.md
section: Loading & Unloading Data
---
# Set Up %sapbdc% Zerocopy Connector
- [](/user-guide/data-integration/zero-copy/about-sap-snowflake)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake)
- [](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc)
- [](/user-guide/data-integration/zero-copy/sap-sql/security)
- [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products)
- [](/user-guide/data-integration/zero-copy/sap-sql/publish-data)
The Zerocopy Connector is subject to the [SAP® BDC Connect for Snowflake Terms](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/sap-bdc-connect-snowflake/).
The steps to share Data Products from SAP® BDC to SAP® Snowflake accounts and existing Snowflake
accounts that use the SAP® BDC Connect for Snowflake are exactly the same.
This topic describes how to create and manage a Zerocopy Connector for
%sapbdc% on the Snowflake side. For the SAP® side setup, see
[](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake) or
[](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc).
For the privileges required for each operation, see [](/user-guide/data-integration/zero-copy/sap-sql/security).
## Prerequisites
Before creating a Zerocopy Connector:
- An `ORGADMIN` must accept the SAP® BDC Connect for Snowflake Terms.
This only needs to be done once per Snowflake organization. Terms of Service
cannot be self-revoked — contact Snowflake support and legal to revoke them.
To accept the SAP® BDC Connect for Snowflake Terms in Snowsight:
1. Sign in to Snowflake as a user with the `ORGADMIN` role.
2. In the navigation menu, select **Admin** %raa% **Terms**.
3. In the **Snowflake Marketplace** section, next to
**SAP® BDC Connect for Snowflake Terms**, select **Review**.
4. Select **Acknowledge & Continue**.
- Complete the SAP® side setup described in
[](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-snowflake) or
[](/user-guide/data-integration/zero-copy/sap-sql/setup-sap-bdc).
- The role used to create the connector must have `CREATE ZEROCOPY CONNECTOR`
on the target schema. By default, the owner role of a schema has this privilege.
## Create a Database and Schema
A Zerocopy Connector is a schema-level object. Before creating one, ensure
you have a target database and schema, or create new ones. For reference, see
[](/sql-reference/sql/create-database) and
[](/sql-reference/sql/create-schema).
```sql
CREATE DATABASE IF NOT EXISTS my_db;
CREATE SCHEMA IF NOT EXISTS my_db.my_schema;
```
## Create a Zerocopy Connector
A Zerocopy Connector is a schema-level object. You can specify a fully qualified
name (`..`), a partially qualified name, or a plain
name when the database and schema are set in the current session context.
```sql
CREATE [ OR REPLACE ] ZEROCOPY CONNECTOR [ IF NOT EXISTS ]
PARTNER = SAP_BDC;
```
```sql
CREATE ZEROCOPY CONNECTOR IF NOT EXISTS my_db.my_schema.my_sap_connector
PARTNER = SAP_BDC;
```
After creation, the connector is in `NEW` state. No connection is established
until you run `ALTER ... CONNECT`.
## Enroll with SAP® BDC
The connector must be in `NEW`, `CONNECT_ERROR`, or `DISCONNECTED`
state. See [](#connector-states) for details.
```sql
ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector
CONNECT WITH CONFIG = (
INVITATION_LINK = ''
);
```
The connector immediately enters `CONNECTING` state while the connection is
established asynchronously. Use `DESC ZEROCOPY CONNECTOR` to check the
current state.
### Verify Connector State
Use `DESCRIBE` to check the current state of a connector:
```sql
DESC ZEROCOPY CONNECTOR my_db.my_schema.my_sap_connector;
```
#### Output
| Column |
Description |
| `name` |
Name of the Zerocopy Connector. |
| `partner` |
The data partner (e.g., `SAP_BDC`). |
| `config` |
The configuration of the data partner. For SAP® BDC, this contains the SAP® BDC Connector Endpoint. |
| `status` |
Current connector state. See [](#connector-states). |
| `connection_error` |
Error message if the connector is in `CONNECT_ERROR` or `DISCONNECT_ERROR` state; otherwise empty. |
| `catalog_linked_databases` |
Mounted catalog-linked databases that are visible to the current role. |
| `share_back` |
Whether sharing data from Snowflake to SAP® BDC is enabled for this connector. |
| `shares` |
Snowflake data shares that are associated with this connector. |
| `database_name` |
Database in which the connector resides. |
| `schema_name` |
Schema in which the connector resides. |
| `owner` |
Role that owns the connector. |
| `owner_role_type` |
Type of the owner role. |
| `comment` |
Optional comment set on the connector. |
| `created_on` |
Timestamp when the connector was created. |
| `updated_on` |
Timestamp when the connector was last updated. |
To list all connectors visible to the current role:
```sql
SHOW ZEROCOPY CONNECTORS IN SCHEMA my_db.my_schema;
SHOW ZEROCOPY CONNECTORS IN DATABASE my_db;
SHOW ZEROCOPY CONNECTORS IN ACCOUNT;
```
## Set Properties
You can set optional properties on a connector using `ALTER ... SET`:
```sql
-- Set a comment
ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector
SET COMMENT = 'SAP BDC connector for sales data products';
-- Enabling share back allows publishing data from Snowflake to SAP BDC
ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector
SET SHARE_BACK = TRUE;
```
To unset a property and restore its default value:
```sql
ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector
UNSET COMMENT;
ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector
UNSET SHARE_BACK;
```
## Disconnect the Connector
All catalog-linked databases created from the connector must be
dropped before disconnecting. Share-back must be disabled before
disconnecting. The connector must be in `CONNECTED` or
`DISCONNECT_ERROR` state.
```sql
ALTER ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector
DISCONNECT;
```
The connector immediately enters `DISCONNECTING` state while the connection
is dropped asynchronously. When successful, it transitions to `DISCONNECTED`.
## Drop the Connector
You can only drop a connector that is in `NEW`, `CONNECT_ERROR`,
`DISCONNECT_ERROR`, or `DISCONNECTED` state. Zerocopy Connectors do not support `UNDROP`.
```sql
DROP ZEROCOPY CONNECTOR IF EXISTS my_db.my_schema.my_sap_connector;
```
## Next Steps
Once the connector is in `CONNECTED` state, you can:
- List available SAP® data products and create catalog-linked databases.
See [](/user-guide/data-integration/zero-copy/sap-sql/explore-data-products).
- Publish Snowflake data back to SAP® BDC. See [](/user-guide/data-integration/zero-copy/sap-sql/publish-data).
---
title: Set up tasks for the Openflow Connector for Oracle
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/oracle/setup-tasks.md
section: Loading & Unloading Data
---
# Set up tasks for the %oracleofc%
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
The %oracleofc% is also subject to additional terms of service beyond the standard
connector terms of service. For more information, see the
[Openflow Connector for Oracle Addendum](https://www.snowflake.com/en/legal/optional-offerings/offering-specific-terms/openflow-oracle-terms/).
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/oracle/about)
- [](/user-guide/data-integration/openflow/connectors/oracle/manage-commercial-terms)
- [](/user-guide/data-integration/openflow/connectors/oracle/data-mapping)
- [](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb)
This topic describes the overall tasks required to set up, configure, and run the %oracleofc%.
## Prerequisites
Before you set up the %oracleofc%, verify that the following prerequisites are met:
1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/oracle/about).
2. Ensure that you have set up an Openflow deployment:
- [](/user-guide/data-integration/openflow/setup-openflow-byoc)
- [Set up Openflow - Snowflake Deployment](/user-guide/data-integration/openflow/setup-openflow-spcs)
3. Ensure that you add only one connector instance per runtime.
## Tasks
Perform the following tasks to set up, configure, and run the %oracleofc%.
| Order |
Task |
Description |
Persona |
| 1 |
Review [](#label-oracle-of-connector-prerequisites) |
Review and confirm all required prerequisites. |
**Snowflake account administrator** |
| 2 |
[Enable the connector](#label-oracle-enable-service) |
Accept the Oracle XStream terms to make the connector visible in the list of available connectors. |
**Organization administrator (ORGADMIN)** |
| 3 |
[Configure the Oracle database](/user-guide/data-integration/openflow/connectors/oracle/setup-oracledb) |
Configure the Oracle database for %oracleofc% including replication settings and credentials. |
**Oracle database administrator** |
| 4 |
[Set up Snowflake](/user-guide/data-integration/openflow/connectors/oracle/setup-snowflake) |
Create the destination database, service user, role, warehouse, and key pair authentication for the %oracleofc%. |
**Snowflake account administrator** |
| 5 |
[Configure the connector](/user-guide/data-integration/openflow/connectors/oracle/setup-connector) |
Install, configure, and run the %oracleofc% connector. |
**Snowflake account administrator** |
| 6 |
[Set up licensing](#label-oracle-license-setup) |
Configure your licensing model after the connector detects your source database inventory. |
**Organization administrator (ORGADMIN)** |
## Next steps
- [Monitor the flow](/user-guide/data-integration/openflow/monitor).
- [Maintenance](/user-guide/data-integration/openflow/connectors/oracle/maintenance) for reinstalling the connector or changing the XStream position.
---
title: Set up the Atlassian Jira Cloud (Agile) flow
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile.md
section: Loading & Unloading Data
---
# Set up the %jiraagile% flow
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/about)
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core)
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy)
This topic describes the steps to install and configure the %jiraagile% flow, the agile
flow of the %jira%. The core flow is documented separately in [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core).
The agile flow is independent of the core flow. It uses its own API token, parameter contexts,
state service, and Snowflake destination configuration. Both flows can write to the same Snowflake
database and schema, since they create tables with different names.
## Prerequisites
1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/jira-cloud/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. If using %OFSFSPCS-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list)
and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-jira-cloud) connector.
## Get the credentials
As a Jira Cloud administrator, perform the following tasks in your Atlassian account. You can reuse
the API token from the core flow or create a separate token. The core flow and agile flow can use
the same token, but they always share the underlying Jira API rate budget regardless.
1. Navigate to the [API tokens page](https://id.atlassian.com/manage-profile/security/api-tokens).
2. Select **Create API token with scopes**.
3. In the **Create an API token** dialog box, provide a descriptive name for the API token and
select an expiration date for the API token. This can range from 1 to 365 days.
4. Select the API token app **Jira**.
5. Select the agile scopes listed in [Required API scopes](#label-jira-agile-api-scopes).
6. Select **Create token**.
7. In the **Copy your API token** dialog box, select **Copy** to copy your generated API
token and then paste the token to the connector parameters, or save it securely.
8. Select **Close** to close the dialog box.
### Required API scopes
The agile flow always requires the following baseline Jira API scopes:
- `read:board-scope:jira-software`, `read:board-scope.admin:jira-software`, `read:project:jira`
(covers the always-created `BOARD` table)
- `read:jira-user` (covers the connection verification that runs at startup against
`GET /rest/api/3/myself`)
The API token owner additionally needs the **Browse projects** Jira permission on every project
whose boards you want to ingest, as well as access to each board's saved filter (used when reading
board configuration).
Some optional tables require additional scopes on top of the baseline:
| Table (Enabled Tables value) |
Additional Jira API scope |
Notes |
| `SPRINT` (populates `SPRINT` and `BOARD_SPRINT`) |
`read:sprint:jira-software` |
No additional permission required. |
| `BOARD_PROJECT` |
None. |
No additional permission required. |
| `BOARD_ISSUE` |
`read:jira-work` |
Issues that fail per-issue permission checks (for example, issue-level security) are skipped silently. |
If you reuse a single API token across both flows, combine these scopes with the core flow scopes
documented in [](#label-jira-core-api-scopes).
Tokens without scopes are also supported and grant access based solely on the API token owner's
permissions. However, tokens with scopes are recommended for fine-grained access control.
## Set up Snowflake account
If you've already completed the Snowflake account setup for the core flow, you can reuse the same
role, service user, key pair, database, schema, and warehouse for the agile flow. The agile flow
parameters point at this same Snowflake configuration.
Otherwise, perform the following tasks:
As a Snowflake account administrator, perform the following tasks:
1. Create a new role or use an existing role.
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure with [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Configure a secrets manager supported by Openflow (recommended), for example, AWS, Azure, and HashiCorp, and store the public and private keys in the secret store.
If for any reason, you don't want to use a secrets manager, then you are responsible for safeguarding the
public key and private key files used for key-pair authentication according to the security policies of your organization.
1. After the secrets manager is configured, determine how you will authenticate to it. On AWS, use the
EC2 instance role associated with Openflow as this way no other secrets have to be persisted.
2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the main menu (⋮) in the upper-right corner.
Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values.
3. At this point, all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the tables ingested by the connector (for example, for custom processing in Snowflake),
then grant those users the role created in step 1.
7. Create a database and schema in Snowflake for the connector to store ingested data. Grant the following [](#label-database-privileges) to the role created in the first step.
```sql
CREATE DATABASE jira_destination_db;
CREATE SCHEMA jira_destination_db.jira_destination_schema;
GRANT USAGE ON DATABASE jira_destination_db TO ROLE ;
GRANT USAGE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ;
GRANT CREATE TABLE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ;
```
8. Create a warehouse that the connector will use or use an existing one. Start with the smallest warehouse size, then experiment with size depending on the
amount of data transferred. Large data volumes typically scale better with
[multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes.
9. Ensure that the user with the role used by the connector has the required privileges to use the warehouse. If that's not the case then grant the required privileges to the role.
```sql
CREATE WAREHOUSE jira_connector_warehouse WITH WAREHOUSE_SIZE = 'X-Small';
GRANT USAGE ON WAREHOUSE jira_connector_warehouse TO ROLE ;
```
## Set up the connector
The agile flow is shipped as the %jiraagile% process group. As a data engineer, perform the
following tasks to install and configure it.
### Install the connector
To install the connector, do the following as a data engineer:
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
After import, the agile flow appears on the canvas as the %jiraagile% process group.
### Configure the connector
1. Right-click on the imported %jiraagile% process group and select **Parameters**.
2. Populate the required parameter values as described in
[Flow parameters](#label-jira-agile-flow-parameters).
### Flow parameters
The agile flow uses its own separate parameter contexts. The Jira credentials and Snowflake destination
must be configured independently from the core flow. Both flows can point to the same Snowflake
destination database and schema.
- [Jira Cloud (Agile) Source Parameters](#label-jira-agile-source-parameters): Used to establish connection with the Jira API.
- [Jira Cloud (Agile) Destination Parameters](#label-jira-agile-destination-parameters): Used to establish connection with Snowflake.
- [Jira Cloud (Agile) Ingestion Parameters](#label-jira-agile-ingestion-parameters): Used to define the configuration of data ingested from Jira.
#### Jira Cloud (Agile) Source Parameters
| Parameter |
Description |
| Jira Email |
Email address for the Atlassian account used for authentication. |
| Jira API Token |
API access token for your Atlassian Jira account. See [Required API scopes](#label-jira-agile-api-scopes) for the scopes to configure. |
| Environment URL |
URL to the Atlassian Jira environment. For example, `https://your-domain.atlassian.net`. |
#### Jira Cloud (Agile) Destination Parameters
| Parameter |
Description |
Required |
| Destination Database |
The database where data will be persisted. It must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
|
Yes |
| Destination Schema |
The schema where data will be persisted, which must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`
- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
|
Yes |
| Snowflake Authentication Strategy |
When using:
- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN.
This token is managed automatically by Snowflake.
BYOC deployments must have previously configured
[runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
|
Yes |
| Snowflake Account Identifier |
When using:
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
|
Yes |
| Snowflake Private Key |
When using:
- **Session Token Authentication Strategy**: Must be blank.
-
- **KEY_PAIR**: Must be the RSA private key used for authentication.
-
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers.
Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
|
No |
| Snowflake Private Key File |
When using:
- **Session token authentication strategy**: The private key file must be blank.
- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake,
formatted according to PKCS8 standards and including standard PEM headers and footers.
The header line begins with `-----BEGIN PRIVATE`.
To upload the private key file, select the **Reference asset** checkbox.
|
No |
| Snowflake Private Key Password |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the password associated with the Snowflake private key file.
|
No |
| Snowflake Role |
When using
- **Session Token Authentication Strategy**: Use your Snowflake role.
You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.
- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user.
|
Yes |
| Snowflake Username |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance.
|
Yes |
| Oversized Value Strategy |
Determines how the connector handles values that exceed its internal size limits (16 MB) during replication.
Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.
- **Set Null**: The value is replaced with `NULL` in the destination table.
Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
|
No |
| Snowflake Warehouse |
Snowflake warehouse used to run queries. |
Yes |
#### Jira Cloud (Agile) Ingestion Parameters
| Parameter |
Description |
| Enabled Tables |
Comma-separated list of optional tables to populate. Ingestion of `BOARD` is always
enabled and can't be disabled. Available values:
- `BOARD_ISSUE` (issues associated with the board)
- `BOARD_PROJECT` (projects associated with the boards)
- `SPRINT` (sprints and board-sprint associations, populates both `SPRINT` and
`BOARD_SPRINT`)
Default value: `BOARD_ISSUE, BOARD_PROJECT, SPRINT`.
|
| Merge Interval |
Time interval between journal-to-destination merge operations. When a merge runs, the Snowflake
warehouse resumes. The merge is skipped if no new data has been loaded since the previous merge.
Default value: `1 min`.
|
## Run the flow
1. Right-click on the canvas and select **Enable all Controller Services**.
2. Right-click on the %jiraagile% process group and select **Start**. The flow starts the data ingestion.
On first run, the flow creates the required Snowflake tables in the destination schema. See
[](#label-jira-entities) for the full list of tables created by the agile flow and the parameters
that control which optional tables are populated.
## Resetting the connector state
If you want to restart the ingestion from scratch, clear the agile flow's ingestion state. The
agile flow uses its own centralized state service rather than per-processor state.
To reset the state, perform the following steps:
1. Right-click the %jiraagile% process group and select **Stop**.
2. Navigate to the **Controller Settings** for the process group.
3. Find the **StandardJiraIngestionStateService** controller service and select **View State**.
4. Select **Clear State**. This clears the agile flow's ingestion tracking.
5. Optionally, update the connector parameters if needed.
6. Right-click the %jiraagile% process group and select **Start**.
The agile flow's destination tables (`BOARD`, `SPRINT`, `BOARD_SPRINT`, `BOARD_PROJECT`,
`BOARD_ISSUE`) are fully refreshed on every scheduled run, regardless of whether you clear the
state.
## Next steps
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core) if you haven't yet installed the core flow.
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) if you're moving from a previous version of the Jira Cloud connector.
---
title: Set up the Atlassian Jira Cloud (Core) flow
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/jira-cloud/setup-core.md
section: Loading & Unloading Data
---
# Set up the %jiracore% flow
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/about)
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile)
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy)
This topic describes the steps to install and configure the %jiracore% flow, the core flow
of the %jira%. The agile flow is documented separately in [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile).
## Prerequisites
1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/jira-cloud/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. If using %OFSFSPCS-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list)
and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-jira-cloud) connector.
## Get the credentials
As a Jira Cloud administrator, perform the following tasks in your Atlassian account:
1. Navigate to the [API tokens page](https://id.atlassian.com/manage-profile/security/api-tokens).
2. Select **Create API token with scopes**.
3. In the **Create an API token** dialog box, provide a descriptive name for the API token and
select an expiration date for the API token. This can range from 1 to 365 days.
4. Select the API token app **Jira**.
5. Select the required scopes based on the features you plan to use. See
[Required API scopes](#label-jira-core-api-scopes) for details.
6. Select **Create token**.
7. In the **Copy your API token** dialog box, select **Copy** to copy your generated API
token and then paste the token to the connector parameters, or save it securely.
8. Select **Close** to close the dialog box.
### Required API scopes
The core flow always requires the following baseline Jira API scopes:
- `read:jira-work` (covers issues, projects, fields, comments, changelogs, worklogs, votes,
watchers, remote links, permissions, project components, and project versions)
- `read:jira-user` (covers users and user groups, and the connection verification and timezone
lookup that run at startup against `GET /rest/api/3/myself`)
The API token owner additionally needs the **Browse projects** Jira permission on every project
that you want to ingest.
Some optional tables require additional scopes or permissions on top of the baseline:
| Table (Enabled Tables value) |
Additional Jira API scope |
Additional Jira permission |
| `ISSUE_VOTE` |
None. |
**View voters and watchers** on the relevant projects. |
| `ISSUE_WATCHER` |
None. |
**View voters and watchers** on the relevant projects. |
| `WORKLOG` |
None. |
**View worklogs** on the relevant projects. |
| `ISSUE_SECURITY_SCHEME` |
`manage:jira-configuration` |
**Administer Jira** (global). |
| `DELETED_ISSUE` (`Deletes Fetch Strategy = AUDIT`) |
`manage:jira-configuration` |
**Administer Jira** (global). |
Comments restricted to specific roles or groups are visible only when the API token owner is a member
of these roles or groups, regardless of the token scope or permission configuration.
Tokens without scopes are also supported and grant access based solely on the API token owner's
permissions. However, tokens with scopes are recommended for fine-grained access control.
## Set up Snowflake account
As a Snowflake account administrator, perform the following tasks:
1. Create a new role or use an existing role.
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure with [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Configure a secrets manager supported by Openflow (recommended), for example, AWS, Azure, and HashiCorp, and store the public and private keys in the secret store.
If for any reason, you don't want to use a secrets manager, then you are responsible for safeguarding the
public key and private key files used for key-pair authentication according to the security policies of your organization.
1. After the secrets manager is configured, determine how you will authenticate to it. On AWS, use the
EC2 instance role associated with Openflow as this way no other secrets have to be persisted.
2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the main menu (⋮) in the upper-right corner.
Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values.
3. At this point, all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the tables ingested by the connector (for example, for custom processing in Snowflake),
then grant those users the role created in step 1.
7. Create a database and schema in Snowflake for the connector to store ingested data. Grant the following [](#label-database-privileges) to the role created in the first step.
```sql
CREATE DATABASE jira_destination_db;
CREATE SCHEMA jira_destination_db.jira_destination_schema;
GRANT USAGE ON DATABASE jira_destination_db TO ROLE ;
GRANT USAGE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ;
GRANT CREATE TABLE ON SCHEMA jira_destination_db.jira_destination_schema TO ROLE ;
```
8. Create a warehouse that the connector will use or use an existing one. Start with the smallest warehouse size, then experiment with size depending on the
amount of data transferred. Large data volumes typically scale better with
[multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes.
9. Ensure that the user with the role used by the connector has the required privileges to use the warehouse. If that's not the case then grant the required privileges to the role.
```sql
CREATE WAREHOUSE jira_connector_warehouse WITH WAREHOUSE_SIZE = 'X-Small';
GRANT USAGE ON WAREHOUSE jira_connector_warehouse TO ROLE ;
```
## Set up the connector
The core flow is shipped as the %jiracore% process group. As a data engineer, perform the
following tasks to install and configure it.
### Install the connector
To install the connector, do the following as a data engineer:
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
After import, the core flow appears on the canvas as the %jiracore% process group.
### Configure the connector
1. Right-click on the imported %jiracore% process group and select **Parameters**.
2. Populate the required parameter values as described in
[Flow parameters](#label-jira-core-flow-parameters).
### Flow parameters
The core flow uses the following parameter contexts:
- [Jira Cloud (Core) Source Parameters](#label-jira-core-source-parameters): Used to establish connection with the Jira API.
- [Jira Cloud (Core) Destination Parameters](#label-jira-core-destination-parameters): Used to establish connection with Snowflake.
- [Jira Cloud (Core) Ingestion Parameters](#label-jira-core-ingestion-parameters): Used to define the configuration of data ingested from Jira.
#### Jira Cloud (Core) Source Parameters
| Parameter |
Description |
| Jira Email |
Email address for the Atlassian account used for authentication. |
| Jira API Token |
API access token for your Atlassian Jira account. See [Required API scopes](#label-jira-core-api-scopes) for the scopes to configure. |
| Environment URL |
URL to the Atlassian Jira environment. For example, `https://your-domain.atlassian.net`. |
#### Jira Cloud (Core) Destination Parameters
| Parameter |
Description |
Required |
| Destination Database |
The database where data will be persisted. It must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
|
Yes |
| Destination Schema |
The schema where data will be persisted, which must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`
- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
|
Yes |
| Snowflake Authentication Strategy |
When using:
- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN.
This token is managed automatically by Snowflake.
BYOC deployments must have previously configured
[runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
|
Yes |
| Snowflake Account Identifier |
When using:
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
|
Yes |
| Snowflake Private Key |
When using:
- **Session Token Authentication Strategy**: Must be blank.
-
- **KEY_PAIR**: Must be the RSA private key used for authentication.
-
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers.
Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
|
No |
| Snowflake Private Key File |
When using:
- **Session token authentication strategy**: The private key file must be blank.
- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake,
formatted according to PKCS8 standards and including standard PEM headers and footers.
The header line begins with `-----BEGIN PRIVATE`.
To upload the private key file, select the **Reference asset** checkbox.
|
No |
| Snowflake Private Key Password |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the password associated with the Snowflake private key file.
|
No |
| Snowflake Role |
When using
- **Session Token Authentication Strategy**: Use your Snowflake role.
You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.
- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user.
|
Yes |
| Snowflake Username |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance.
|
Yes |
| Oversized Value Strategy |
Determines how the connector handles values that exceed its internal size limits (16 MB) during replication.
Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.
- **Set Null**: The value is replaced with `NULL` in the destination table.
Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
|
No |
| Snowflake Warehouse |
Snowflake warehouse used to run queries. |
Yes |
#### Jira Cloud (Core) Ingestion Parameters
| Parameter |
Description |
| Enabled Tables |
Comma-separated list of optional tables to populate. See [](#label-jira-core-enabled-tables) for the full list of values and guidance on which tables to enable. Default value: `CHANGELOG, COMMENT, WORKLOG`. |
| Issue Fields |
A list of fields to return for each issue, used to retrieve a subset of fields. See [](#label-jira-core-issue-fields) for available values and custom field handling. Default value: `*standard`. |
| Project Keys Filter |
Optional comma-separated list of Jira project keys to limit ingestion to specific projects. If empty, all projects accessible by the API token owner are fetched. For example, `PROJ1, PROJ2`. |
| Deletes Fetch Strategy |
Strategy for fetching deleted issues. Set to `NONE` to skip delete tracking, or `AUDIT` to fetch deleted issues from the Jira audit log endpoint. The `AUDIT` strategy requires the API token owner to have the **Administer Jira** global permission and the `manage:jira-configuration` scope. Default value: `NONE`. |
| Merge Interval |
Time interval between journal-to-destination merge operations. When a merge runs, the Snowflake warehouse resumes. The merge is skipped if no new data has been loaded since the previous merge. Default value: `1 min`. |
## Run the flow
1. Right-click on the canvas and select **Enable all Controller Services**.
2. Right-click on the %jiracore% process group and select **Start**. The flow starts the data ingestion.
On first run, the flow creates the required Snowflake tables in the destination schema. See
[](#label-jira-entities) for the full list of tables created by the core flow and the parameters
that control which optional tables are populated.
## Resetting the connector state
If you need to change the project filter or want to restart the ingestion from scratch, you must
clear the ingestion state. The core flow uses a centralized state service rather than per-processor state.
To reset the state, perform the following steps:
1. Right-click the %jiracore% process group and select **Stop**.
2. Navigate to the **Controller Settings** for the process group.
3. Find the **StandardJiraIngestionStateService** controller service and select **View State**.
4. Select **Clear State**. This clears all project tracking, pagination, and timestamp state.
5. Optionally, update the connector parameters if needed.
6. Right-click the %jiracore% process group and select **Start**.
Clearing the ingestion state causes the connector to re-fetch all data from the beginning. The
destination tables are not truncated. Existing rows are updated in place, and rows that no
longer exist in Jira are flagged with `_SNOWFLAKE_DELETED = TRUE`.
## Accessing the data
Data fetched from Jira is available in the destination tables with explicit column schemas. There is
no need to use JSON flattening or views to query the data.
Each entity is stored in its own table. For example, to query issues and their comments:
```sql
SELECT i.KEY, i.SUMMARY, c.BODY AS comment_body, c.CREATED AS comment_created
FROM ISSUE i
JOIN COMMENT c ON i.ID = c.ISSUE_ID
ORDER BY c.CREATED DESC;
```
To exclude deleted issues from query results, filter on the connector-managed
`_SNOWFLAKE_DELETED` column. The connector sets this flag to `TRUE` on the matching `ISSUE`
row when an issue is deleted in Jira, so no anti-join against `DELETED_ISSUE` is needed:
```sql
SELECT i.*
FROM ISSUE i
WHERE i._SNOWFLAKE_DELETED = FALSE;
```
The `DELETED_ISSUE` table is still useful when you need the deletion timestamp or the user who
performed the deletion. See [](#label-jira-metadata-columns) for the full set of connector-managed
metadata columns.
## Enabled tables configuration
The `Enabled Tables` parameter controls which optional tables are populated. Ingestion of the
`ISSUE`, `PROJECT`, `USER`, and `FIELD` tables is always enabled and can't be disabled.
Enabling all tables may cause performance issues and require a larger runtime.
Available values for `Enabled Tables`:
- `CHANGELOG` (field change history for issues)
- `COMMENT` (comments on issues)
- `ISSUE_REMOTE_LINK` (remote links attached to issues)
- `ISSUE_SECURITY_SCHEME` (issue-level security configurations)
- `ISSUE_VOTE` (users who voted on issues)
- `ISSUE_WATCHER` (users watching issues)
- `PERMISSION` (global and project permission definitions)
- `PROJECT_COMPONENT` (components defined in a project)
- `PROJECT_VERSION` (release versions of a project)
- `USER_GROUP` (group memberships per user)
- `WORKLOG` (time tracking entries on issues)
The per-issue tables (`CHANGELOG`, `COMMENT`, `ISSUE_REMOTE_LINK`,
`ISSUE_VOTE`, `ISSUE_WATCHER`, `WORKLOG`) and per-project tables
(`PROJECT_COMPONENT`, `PROJECT_VERSION`) only ingest data for issues and projects
that are also covered by `Project Keys Filter`.
Some tables are populated by calling the Jira API once per parent entity (for example, once per
user or once per issue). On large Jira instances, enabling these tables can significantly increase
the number of API calls and the load on the ingestion runtime, and can slow down population of the
parent table due to back-pressure on the upstream processor. Enable these tables only when the
corresponding data is required.
## Issue fields configuration
The `ISSUE` table schema depends on the `Issue Fields` parameter. The parameter accepts a
comma-separated list of field IDs or one of the special values below. Prefix a field with a minus
(`-`) to exclude it. For example, `*all,-description` returns all fields except `description`.
- `*standard` (default): Fetches standard Jira fields (Summary, Status, Priority, Assignee).
- `*navigable`: Fetches all navigable fields.
- `*all`: Fetches all fields, including custom fields.
- Individual field IDs can be specified (for example, `summary,status,customfield_10001`).
The default value `*standard` **doesn't include custom fields**. To ingest custom fields,
set this parameter to `*all` or list the custom fields explicitly by ID, for example,
`*standard,customfield_10001`. To find custom field IDs, follow
[this guide](https://confluence.atlassian.com/jirakb/get-custom-field-ids-for-jira-and-jira-service-management-744522503.html).
Column names in the `ISSUE` table are derived from Jira field display names by:
1. Uppercasing the display name.
2. Replacing spaces with underscores.
3. Removing every character that isn't a letter, digit, or underscore.
For example, the display name `OF Test (Multi-User)` becomes the column `OF_TEST_MULTIUSER`.
If two fields produce the same column name after this transformation, the second field's column
is suffixed with `__` to keep names unique. For example, two fields with
display name `Custom Field` and IDs `customfield_1` and `customfield_2` produce columns
`CUSTOM_FIELD` and `CUSTOM_FIELD__CUSTOMFIELD_2`.
Jira field types are mapped to Snowflake column types as follows:
| Jira field type |
Snowflake column type |
| `number` |
NUMBER |
| `array` |
ARRAY |
| `progress`, `votes`, `watches`, `timetracking` |
VARIANT |
| All other types |
VARCHAR |
## Next steps
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/setup-agile) to install the agile flow.
- [](/user-guide/data-integration/openflow/connectors/jira-cloud/migrate-from-legacy) if you're moving from a previous version of the Jira Cloud connector.
---
title: Set up the Openflow Connector for Amazon Ads
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/amazon-ads/setup.md
section: Loading & Unloading Data
---
# Set up the %amazonads%
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic describes the steps to set up the %amazonads%.
## Prerequisites
1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/amazon-ads/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. If using %ofsfspcs-plural%, ensure that you have reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list)
and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-amazon-ads) connector.
## Get the credentials
As an Amazon Ads administrator, perform the following actions:
1. Make sure that you have access to an [Amazon Ads account](https://advertising.amazon.com/).
2. [Acquire Access to Amazon Ads API](https://advertising.amazon.com/API/docs/en-us/guides/onboarding/overview) and complete the onboarding process.
3. [Get client ID and client secret](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-access-token).
4. [Create an authorization grant](https://advertising.amazon.com/API/docs/en-us/guides/get-started/create-authorization-grant)
and [retrieve a refresh token](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-access-token).
5. Review the [available regions](https://advertising.amazon.com/API/docs/en-us/reference/api-overview#api-endpoints)
and get a base URL used for requests based on the region in which you are advertising.
6. [Fetch profile IDs](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-profiles) for report configuration.
## Set up Snowflake account
As a Snowflake account administrator, perform the following tasks:
1. Create a new role or use an existing role and grant the [](#label-database-privileges).
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure with [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow,
for example, AWS, Azure, and Hashicorp, and store the public and private keys in the secret store.
If for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the
public key and private key files used for key-pair authentication according to the security policies of your organization.
1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you the
EC2 instance role associated with Openflow as this way no other secrets have to be persisted.
2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right.
Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values.
3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake),
then grant those users the role created in step 1.
7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated,
and the amount of data transferred. Large table numbers typically scale better with
[multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes.
## Set up the connector
As a data engineer, perform the following tasks to install and configure the connector:
### Install the connector
To install the connector, do the following as a data engineer:
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
### Configure the connector
1. Right-click on the imported process group and select **Parameters**.
2. Populate the required parameter values as described in [Flow parameters](#flow-parameters).
### Flow parameters
This section describes the flow parameters that you can configure based on the following parameter contexts:
- [Amazon Ads source parameters](#amazon-ads-source-parameters): Used to establish connection with Amazon Ads API.
- [Amazon Ads destination parameters](#amazon-ads-destination-parameters): Used to establish connection with Snowflake.
- [Amazon Ads ingestion parameters](#amazon-ads-ingestion-parameters): Used to define the configuration of data downloaded from Amazon Ads.
#### Amazon Ads source parameters
| Parameter |
Description |
| Client ID |
Client ID of the Amazon Advertising account |
| Client Secret |
Client secret of the Amazon Advertising account |
| OAuth Base URL |
The URL of the authorization server that issues the access token
- Possible values:
-
- [https://api.amazon.com/auth/o2/token](https://api.amazon.com/auth/o2/token)
- [https://api.amazon.co.uk/auth/o2/token](https://api.amazon.co.uk/auth/o2/token)
- [https://api.amazon.co.jp/auth/o2/token](https://api.amazon.co.jp/auth/o2/token)
|
| Refresh Token |
Refresh Token for Amazon Ads API |
| Region |
Environment from which the advertising data is downloaded
- Possible values:
-
- NA
- EU
- FE
|
#### Amazon Ads destination parameters
| Parameter |
Description |
Required |
| Destination Database |
The database where data will be persisted. It must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
|
Yes |
| Destination Schema |
The schema where data will be persisted, which must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`
- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
|
Yes |
| Snowflake Authentication Strategy |
When using:
- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN.
This token is managed automatically by Snowflake.
BYOC deployments must have previously configured
[runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
|
Yes |
| Snowflake Account Identifier |
When using:
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
|
Yes |
| Snowflake Private Key |
When using:
- **Session Token Authentication Strategy**: Must be blank.
-
- **KEY_PAIR**: Must be the RSA private key used for authentication.
-
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers.
Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
|
No |
| Snowflake Private Key File |
When using:
- **Session token authentication strategy**: The private key file must be blank.
- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake,
formatted according to PKCS8 standards and including standard PEM headers and footers.
The header line begins with `-----BEGIN PRIVATE`.
To upload the private key file, select the **Reference asset** checkbox.
|
No |
| Snowflake Private Key Password |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the password associated with the Snowflake private key file.
|
No |
| Snowflake Role |
When using
- **Session Token Authentication Strategy**: Use your Snowflake role.
You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.
- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user.
|
Yes |
| Snowflake Username |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance.
|
Yes |
| Oversized Value Strategy |
Determines how the connector handles values that exceed its internal size limits (16 MB) during replication.
Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.
- **Set Null**: The value is replaced with `NULL` in the destination table.
Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
|
No |
| Snowflake Warehouse |
Snowflake warehouse used to run queries. |
Yes |
#### Amazon Ads Ingestion Parameters
| Parameter |
Description |
| Report Name |
Name of the report to be used as a destination table name.
The name must be unique within the destination schema.
|
| Report Ad Product |
Type of advertising product being reported
- Possible values:
-
- SPONSORED_PRODUCTS
- SPONSORED_BRANDS
- SPONSORED_DISPLAY
- SPONSORED_TELEVISION
- DEMAND_SIDE_PLATFORM
|
| Report Columns |
Set of columns which will be present in the end report.
The list of available columns depends on the report type and can be found
in the [Amazon Ads API documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview).
For example, for the `spCampaigns` report type, the list of available columns
can be found in the [Sponsored Products documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/campaign#sponsored-products).
|
| Report Filters |
Set of filters used to trim the data returned.
The list of available filters depends on the report type and can be found in the [Amazon Ads API documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview).
For example, for the `spCampaigns` report type, the list of available filters
can be found in the [Sponsored Products documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/campaign#sponsored-products).
Filters must be in the format of `columnName=filterValue` and values must
separated by a comma (`,`). For example, `campaignStatus=ENABLED,PAUSED`.
|
| Report Group By |
Determines the level of granularity and how the data within the report will be aggregated and presented.
The list of available group by columns depends on the report type and can
be found in the [Amazon Ads API documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview).
For example, for the `spCampaigns` report type, the list of available
group by columns can be found in the [Sponsored Products documentation](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/campaign#sponsored-products).
|
| Report Ingestion Strategy |
Mode in which data is fetched, either snapshot or incremental
- Possible values:
-
- `SNAPSHOT`
- `INCREMENTAL`
|
| Report Ingestion Window |
Specifies the number of days, data from which should be downloaded during
incremental ingestion. For example, with a 30-day report ingestion window,
an incremental load starts ingestion from 30 days prior to the last successful
ingestion date, unless this calculated date falls before the overall start date,
in which case ingestion begins from the overall start date.
If the `SNAPSHOT` ingestion strategy is used, all available data from the start date to the present is downloaded, so there is no need to use a report ingestion window.
|
| Report Profile ID |
The [profile ID](https://advertising.amazon.com/API/docs/en-us/guides/get-started/retrieve-profiles)
associated with an advertising account in a specific marketplace
|
| Report Time Unit |
Date aggregation
- Possible values:
-
- `DAILY`: Each day is represented by a one row
- `SUMMARY`: The whole ingested date period is represented as one row
|
| Report Type |
The Amazon Ads API supports a number of [report types](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/overview).
For example: [sbAds](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/ad) and [spCampaigns](https://advertising.amazon.com/API/docs/en-us/guides/reporting/v3/report-types/campaign).
Copy value of `reportTypeId` from the documentation and paste it into the parameter value.
|
| Report Start Date |
Start date from which the ingestion should happen. The date format is YYYY-MM-DD. |
| Report Schedule |
Schedule time for processor creating reports. For example: `8 h` or `1 d`. The `h` represents hours and `d` days. |
Data retention in the Amazon Ads API is a specific timeframe, ranging from 60
to 365 days depending on the report type, during which historical advertising
performance data is stored and accessible for retrieval.
After this period, older data may no longer be available.
## Run the flow
1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the imported process group and select **Start**.
The connector starts the data ingestion.
---
title: Set up the Openflow Connector for Box
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/box/setup.md
section: Loading & Unloading Data
---
# Set up the %box%
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic describes the steps to set up the %box%.
## Prerequisites
1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/box/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. If using %ofsfspcs-plural%, ensure that you have reviewed [configuring requireddomains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list)
and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-box) connector.
## Get the credentials
As a **Box developer** or **Box administrator**, create a [Box Platform application](https://developer.box.com/guides/applications/app-types/platform-apps/) as follows:
1. Navigate to [Box Developer Console](https://app.box.com/developers/console).
2. Select **Create Platform App**.
3. Select **Custom App** as the application type.
4. Provide a name and description for the app, and select a purpose from the drop-down list.
5. Select **Server Authentication (with JWT)** as the authentication method.
6. Select **Create App**.
7. To configure the app, navigate to the **Configuration** tab.
8. In the **App Access Level** section, select **App + Enterprise Access**.
9. In the **Application Scopes** section, select the following options:
- **Read all files and folders stored in Box**.
- **Write all files and folders stored in Box**: To download files and folders. Note that the connector can't upload any files.
Snowflake recommends granting the service account with only the Viewer role.
To grant the application access to files in Box, select a folder that you want to synchronize. Share it with the app service account using the email of the service account from step n.
%box% is able to discover and download files from the specified folder and all its subfolders, but it cannot modify the files.
- **Manage users**: To read users in the enterprise.
- **Manage groups**: To read groups and their members in the enterprise.
- **Manage enterprise properties**: To read enterprise events.
10. In the **Add and Manage Public Keys** section, generate a public/private key pair. Box downloads a JSON configuration file with a private key.
11. Save the changes.
12. Navigate to the **Authorization** tab, and submit the app for authorization for access to the enterprise.
13. Request your enterprise administrator to approve the app.
14. After the approval is granted, go to the **General Settings** tab and save the app service account email address.
For more information, see [Setup with JWT](https://developer.box.com/guides/authentication/jwt/jwt-setup/).
## Set up Snowflake account
As a Snowflake account administrator, perform the following tasks manually
or by using the script included below:
1. Create a new role or use an existing role and grant the [](#label-database-privileges).
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure with [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and Hashicorp, and store the public and private keys in the secret store.
If for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the
public key and private key files used for key-pair authentication according to the security policies of your organization.
1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you the
EC2 instance role associated with Openflow as this way no other secrets have to be persisted.
2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right.
Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values.
3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake),
then grant those users the role created in step 1.
7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated,
and the amount of data transferred. Large table numbers typically scale better with
[multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes.
### Example setup
```sql
--The following script assumes you'll need to create all required roles, users, and objects.
--However, you may want to reuse some that are already in existence.
--Create a Snowflake service user to manage the connector
USE ROLE USERADMIN;
CREATE USER TYPE=SERVICE COMMENT='Service user for Openflow automation';
--Create a pair of secure keys (public and private). For more information, see
--key-pair authentication. Store the private key for the user in a file to supply
--to the connector’s configuration. Assign the public key to the Snowflake service user:
ALTER USER SET RSA_PUBLIC_KEY = '';
--Create a role to manage the connector and the associated data and
--grant it to that user
USE ROLE SECURITYADMIN;
CREATE ROLE ;
GRANT ROLE TO USER ;
--The following block is for the use case: Ingest files and perform processing with Cortex
--Create a role for read access to the cortex search service created by this connector.
--This role should be granted to any role that will use the service
CREATE ROLE ;
GRANT ROLE TO ROLE ;
--Create the database the data will be stored in and grant usage to the roles created
USE ROLE ACCOUNTADMIN; --use whatever role you want to own your DB
CREATE DATABASE IF NOT EXISTS ;
GRANT USAGE ON DATABASE TO ROLE ;
--Create the schema the data will be stored in and grant the necessary privileges
--on that schema to the connector admin role:
USE DATABASE ;
CREATE SCHEMA IF NOT EXISTS ;
GRANT USAGE ON SCHEMA TO ROLE ;
GRANT CREATE TABLE, CREATE DYNAMIC TABLE, CREATE STAGE, CREATE SEQUENCE, CREATE CORTEX
SEARCH SERVICE ON SCHEMA TO ROLE ;
--The following block is for use case: Ingest files and perform processing with Cortex
--Grant the Cortex read-only role access to the database and schema
GRANT USAGE ON DATABASE TO ROLE ;
GRANT USAGE ON SCHEMA TO ROLE ;
--Create the warehouse this connector will use if it doesn't already exist. Grant the
--appropriate privileges to the connector admin role. Adjust the size according to your needs.
CREATE WAREHOUSE
WITH
WAREHOUSE_SIZE = 'MEDIUM'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE TO ROLE ;
```
## Use cases
You can configure the connector for the following use cases:
- [Ingest files only](#ingest-files-only)
- [Ingest files and perform processing with Cortex](#ingest-files-and-perform-processing-with-cortex)
- [Extract Box metadata using Box AI and ingest it into a Snowflake table](#extract-box-metadata-using-box-ai-and-ingest-it-into-a-snowflake-table)
- [Synchronize Box file metadata instances with a Snowflake table](#synchronize-box-file-metadata-instances-with-a-snowflake-table)
### Ingest files only
Use the connector definition to perform custom processing on ingested files.
#### Set up the connector
As a data engineer, perform the following tasks to install and configure the connector:
##### Install the connector
To install the connector, do the following as a data engineer:
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
##### Configure the connector
1. Right-click on the imported process group and select **Parameters**.
2. Enter the required parameter values as described in [Box ingestion parameters](#box-ingestion-parameters), [Box destination parameters](#box-destination-parameters) and [Box source parameters](#box-source-parameters).
###### Box source parameters
| Parameter |
Description |
| Box App Config JSON |
An application JSON configuration that was downloaded during the app creation. |
| Box App Config File |
An application json file that was downloaded during the app creation. Either "Box App Config File" or "Box App Config JSON" has to be set. Select the **Reference asset** checkbox to upload the config file. |
###### Box destination parameters
| Parameter |
Description |
Required |
| Destination Database |
The database where data will be persisted. It must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
|
Yes |
| Destination Schema |
The schema where data will be persisted, which must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`
- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
|
Yes |
| Snowflake Authentication Strategy |
When using:
- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN.
This token is managed automatically by Snowflake.
BYOC deployments must have previously configured
[runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
|
Yes |
| Snowflake Account Identifier |
When using:
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
|
Yes |
| Snowflake Private Key |
When using:
- **Session Token Authentication Strategy**: Must be blank.
-
- **KEY_PAIR**: Must be the RSA private key used for authentication.
-
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers.
Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
|
No |
| Snowflake Private Key File |
When using:
- **Session token authentication strategy**: The private key file must be blank.
- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake,
formatted according to PKCS8 standards and including standard PEM headers and footers.
The header line begins with `-----BEGIN PRIVATE`.
To upload the private key file, select the **Reference asset** checkbox.
|
No |
| Snowflake Private Key Password |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the password associated with the Snowflake private key file.
|
No |
| Snowflake Role |
When using
- **Session Token Authentication Strategy**: Use your Snowflake role.
You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.
- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user.
|
Yes |
| Snowflake Username |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance.
|
Yes |
| Oversized Value Strategy |
Determines how the connector handles values that exceed its internal size limits (16 MB) during replication.
Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.
- **Set Null**: The value is replaced with `NULL` in the destination table.
Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
|
No |
| Snowflake Warehouse |
Snowflake warehouse used to run queries. |
Yes |
###### Box ingestion parameters
| Parameter |
Description |
| Box Folder ID |
The ID of the folder to read the files from. Set this to `0` to synchronize all folders the Box app has access to. It can be retrieved from the URL, for example [https://app.box.com/folder/FOLDER_ID](https://app.box.com/folder/FOLDER_ID). |
| File Extensions To Ingest |
A comma-separated list that specifies file extensions to ingest. The connector tries to convert the files to PDF format first, if possible. Nonetheless, the extension check is performed on the original file extension. If some of the specified file extensions are not supported by Cortex Parse Document, then the connector ignores those files, logs a warning message in an event log, and continues processing other files. |
| Snowflake File Hash Table Name |
Name of the table to store file hashes to determine if the content has changed. This parameter should generally not be changed. |
#### Run the flow
1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion.
After starting the connector, it retrieves all files from the specified folder, and then consumes `admin_logs_streaming` events within the last 14 days.
This is done to capture data that may otherwise have been missed during the initialization process.
During that time, `not found` errors may occur, which are caused by files that appear in the events but are no longer present.
### Ingest files and perform processing with Cortex
Use the connector definition to:
- Create AI assistants for public documents within your organization's Box enterprise
- Enable your AI assistants to adhere to access controls specified in your organization's Box enterprise
#### Set up the connector
As a data engineer, perform the following tasks to install and configure the connector:
##### Install the connector
To install the connector, do the following as a data engineer:
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
##### Configure the connector
1. Right-click on the imported process group and select **Parameters**.
2. Populate the required parameter values as described in [Box Cortex Connect Ingestion Parameters](#box-cortex-connect-ingestion-parameters), [Box Cortex Connect Destination Parameters](#box-cortex-connect-destination-parameters) and [Box Cortex Connect Source Parameters](#box-cortex-connect-source-parameters).
###### Box Cortex Connect Source Parameters
| Parameter |
Description |
| Box App Config JSON |
An application JSON configuration that was downloaded during the app creation. |
| Box App Config File |
An application json file that was downloaded during the app creation. Either "Box App Config File" or "Box App Config JSON" has to be set. Select the **Reference asset** checkbox to upload the config file. |
###### Box Cortex Connect Destination Parameters
| Parameter |
Description |
Required |
| Destination Database |
The database where data will be persisted. It must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
|
Yes |
| Destination Schema |
The schema where data will be persisted, which must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`
- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
|
Yes |
| Snowflake Authentication Strategy |
When using:
- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN.
This token is managed automatically by Snowflake.
BYOC deployments must have previously configured
[runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
|
Yes |
| Snowflake Account Identifier |
When using:
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
|
Yes |
| Snowflake Private Key |
When using:
- **Session Token Authentication Strategy**: Must be blank.
-
- **KEY_PAIR**: Must be the RSA private key used for authentication.
-
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers.
Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
|
No |
| Snowflake Private Key File |
When using:
- **Session token authentication strategy**: The private key file must be blank.
- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake,
formatted according to PKCS8 standards and including standard PEM headers and footers.
The header line begins with `-----BEGIN PRIVATE`.
To upload the private key file, select the **Reference asset** checkbox.
|
No |
| Snowflake Private Key Password |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the password associated with the Snowflake private key file.
|
No |
| Snowflake Role |
When using
- **Session Token Authentication Strategy**: Use your Snowflake role.
You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.
- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user.
|
Yes |
| Snowflake Username |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance.
|
Yes |
| Oversized Value Strategy |
Determines how the connector handles values that exceed its internal size limits (16 MB) during replication.
Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.
- **Set Null**: The value is replaced with `NULL` in the destination table.
Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
|
No |
| Snowflake Warehouse |
Snowflake warehouse used to run queries. |
Yes |
###### Box Cortex connect ingestion parameters
| Parameter |
Description |
| Box Folder ID |
The ID of the folder to read the files from. Set this to `0` to synchronize all folders the Box app has access to. It can be retrieved from the URL, for example [https://app.box.com/folder/FOLDER_ID](https://app.box.com/folder/FOLDER_ID). |
| File Extensions To Ingest |
A comma-separated list that specifies file extensions to ingest. The connector tries to convert the files to PDF format first, if possible. Nonetheless, the extension check is performed on the original file extension. If some of the specified file extensions are not supported by Cortex Parse Document, then the connector ignores those files, logs a warning message in an event log, and continues processing other files. |
| Snowflake File Hash Table Name |
Name of the table to store file hashes to determine if the content has changed. This parameter should generally not be changed. |
| OCR Mode |
The OCR mode to use when parsing files with [](/user-guide/snowflake-cortex/parse-document) function. The value can be `OCR` or `LAYOUT`. |
| Snowflake Cortex Search Service User Role |
An identifier of a role that is assigned usage permissions on the Cortex Search service. |
| Snowflake File Hash Table Name |
Name of the table to store file hashes to determine if the content has changed. This parameter should generally not be changed. |
#### Run the flow
1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion.
After starting the connector, it retrieves all files from the specified folder, and then consumes `admin_logs_streaming` events within the last 14 days.
This is done to capture any data that may have been missed during the initialization process.
During that time, `not found` errors may occur, caused by the files that appear in the events but are no longer present.
#### Query the Cortex Search service
You can use the [Cortex Search](/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) service to build chat
and search applications to chat with or query your documents in Box.
After you install and configure the connector and it begins
ingesting content from Box, you can query the Cortex Search service.
For more information about using Cortex Search, see [Query a Cortex Search service](/user-guide/snowflake-cortex/cortex-search/query-cortex-search-service).
**Filter responses**
To restrict responses from the Cortex Search service to documents that a specific user
has access to in Box, you can specify a filter containing the user ID or email address of the user
when you query Cortex Search. For example, `filter.@contains.user_ids` or `filter.@contains.user_emails`.
The name of the Cortex Search service created by the connector is `search_service` in the schema `Cortex`.
Run the following SQL code in a SQL worksheet to query
the Cortex Search service with files ingested from your Box site.
Replace the following:
- application_instance_name: Name of your database and connector application instance.
- user_emailID: Email ID of the user who you want to filter the responses for.
- your_question: The question that you want to get responses for.
- number_of_results: Maximum number of results to return in the response. The maximum value is 1,000 and the default value is 10.
```sql
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'.cortex.search_service',
'{
"query": "",
"columns": ["chunk", "web_url"],
"filter": {"@contains": {"user_emails": ""} },
"limit":
}'
)
)['results'] AS results
```
Here is a complete list of values that you can enter for `columns`:
| Column name |
Type |
Description |
| `full_name` |
String |
A full path to the file from the Box site documents root. Example: `folder_1/folder_2/file_name.pdf`. |
| `web_url` |
String |
A URL that displays an original Box file in a browser. |
| `last_modified_date_time` |
String |
Date and time when the item was most recently modified. |
| `chunk` |
String |
A piece of text from the document that matched the Cortex Search query. |
| `user_ids` |
Array |
An array of user IDs that have access to the document. |
| `user_emails` |
Array |
An array of user email IDs that have access to the document. It also includes user email IDs from all the Microsoft 365 groups that are assigned to the document. |
**Example: Query an AI assistant for human resources (HR) information**
You can use Cortex Search to query an AI assistant for employees to chat with the latest versions of
HR information, such as onboarding, code of conduct, team processes, and organization policies.
Using response filters, you can also allow HR team members to query employee contracts while adhering to access controls configured in Box.
Run the following in a [SQL worksheet](#label-snowsight-worksheets-create-file) to query the Cortex Search service with files ingested from Box.
Select the database as your application instance name and schema as **Cortex**.
Replace the following:
- application_instance_name: Name of your database and connector application instance.
- user_emailID: Email ID of the user who you want to filter the responses for.
```sql
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'.cortex.search_service',
'{
"query": "What is my vacation carryover policy?",
"columns": ["chunk", "web_url"],
"filter": {"@contains": {"user_emails": ""} },
"limit": 1
}'
)
)['results'] AS results
```
**Python:**
Run the following code in a [Python worksheet](#label-snowsight-worksheets-create) to query the
Cortex Search service with files ingested from Box.
Ensure that you add the `snowflake.core` package to your database.
Replace the following:
- application_instance_name: Name of your database and connector application instance.
- user_emailID: Email ID of the user who you want to filter the responses for.
```python
from snowflake.snowpark import Session
from snowflake.core import Root
def main(session: snowpark.Session):
root = Root(session)
# fetch service
my_service = (root
.databases[""]
.schemas["cortex"]
.cortex_search_services["search_service"]
)
# query service
resp = my_service.search(
query="What is my vacation carryover policy?",
columns = ["chunk", "web_url"],
filter = {"@contains": {"user_emails": ""} },
limit=1
)
return (resp.to_json())
```
**REST API:**
Execute the following code in a command-line interface to query the Cortex Search
service with files ingested from your Box.
Access to the Snowflake REST APIs requires authentication via both key pair authentication and OAuth.
For more information,
see [](#label-cortex-search-query-syntax-rest)
and [](/developer-guide/snowflake-rest-api/authentication).
Replace the following:
- application_instance_name: Name of your database and connector application instance.
- account_url: Your Snowflake account URL. For instructions on finding your account URL, see [](#label-account-name-find).
```bash
curl --location "https:///api/v2/databases//schemas/cortex/cortex-search-services/search_service" \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer " \
--data '{
"query": "What is my vacation carryover policy?",
"columns": ["chunk", "web_url"],
"limit": 1
}'
```
Sample response:
```text
{
"results" : [ {
"web_url" : "https://.box.com/sites//",
"chunk" : "Answer to the question asked."
} ]
}
```
### Extract Box metadata using Box AI and ingest it into a Snowflake table
Use the connector definition to:
- Extract metadata about your Box files and ingest them to into a Snowflake table
- Perform operations on the metadata of your files stored in Box
#### Create a Snowflake table for storing the Box metadata
1. Ensure that Box AI is enabled for the extraction of metadata to occur. For more information, see [Configuring Box AI](https://support.box.com/hc/en-us/articles/22166647877011-Configuring-Box-AI).
2. Create a Snowflake table where the metadata will be sent
For the connector to know what kind of metadata to extract, you must create a Snowflake table in your database and schema with the column names of the fields you would like to extract.
Add descriptions to each column to improve the performance of the model used to extract the metadata from the files.
3. In the table created in the previous step, ensure that there is a column to store the Box file ID and that it is of type VARCHAR.
The name of this column is required to be entered as the Box File Identifier Column parameter in later steps.
The list of supported columns types for the metadata table is VARCHAR, STRING, TEXT, FLOAT, DOUBLE, and DATE.
Here is an example of the table that you can create for this connector:
```sql
CREATE OR REPLACE TABLE OPENFLOW.BOX_METADATA_SCHEMA.LOAN_AGREEMENT_METADATA (
BOX_FILE_ID VARCHAR COMMENT 'Box file identifier column',
LOAN_ID STRING COMMENT 'Unique loan agreement identifier (e.g. L-2025-0001)',
BORROWER_NAME STRING COMMENT 'Name of the borrower entity or individual',
LENDER_NAME STRING COMMENT 'Name of the lending institution',
LOAN_AMOUNT DOUBLE COMMENT 'Principal amount of the loan (in USD)',
INTEREST_RATE FLOAT COMMENT 'Annual interest rate (%)',
EFFECTIVE_DATE DATE COMMENT 'Date on which the loan becomes effective',
MATURITY_DATE DATE COMMENT 'Scheduled loan maturity date',
LOAN_TERM_MONTHS FLOAT COMMENT 'Original term length in months',
COLLATERAL_DESCRIPTION TEXT COMMENT 'Description of collateral securing the loan',
CREDIT_SCORE FLOAT COMMENT 'Borrower credit score',
JURISDICTION STRING COMMENT 'Governing law jurisdiction (e.g. NY, CA)'
);
```
#### Set up the connector
As a data engineer, perform the following tasks to install and configure the connector:
##### Install the connector
To install the connector, do the following as a data engineer:
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
##### Configure the connector
1. Right-click on the imported process group and select **Parameters**.
2. Populate the required parameter values as described in [Box Ingest Metadata Source Parameters](#box-ingest-metadata-source-parameters), [Box Ingest Metadata Destination Parameters](#box-ingest-metadata-destination-parameters) and [Box Ingest Metadata Ingestion Parameters](#box-ingest-metadata-ingestion-parameters).
###### Box Ingest Metadata Source Parameters
| Parameter |
Description |
| Box App Config JSON |
An application JSON configuration that was downloaded during the app creation. |
| Box App Config File |
An application json file that was downloaded during the app creation. Either "Box App Config File" or "Box App Config JSON" has to be set. Select the **Reference asset** checkbox to upload the config file. |
###### Box Ingest Metadata Destination Parameters
| Parameter |
Description |
Required |
| Destination Database |
The database where data will be persisted. It must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
|
Yes |
| Destination Schema |
The schema where data will be persisted, which must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`
- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
|
Yes |
| Snowflake Authentication Strategy |
When using:
- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN.
This token is managed automatically by Snowflake.
BYOC deployments must have previously configured
[runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
|
Yes |
| Snowflake Account Identifier |
When using:
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
|
Yes |
| Snowflake Private Key |
When using:
- **Session Token Authentication Strategy**: Must be blank.
-
- **KEY_PAIR**: Must be the RSA private key used for authentication.
-
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers.
Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
|
No |
| Snowflake Private Key File |
When using:
- **Session token authentication strategy**: The private key file must be blank.
- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake,
formatted according to PKCS8 standards and including standard PEM headers and footers.
The header line begins with `-----BEGIN PRIVATE`.
To upload the private key file, select the **Reference asset** checkbox.
|
No |
| Snowflake Private Key Password |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the password associated with the Snowflake private key file.
|
No |
| Snowflake Role |
When using
- **Session Token Authentication Strategy**: Use your Snowflake role.
You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.
- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user.
|
Yes |
| Snowflake Username |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance.
|
Yes |
| Oversized Value Strategy |
Determines how the connector handles values that exceed its internal size limits (16 MB) during replication.
Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.
- **Set Null**: The value is replaced with `NULL` in the destination table.
Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
|
No |
| Snowflake Warehouse |
Snowflake warehouse used to run queries. |
Yes |
###### Box Ingest Metadata Ingestion Parameters
| Parameter |
Description |
| Box Folder ID |
The ID of the folder to read the files from. Set this to `0` to synchronize all folders the Box app has access to. The ID can be retrieved from the URL, for example [https://app.box.com/folder/FOLDER_ID](https://app.box.com/folder/FOLDER_ID). |
| Box File Identifier Column |
The column of the metadata table that will store the Box file ID to associate the given metadata with a file. This column must be of type VARCHAR and be part of the table created in [Create a Snowflake table for storing the Box metadata](#create-a-snowflake-table-for-storing-the-box-metadata). |
| Destination Metadata Table |
The Snowflake table you created in [Create a Snowflake table for storing the Box metadata](#create-a-snowflake-table-for-storing-the-box-metadata), which has the columns of the metadata you want to collect. |
#### Run the flow
1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion.
After starting the connector, it retrieves all files from the specified folder, and then consumes `admin_logs_streaming` events from the last 14 days.
This is done to capture any data that may have been missed during the initialization process.
During that time, `not found` errors may occur, caused by the files that appear in the events but are no longer present.
### Synchronize Box file metadata instances with a Snowflake table
Use the connector definition to perform a data transformation on metadata
from Box in a Snowflake table and add the changes back to a Box metadata instance.
#### Create a Snowflake stream for storing the Box metadata
1. Create a Snowflake stream for the metadata table you want to use. The stream is used to monitor any changes that occur to the table with which you want to synchronize your Box files.
To learn how to create a table for storing Box metadata, see [Create a Snowflake table for storing the Box metadata](#create-a-snowflake-table-for-storing-the-box-metadata).
If the connector is stopped beyond the data retention time and the stream becomes stale, then you must recreate a stream and replace the previous one. To learn more about managing streams, see [](/user-guide/streams-manage).
Here is an example of a stream that you can create for this connector:
```sql
CREATE OR REPLACE STREAM OPENFLOW.BOX_METADATA_SCHEMA.LOAN_AGREEMENT_METADATA_STREAM
ON TABLE OPENFLOW.BOX_METADATA_SCHEMA.LOAN_AGREEMENT_METADATA
```
2. In the metadata table, ensure that there is a column to store the Box file ID and that it is of type VARCHAR.
The name of this column is required to be entered as the Box File Identifier Column parameter in later steps.
The list of supported columns types for the metadata table is VARCHAR, STRING, TEXT, FLOAT, DOUBLE, and DATE.
#### Set up the connector
As a data engineer, perform the following tasks to install and configure the connector:
##### Install the connector
To install the connector, do the following as a data engineer:
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
##### Configure the connector
1. Right-click on the imported process group and select **Parameters**.
2. Populate the required parameter values as described in [Box Publish Metadata Source Parameters](#box-publish-metadata-source-parameters), [Box Publish Metadata Destination Parameters](#box-publish-metadata-destination-parameters) and [Box Publish Metadata Ingestion Parameters](#box-publish-metadata-ingestion-parameters).
###### Box Publish Metadata Source Parameters
| Parameter |
Description |
| Source Database |
Snowflake Database that contains the schema that contains the Snowflake Stream that ingests the changes |
| Source Schema |
Schema that contains the Snowflake Stream that ingests the changes |
| Snowflake Account Identifier |
Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide your Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted. |
| Snowflake Authentication Strategy |
When using:
- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN.
This token is managed automatically by Snowflake.
BYOC deployments must have previously configured
[runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
|
| Snowflake Private Key |
Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either Snowflake Private Key File or Snowflake Private Key must be defined. |
| Snowflake Private Key File |
Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, upload the file that contains the RSA Private Key used for authentication to Snowflake, formatted according to PKCS8 standards and having standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. Select the **Reference asset** checkbox to upload the private key file. |
| Snowflake Private Key Password |
Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the password associated with the Snowflake Private Key File. |
| Snowflake Role |
When using Session Token for your Authentication Strategy, use your Snowflake Role. You can find your Snowflake Role in the Openflow UI, by going to View Details for your Runtime. When using Key Pair for your Authentication Strategy, use a valid role configured for your service user. |
| Snowflake Username |
Leave this blank when using Session Token for your Authentication Strategy. When using KEY_PAIR, provide the user name used to connect to Snowflake instance. |
| Snowflake Warehouse |
Snowflake warehouse used to run queries |
| Snowflake Stream Name |
Snowflake stream name used for ingestion of changes from the source Snowflake table. You must create it before starting the connector and link to the table. |
###### Box Publish Metadata Destination Parameters
| Parameter |
Description |
| Box App Config JSON |
An application JSON configuration that was downloaded during the app creation. |
| Box App Config File |
An application json file that was downloaded during the app creation. Either "Box App Config File" or "Box App Config JSON" has to be set. Select the **Reference asset** checkbox to upload the config file. |
###### Box Publish Metadata Ingestion Parameters
| Parameter |
Description |
| Box File Identifier Column |
The column of the metadata table that will store the Box file ID to associate the given metadata with a file. This column must be of type VARCHAR and be part of the table created in [Create a Snowflake table for storing the Box metadata](#create-a-snowflake-table-for-storing-the-box-metadata). |
| Box Metadata Template Name |
Template name of the Box metadata template that will be added to the Box files. You don't need to manually create a template before starting the connector. If you enter a value in this parameter, a template is automatically created with this template name. The name provided should not overlap with any template that you have already created in your Box environment. |
| Box Metadata Template Key |
The Box template key of the Box metadata template that will be added to the Box files. This is the key that will be used to reference the template in the Box API. You don't need to manually create a template before starting the connector. If you enter a value in this parameter, a template is automatically created with this template key. The key provided should not overlap with any template that you have already created in your Box environment. |
#### Run the flow
1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion.
After running the flow, you can query the Cortex Search service. For information on how to query the Cortex Search service, see [Query the Cortex Search service](#query-the-cortex-search-service).
### Finding files in stage
Files stored in the stage may have unreadable names. To find specific files, use the metadata
tables as your source of truth. These tables contain the mapping between file names and their
corresponding file IDs in the stage.
For Cortex-enabled setups, use the following query to find files:
```sql
SELECT DISTINCT METADATA:id FROM DOCS_CHUNKS WHERE METADATA:fullName LIKE '%';
```
For non-Cortex setups, use the following query:
```sql
SELECT FILE_ID FROM DOC_METADATA WHERE FILE_NAME = '';
```
Replace `` with the name or partial name of the file you're looking for.
The files in the stage start with the ID returned from these queries.
---
title: Set up the Openflow Connector for Google Ads
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-ads/setup.md
section: Loading & Unloading Data
---
# Set up the Openflow Connector for Google Ads
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic describes the steps to set up the Openflow Connector for Google Ads.
## Prerequisites
1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/google-ads/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list)
and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-google-ads) connector.
## Get the credentials
As a Google Ads administrator, perform the following steps:
- Ensure that you have access to a Google Cloud project or [create a new one](https://developers.google.com/workspace/guides/create-project).
- Ensure that the [Google Ads API](https://cloud.google.com/endpoints/docs/openapi/enable-api) is
enabled for your Google Cloud project. Google Ads API access is
required to ingest data.
- [Configure](https://developers.google.com/google-ads/api/docs/oauth/service-accounts)
Service account authentication for Google Ads.
- Obtain developer token for your organization following
[instructions](https://developers.google.com/google-ads/api/docs/get-started/dev-token).
Developer token should have Access Level either Basic or Standard. For more information about Access Level please see [documentation](https://developers.google.com/google-ads/api/docs/access-levels).
## Set up Snowflake account
As a Snowflake account administrator, perform the following tasks:
1. Create a new role or use an existing role and grant the [](#label-database-privileges).
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure with [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and Hashicorp, and store the public and private keys in the secret store.
If for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the
public key and private key files used for key-pair authentication according to the security policies of your organization.
1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you the
EC2 instance role associated with Openflow as this way no other secrets have to be persisted.
2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right.
Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values.
3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake),
then grant those users the role created in step 1.
7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated,
and the amount of data transferred. Large table numbers typically scale better with
[multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes.
## Set up the connector
As a data engineer, perform the following tasks to install and configure the connector:
### Install the connector
1. Create a database and schema in Snowflake for the connector to store ingested data. Grant required [](#label-database-privileges) to the role created in the first step.
Substitute the role placeholder with the actual value and use the following sql commands:
```sql
CREATE DATABASE GOOGLE_ADS_DESTINATION_DB;
CREATE SCHEMA GOOGLE_ADS_DESTINATION_DB.GOOGLE_ADS_DESTINATION_SCHEMA;
GRANT USAGE ON DATABASE GOOGLE_ADS_DESTINATION_DB TO ROLE ;
GRANT USAGE ON SCHEMA GOOGLE_ADS_DESTINATION_DB.GOOGLE_ADS_DESTINATION_SCHEMA TO ROLE ;
GRANT CREATE TABLE ON SCHEMA GOOGLE_ADS_DESTINATION_DB.GOOGLE_ADS_DESTINATION_SCHEMA TO ROLE ;
```
To install the connector, do the following as a data engineer:
1. Navigate to the Openflow overview page. In the **Featured connectors** section, select **View more connectors**.
2. On the Openflow connectors page, find the connector and select **Add to runtime**.
3. In the **Select runtime** dialog, select your runtime from the **Available runtimes** drop-down list and click **Add**.
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
4. Authenticate to the deployment with your Snowflake account credentials and select **Allow** when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
5. Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
### Configure the connector
1. Right-click on the imported process group and select **Parameters**.
2. Populate the required parameter values as described in [Flow parameters](#flow-parameters).
#### Flow parameters
There are three parameter contexts. *Google Ads Destination Parameters* and
*Google Ads Source Parameters* are respectively responsible for allowing
connections with GoogleAds API and Snowflake. *Google Ads Ingestion Parameters*
is used to define the reconfiguration of data downloaded from Google
Ads. *Google Ads Parameters* aggregates all of them in one.
##### Google Ads Ingestion Parameters
| Parameter |
Description |
Required |
| Client Account ID |
ID of the account in the Google Ads for which given report should be ingested |
true |
| Login Customer ID |
Customer ID of the Google Ads manager account (MCC) for which the report should be ingested |
false |
| Google Ads Resource Name |
Name of the resource in Google Ads that is a source for the report |
true |
| Report Attributes |
Attributes of the selected resource |
true |
| Report Metrics |
Metrics collected in the context of a given resource |
false |
| Report Segments |
Buckets in which metrics should be grouped |
false |
| Report Start Date |
Start date from which the ingestion should happen. The date format is YYYY-MM-DD. |
false |
| Schedule |
Get Google Ads Report processor schedule |
true |
The easiest way to obtain proper combination of *Report Attributes*, *Report Metrics* and *Report Segments* is to use [Google Ads Query Builder](https://developers.google.com/google-ads/api/fields/v19/overview_query_builder).
Select the resource based on the one inserted into parameter *Google Ads Resource Name* and construct the query. Then copy and pase attributes, metrics and segments to corresponding parameters.
##### Google Ads Source Parameters
| Parameter |
Description |
Required |
| Google Developer Token |
Developer token required to query Google Ads API |
true |
| Google Service Account JSON |
Service Account JSON required for Google Ads authentication |
true |
##### Google Ads Destination Parameters
| Parameter |
Description |
Required |
| Destination Database |
The database where data will be persisted. It must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
|
Yes |
| Destination Schema |
The schema where data will be persisted, which must already exist in Snowflake.
The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.
See the following examples:
- `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`
- `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively
|
Yes |
| Snowflake Authentication Strategy |
When using:
- **Snowflake Openflow Deployment** or **BYOC**: Use SNOWFLAKE_MANAGED_TOKEN.
This token is managed automatically by Snowflake.
BYOC deployments must have previously configured
[runtime roles](#label-deployment-byoc-setup-runtime-role) to use SNOWFLAKE_MANAGED_TOKEN.
- **BYOC:** Alternatively BYOC can use KEY_PAIR as the value for authentication strategy.
|
Yes |
| Snowflake Account Identifier |
When using:
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Snowflake account name formatted as [organization-name]-[account-name] where data will be persisted.
|
Yes |
| Snowflake Private Key |
When using:
- **Session Token Authentication Strategy**: Must be blank.
-
- **KEY_PAIR**: Must be the RSA private key used for authentication.
-
The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers.
Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.
|
No |
| Snowflake Private Key File |
When using:
- **Session token authentication strategy**: The private key file must be blank.
- **KEY_PAIR**: Upload the file that contains the RSA private key used for authentication to Snowflake,
formatted according to PKCS8 standards and including standard PEM headers and footers.
The header line begins with `-----BEGIN PRIVATE`.
To upload the private key file, select the **Reference asset** checkbox.
|
No |
| Snowflake Private Key Password |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the password associated with the Snowflake private key file.
|
No |
| Snowflake Role |
When using
- **Session Token Authentication Strategy**: Use your Snowflake role.
You can find your Snowflake role in the Openflow UI, by navigating to **View Details** for your Runtime.
- **KEY_PAIR** Authentication Strategy: Use a valid role configured for your service user.
|
Yes |
| Snowflake Username |
When using
- **Session Token Authentication Strategy**: Must be blank.
- **KEY_PAIR**: Provide the user name used to connect to the Snowflake instance.
|
Yes |
| Oversized Value Strategy |
Determines how the connector handles values that exceed its internal size limits (16 MB) during replication.
Possible values are:
- **Fail Table** (default): The table is marked as permanently failed, and replication stops for that table.
- **Set Null**: The value is replaced with `NULL` in the destination table.
Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.
|
No |
| Snowflake Warehouse |
Snowflake warehouse used to run queries. |
Yes |
## Run the flow
1. Right-click on the plane and select **Enable all Controller Services**.
2. Right-click on the imported process group and select **Start**. The connector starts the data ingestion.
## How to reset the connector
To fully reset connector to the initial state, do the following:
1. Ensure that there are no more flow files in the queues.
2. Stop all the processors.
3. Clear the state of the initial processor.
1. Right click on the processor `Get Google Ads Report` and select **View State**.
2. Select the option **Clear State**. This resets the state of the processor.
4. Drop the destination table in Snowflake.
---
title: Set up the Openflow Connector for Google Drive
source: https://docs.snowflake.com/en/user-guide/data-integration/openflow/connectors/google-drive/setup.md
section: Loading & Unloading Data
---
# Set up the Openflow Connector for Google Drive
This feature is not available in the People's Republic of China.
Snowflake connectors are supported in every region where Snowflake Openflow is available.
[Snowflake Openflow on BYOC deployments](/user-guide/data-integration/openflow/about-byoc) are available to all accounts in AWS Commercial Regions only ([](#label-na-general-regions)).
[Openflow Snowflake deployments](/user-guide/data-integration/openflow/about-spcs) are available to all accounts in AWS, Azure, and GCP Commercial Regions.
This connector is subject to the [Snowflake Connector Terms](https://www.snowflake.com/legal/snowflake-connector-terms/).
- [](/user-guide/data-integration/openflow/about)
- [](/user-guide/data-integration/openflow/manage)
- [](/user-guide/data-integration/openflow/connectors/about-openflow-connectors)
This topic describes the steps to set up the Openflow Connector for Google Drive.
## Prerequisites
1. Ensure that you have reviewed [](/user-guide/data-integration/openflow/connectors/google-drive/about).
2. Ensure that you have [](/user-guide/data-integration/openflow/setup-openflow-byoc) or [Set up Openflow - Snowflake Deployments](/user-guide/data-integration/openflow/setup-openflow-spcs).
3. If using %ofsfspcs-plural%, ensure that you've reviewed [configuring required domains](/user-guide/data-integration/openflow/setup-openflow-spcs-sf-allow-list)
and have granted access to the required domains for the [](#label-openflow-domains-used-by-openflow-connectors-google-drive) connector.
## Get the credentials
Setting up the connector requires specific permissions and account
settings for Snowflake Openflow processors to read data from Google.
This access is provided in part through setting up a service account and
a key for Openflow to authenticate as that service account.
For more information, see:
- [Configure access to the Google Cloud Search API](https://developers.google.com/cloud-search/docs/guides/project-setup#create_service_account_credentials)
- [Delegating domain-wide authority to the service
account](https://developers.google.com/identity/protocols/oauth2/service-account#delegatingauthority)
As a Google Drive administrator, perform the following steps:
### Prerequisites
Ensure that you meet the following requirements:
- You have a Google user with Super Admin permissions
- You have a Google Cloud Project with the following roles:
- Organization Policy Administrator
- Organization Administrator
### Enable service account key creation
By default Google disables service account key creation. For Openflow to
use the service account JSON, this key creation policy must be turned
off.
1. Log in to the [Google Cloud Console](https://console.cloud.google.com/) with a super admin
account that has the Organizational Policy Admin Role.
2. Ensure you are in the project associated with your organization, not
the project in your organization.
3. Click **Organization Policies**.
4. Select the **Disable service account key creation** policy.
5. Click **Manage Policy** and turn off enforcement.
6. Click **Set Policy**.
### Create service account and key
1. Open the [Google Cloud Console](https://console.cloud.google.com/)
and authenticate using a user that has been granted access to create
service accounts.
2. Ensure you are in a project of your organization.
3. In the left navigation, under the **IAM & Admin**, select the
**Service Accounts** tab.
4. Click **Create Service Account**.
5. Enter the service account name and click **Create and Continue**.
6. Click **Done**. In the table with the service accounts listed, find
the **OAuth 2 Client ID** column. Copy the Client ID as this will be
required later to set up domain-wide delegation in the next section.
7. On the newly created service account, click the menu under the table
with the service accounts listed for that service account and select
**Manage keys**.
8. Select **Add key** and then **Create new key**.
9. Leave the default selection of JSON and click **Create**.
The key is downloaded into your browser Downloads directory as a .json
file.
### Grant service account domain-wide delegation for listed scopes
1. Log in to your Google Admin account.
2. Select **Admin** from **Google Apps selector**.
3. In the left navigation, expand **Security** and then **Access** and select **Data
control** then click on **API Controls**.
4. On the API **Controls** screen, select **Manage domain wild
delegation**.
5. Click **Add new**.
6. Enter the OAuth 2 Client ID taken from the Create Service Account and
Key section and the following scopes:
- [https://www.googleapis.com/auth/drive](https://www.googleapis.com/auth/drive)
- [https://www.googleapis.com/auth/drive.metadata.readonly](https://www.googleapis.com/auth/drive.metadata.readonly)
- [https://www.googleapis.com/auth/admin.directory.group.member.readonly](https://www.googleapis.com/auth/admin.directory.group.member.readonly)
- [https://www.googleapis.com/auth/admin.directory.group.readonly](https://www.googleapis.com/auth/admin.directory.group.readonly)
- [https://www.googleapis.com/auth/drive.file](https://www.googleapis.com/auth/drive.file)
- [https://www.googleapis.com/auth/drive.metadata](https://www.googleapis.com/auth/drive.metadata)
7. Click **Authorize**.
## Set up Snowflake account
As a Snowflake account administrator, perform the following tasks manually
or by using the script included below:
1. Create a new role or use an existing role and grant the [](#label-database-privileges).
2. Create a new Snowflake service user with the type as [SERVICE](#label-user-type-property).
3. Grant the Snowflake service user the role you created in the previous steps.
4. Configure with [key-pair auth](/user-guide/key-pair-auth) for the Snowflake SERVICE user from step 2.
5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, and Hashicorp, and store the public and private keys in the secret store.
If for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the
public key and private key files used for key-pair authentication according to the security policies of your organization.
1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it's recommended that you the
EC2 instance role associated with Openflow as this way no other secrets have to be persisted.
2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right.
Navigate to **Controller Settings** %raa% **Parameter Provider** and then fetch your parameter values.
3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.
6. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake),
then grant those users the role created in step 1.
7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated,
and the amount of data transferred. Large table numbers typically scale better with
[multi-cluster warehouses](/user-guide/warehouses-multicluster), rather than larger warehouse sizes.
### Example setup
```sql
--The following script assumes you'll need to create all required roles, users, and objects.
--However, you may want to reuse some that are already in existence.
--Create a Snowflake service user to manage the connector
USE ROLE USERADMIN;
CREATE USER TYPE=SERVICE COMMENT='Service user for Openflow automation';
--Create a pair of secure keys (public and private). For more information, see
--key-pair authentication. Store the private key for the user in a file to supply
--to the connector’s configuration. Assign the public key to the Snowflake service user:
ALTER USER SET RSA_PUBLIC_KEY = '';
--Create a role to manage the connector and the associated data and
--grant it to that user
USE ROLE SECURITYADMIN;
CREATE ROLE ;
GRANT ROLE TO USER ;
--The following block is for USE CASE 2 (Cortex connect) ONLY
--Create a role for read access to the cortex search service created by this connector.
--This role should be granted to any role that will use the service
CREATE ROLE ;
GRANT ROLE TO ROLE ;
--Create the database the data will be stored in and grant usage to the roles created
USE ROLE ACCOUNTADMIN; --use whatever role you want to own your DB
CREATE DATABASE IF NOT EXISTS ;
GRANT USAGE ON DATABASE |