Monitor Openflow

This topic describes how to monitor the state of Openflow and troubleshoot problems.

Accessing Openflow logs

Snowflake sends Openflow logs to the event table you configured when you set up Openflow. Snowflake recommends that you include a timestamp in the WHERE clause of event table queries. This is particularly important because of the potential volume of data generated by various Snowflake components. By applying filters, you can retrieve a smaller subset of data, which improves query performance.
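For example, a query along these lines (a sketch; `EVENTS_<account-id>` stands for your configured event table, as in the examples later in this topic) limits the scan to the last hour:

```sql
-- Sketch: restrict the event table scan with a timestamp filter.
-- Replace EVENTS_<account-id> with the name of your event table.
SELECT timestamp, value
FROM openflow.telemetry.EVENTS_<account-id>
WHERE timestamp > dateadd(hour, -1, sysdate())
AND resource_attributes:application = 'openflow'
LIMIT 100;
```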

The event table includes the following columns, which provide useful information regarding the logs collected by Snowflake from Openflow:

  • TIMESTAMP: Shows when Snowflake collected the log.

  • RESOURCE_ATTRIBUTES: Provides a JSON object that identifies the Snowflake service that generated the log message. For example, it provides details such as the application and data plane ID for Openflow.

    {
    "application": "openflow",
    "cloud.service.provider": "aws",
    "k8s.container.name": "pg-dev-server",
    "k8s.container.restart_count": "0",
    "k8s.namespace.name": "runtime-pg-dev",
    "k8s.node.name": "ip-10-10-62-36.us-east-2.compute.internal",
    "k8s.pod.name": "pg-dev-0",
    "k8s.pod.start_time": "2025-04-25T22:14:29Z",
    "k8s.pod.uid": "94610175-1685-4c8f-b0a1-42898d1058e6",
    "k8s.statefulset.name": "pg-dev",
    "openflow.dataplane.id": "abeddb4f-95ae-45aa-95b1-b4752f30c64a"
    }
    
  • RECORD_ATTRIBUTES: For a Snowflake service, identifies the log source (standard output or standard error).

    {
    "log.file.path": "/var/log/pods/runtime-pg-dev_pg-dev-0_94610175-1685-4c8f-b0a1-42898d1058e6/pg-dev-server/0.log",
    "log.iostream": "stdout",
    "logtag": "F"
    }
    
  • VALUE: Standard output and standard error are split into lines, with each line generating its own record in the event table.

    "{\"timestamp\":1746655642080,\"nanoseconds\":80591397,\"level\":\"INFO\",\"threadName\":\"Clustering Tasks Thread-2\",\"loggerName\":\"org.apache.nifi.controller.cluster.ClusterProtocolHeartbeater\",\"formattedMessage\":\"Heartbeat created at 2025-05-07T22:07:22.071Z and sent to pg-dev-0.pg-dev.runtime-pg-dev.svc.cluster.local:8445 at 2025-05-07T22:07:22.080590784Z; determining Cluster Coordinator took 7 millis; DNS lookup for coordinator took 0 millis; connecting to coordinator took 0 millis; sending heartbeat took 1 millis; receiving first byte from response took 1 millis; receiving full response took 1 millis; total time was 9 millis\",\"throwable\":null}"
    

Examples

Find error-level logs for runtimes

SELECT
    timestamp,
    resource_attributes:"k8s.namespace.name" AS runtime_key,
    parse_json(value::string):loggerName AS logger,
    parse_json(value::string):formattedMessage AS log_value
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND timestamp > dateadd('days', -1, sysdate())
AND record_type = 'LOG'
AND resource_attributes:"k8s.namespace.name" LIKE 'runtime-%'
AND resource_attributes:"k8s.container.name" LIKE '%-server'
AND parse_json(value::string):level = 'ERROR'
ORDER BY timestamp desc
LIMIT 5;
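A grouped variant of the query above (a sketch, not part of the original examples) counts error-level log lines per runtime over the last day, which can help identify which runtime is failing most often:

```sql
SELECT
    resource_attributes:"k8s.namespace.name" AS runtime_key,
    COUNT(*) AS error_count
FROM openflow.telemetry.EVENTS_<account-id>
WHERE timestamp > dateadd('days', -1, sysdate())
AND record_type = 'LOG'
AND resource_attributes:"k8s.namespace.name" LIKE 'runtime-%'
AND resource_attributes:"k8s.container.name" LIKE '%-server'
AND parse_json(value::string):level = 'ERROR'
GROUP BY runtime_key
ORDER BY error_count DESC;
```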

Find “caused by” exceptions in the logs

These exceptions can be expected for intermittent connection issues, data incompatibilities, or related causes.

SELECT
    timestamp,
    RESOURCE_ATTRIBUTES:"k8s.namespace.name" AS Namespace,
    RESOURCE_ATTRIBUTES:"k8s.pod.name" AS Pod,
    RESOURCE_ATTRIBUTES:"k8s.container.name" AS Container,
    value
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND record_type = 'LOG'
AND timestamp > dateadd(minute, -5, sysdate())
AND value LIKE '%Caused By%'
ORDER BY timestamp desc
LIMIT 10;
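To see which pods produce these exceptions most often, you can aggregate instead of listing individual records. This is a sketch that groups the same filter by pod:

```sql
SELECT
    RESOURCE_ATTRIBUTES:"k8s.pod.name" AS Pod,
    COUNT(*) AS exception_count
FROM openflow.telemetry.EVENTS_<account-id>
WHERE record_type = 'LOG'
AND timestamp > dateadd(hour, -1, sysdate())
AND value LIKE '%Caused By%'
GROUP BY Pod
ORDER BY exception_count DESC;
```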

Find which processors are running, have stopped, or are in other states

SELECT
    timestamp,
    RECORD_ATTRIBUTES:component AS Processor,
    RECORD_ATTRIBUTES:id AS Processor_ID,
    value AS Running
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND RECORD:metric:name = 'processor.run.status.running'
AND RECORD_TYPE='METRIC'
AND timestamp > dateadd(minute, -5, sysdate());
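The query above returns the run status for every processor. Assuming the metric value is 1 for a running processor and 0 otherwise (an assumption based on the metric name; verify against your own data), a variant can surface only processors that are not running:

```sql
-- Sketch: list processors whose most recent run-status metric is 0 (not running).
SELECT
    timestamp,
    RECORD_ATTRIBUTES:component AS Processor,
    RECORD_ATTRIBUTES:id AS Processor_ID
FROM openflow.telemetry.EVENTS_<account-id>
WHERE RECORD:metric:name = 'processor.run.status.running'
AND RECORD_TYPE = 'METRIC'
AND timestamp > dateadd(minute, -5, sysdate())
AND value::number = 0;
```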

Find high CPU usage for runtimes

Slow data flows or reduced throughput may be the result of a CPU bottleneck. Runtimes scale automatically between the minimum and maximum number of nodes you have configured. If your runtimes are already using their maximum number of nodes and CPU usage remains high, increase the maximum number of nodes allocated to them, or troubleshoot the connector or flow.

SELECT
    timestamp,
    RESOURCE_ATTRIBUTES:"k8s.namespace.name" AS Namespace,
    RESOURCE_ATTRIBUTES:"k8s.pod.name" AS Pod,
    RESOURCE_ATTRIBUTES:"k8s.container.name" AS Container,
    value AS CPU_Usage
FROM openflow.telemetry.EVENTS_<account-id>
WHERE TIMESTAMP > dateadd(minute, -1, sysdate())
AND RECORD_TYPE = 'METRIC'
AND RECORD:metric:name ilike 'container.cpu.usage'
AND RESOURCE_ATTRIBUTES:"k8s.namespace.name" ilike 'runtime%'
AND RESOURCE_ATTRIBUTES:"k8s.container.name" ilike '%server'
AND RESOURCE_ATTRIBUTES:"k8s.namespace.name" NOT IN ('runtime-infra', 'runtime-operator')
ORDER BY TIMESTAMP desc, CPU_Usage desc;
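To smooth out short spikes, you can average the same metric per pod over a longer window. This sketch (not from the source) casts the variant VALUE column to a float for aggregation:

```sql
SELECT
    RESOURCE_ATTRIBUTES:"k8s.pod.name" AS Pod,
    AVG(value::float) AS Avg_CPU_Usage
FROM openflow.telemetry.EVENTS_<account-id>
WHERE timestamp > dateadd(minute, -15, sysdate())
AND RECORD_TYPE = 'METRIC'
AND RECORD:metric:name ilike 'container.cpu.usage'
AND RESOURCE_ATTRIBUTES:"k8s.namespace.name" ilike 'runtime%'
AND RESOURCE_ATTRIBUTES:"k8s.namespace.name" NOT IN ('runtime-infra', 'runtime-operator')
GROUP BY Pod
ORDER BY Avg_CPU_Usage DESC;
```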

Available metrics

Metrics available in runtimes

The following is a list of available metrics for runtimes:

  • cores.load (unit: percentage; type: gauge): Average load across all cores available to the runtime. The maximum value is 1, when all available cores are fully used.

  • cores.available (unit: CPU cores; type: gauge): Number of CPU cores available to the runtime.

  • storage.free (unit: bytes; type: gauge): Amount of free storage available to the runtime, reported per storage type. There are three storage types: flowfile, content, and provenance. You can view the storage.type attribute in the RECORD_ATTRIBUTES column.

  • storage.used (unit: bytes; type: gauge): Amount of storage used, reported per storage type. There are three storage types: flowfile, content, and provenance. You can view the storage.type attribute in the RECORD_ATTRIBUTES column.

Sample query for CPU metrics

SELECT *
FROM events
WHERE true
AND record_type = 'METRIC'
AND resource_attributes:application = 'openflow'
AND record:metric.name IN ('cores.load', 'cores.available')
ORDER BY timestamp desc
LIMIT 1000;
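The storage metrics can be queried in the same way. This sketch assumes the per-type attribute key in RECORD_ATTRIBUTES is storage.type, as described above; inspect a sample row to confirm before relying on it:

```sql
SELECT
    timestamp,
    record:metric.name AS metric,
    record_attributes:"storage.type" AS storage_type,
    value
FROM events
WHERE record_type = 'METRIC'
AND resource_attributes:application = 'openflow'
AND record:metric.name IN ('storage.free', 'storage.used')
ORDER BY timestamp desc
LIMIT 1000;
```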

Metrics available in connectors

The following is a list of available metrics for connectors:

  • processgroup.bytes.received (unit: bytes; type: gauge): Average number of bytes consumed from the source.

  • processgroup.bytes.sent (unit: bytes; type: gauge): Average number of bytes written to the destination.

To query these metrics from the event table, find the process group name and ID on the Openflow runtime canvas, and then filter on them in the RECORD_ATTRIBUTES column.
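For example, a query along these lines retrieves recent connector throughput metrics; the exact attribute key that holds the process group name or ID may vary, so this sketch returns the full RECORD_ATTRIBUTES column for inspection rather than assuming a key:

```sql
SELECT
    timestamp,
    record_attributes,
    record:metric:name AS metric,
    value AS bytes
FROM openflow.telemetry.EVENTS_<account-id>
WHERE record_type = 'METRIC'
AND record:metric:name IN ('processgroup.bytes.received', 'processgroup.bytes.sent')
AND timestamp > dateadd(hour, -1, sysdate())
ORDER BY timestamp desc;
```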