Monitor Openflow¶
This topic describes how to monitor the state of Openflow and troubleshoot problems.
Accessing Openflow logs¶
Snowflake sends Openflow logs to the event table you configured when you set up Openflow. Snowflake recommends that you include a timestamp in the WHERE clause of event table queries. This is particularly important because of the potential volume of data generated by various Snowflake components. By applying filters, you can retrieve a smaller subset of data, which improves query performance.
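For example, a query of the following form (a minimal sketch, using the event table name that appears in the examples below) limits results to the last hour:
SELECT timestamp, record_type, value
FROM openflow.telemetry.EVENTS_<account-id>
WHERE timestamp > dateadd(hour, -1, sysdate())
LIMIT 100;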
The event table includes the following columns, which provide useful information regarding the logs collected by Snowflake from Openflow:
TIMESTAMP: Shows when Snowflake collected the log.
RESOURCE_ATTRIBUTES: Provides a JSON object that identifies the Snowflake service that generated the log message. For example, it provides details such as the application and data plane ID for Openflow.
{ "application": "openflow", "cloud.service.provider": "aws", "k8s.container.name": "pg-dev-server", "k8s.container.restart_count": "0", "k8s.namespace.name": "runtime-pg-dev", "k8s.node.name": "ip-10-10-62-36.us-east-2.compute.internal", "k8s.pod.name": "pg-dev-0", "k8s.pod.start_time": "2025-04-25T22:14:29Z", "k8s.pod.uid": "94610175-1685-4c8f-b0a1-42898d1058e6", "k8s.statefulset.name": "pg-dev", "openflow.dataplane.id": "abeddb4f-95ae-45aa-95b1-b4752f30c64a" }
RECORD_ATTRIBUTES: For a Snowflake service, identifies the source of the log line (standard output or standard error).
{ "log.file.path": "/var/log/pods/runtime-pg-dev_pg-dev-0_94610175-1685-4c8f-b0a1-42898d1058e6/pg-dev-server/0.log", "log.iostream": "stdout", "logtag": "F" }
VALUE: Contains the log line itself. Standard output and standard error are split into lines, and each line generates its own record in the event table.
"{\"timestamp\":1746655642080,\"nanoseconds\":80591397,\"level\":\"INFO\",\"threadName\":\"Clustering Tasks Thread-2\",\"loggerName\":\"org.apache.nifi.controller.cluster.ClusterProtocolHeartbeater\",\"formattedMessage\":\"Heartbeat created at 2025-05-07T22:07:22.071Z and sent to pg-dev-0.pg-dev.runtime-pg-dev.svc.cluster.local:8445 at 2025-05-07T22:07:22.080590784Z; determining Cluster Coordinator took 7 millis; DNS lookup for coordinator took 0 millis; connecting to coordinator took 0 millis; sending heartbeat took 1 millis; receiving first byte from response took 1 millis; receiving full response took 1 millis; total time was 9 millis\",\"throwable\":null}"
Examples¶
Find error-level logs for runtimes¶
SELECT
timestamp,
resource_attributes:"k8s.namespace.name" AS runtime_key,
parse_json(value::string):loggerName AS logger,
parse_json(value::string):formattedMessage AS log_value
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND timestamp > dateadd('days', -1, sysdate())
AND record_type = 'LOG'
AND resource_attributes:"k8s.namespace.name" LIKE 'runtime-%'
AND resource_attributes:"k8s.container.name" LIKE '%-server'
AND parse_json(value::string):level = 'ERROR'
ORDER BY timestamp desc
LIMIT 5;
Find “caused by” exceptions in the logs¶
These exceptions can be expected for intermittent connection issues, data incompatibilities, or related causes.
SELECT
timestamp,
RESOURCE_ATTRIBUTES:"k8s.namespace.name" AS Namespace,
RESOURCE_ATTRIBUTES:"k8s.pod.name" AS Pod,
RESOURCE_ATTRIBUTES:"k8s.container.name" AS Container,
value
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND record_type = 'LOG'
AND timestamp > dateadd(minute, -5, sysdate())
AND value ILIKE '%caused by%'
ORDER BY timestamp desc
LIMIT 10;
Find which processors are running, have stopped, or are in other states¶
SELECT
timestamp,
RECORD_ATTRIBUTES:component AS Processor,
RECORD_ATTRIBUTES:id AS Processor_ID,
value AS Running
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND RECORD:metric:name = 'processor.run.status.running'
AND RECORD_TYPE='METRIC'
AND timestamp > dateadd(minute, -5, sysdate());
Find high CPU usage for runtimes¶
Slow data flows or reduced throughput may be the result of a CPU bottleneck. Runtimes should scale up automatically, based on the minimum and maximum number of nodes you have configured. If the runtimes are already using their maximum number of nodes and CPU usage remains high, increase the maximum number of nodes allocated to your runtimes, or troubleshoot the connector or flow.
SELECT
timestamp,
RESOURCE_ATTRIBUTES:"k8s.namespace.name" AS Namespace,
RESOURCE_ATTRIBUTES:"k8s.pod.name" AS Pod,
RESOURCE_ATTRIBUTES:"k8s.container.name" AS Container,
value AS CPU_Usage
FROM openflow.telemetry.EVENTS_<account-id>
WHERE TIMESTAMP > dateadd(minute, -1, sysdate())
AND RECORD_TYPE = 'METRIC'
AND RECORD:metric:name ilike 'container.cpu.usage'
AND RESOURCE_ATTRIBUTES:"k8s.namespace.name" ilike 'runtime%'
AND RESOURCE_ATTRIBUTES:"k8s.container.name" ilike '%server'
AND RESOURCE_ATTRIBUTES:"k8s.namespace.name" NOT IN ('runtime-infra', 'runtime-operator')
ORDER BY TIMESTAMP desc, CPU_Usage desc;
Available metrics¶
Metrics available in runtimes¶
The following is a list of available metrics for runtimes:
| Metric | Unit | Type | Description |
|---|---|---|---|
| cores.load | percentage | gauge | Average load across all cores available to the runtime. The maximum value is 1, when all available cores are fully used. |
| cores.available | CPU cores | gauge | Number of CPU cores available to the runtime. |
| storage.free | bytes | gauge | Amount of free storage available per storage type to the runtime. There are three storage types available. |
| storage.used | bytes | gauge | Amount of storage used per storage type. There are three storage types available. |
Sample query for CPU metrics¶
SELECT *
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND record_type = 'METRIC'
AND resource_attributes:application = 'openflow'
AND record:metric.name IN ('cores.load', 'cores.available')
ORDER BY timestamp desc
LIMIT 1000;
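A similar query returns the storage metrics. This is a sketch that follows the same pattern as the CPU query above; it assumes the same event table and the storage metric names listed in the preceding table:
SELECT *
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND record_type = 'METRIC'
AND resource_attributes:application = 'openflow'
AND record:metric.name IN ('storage.free', 'storage.used')
ORDER BY timestamp desc
LIMIT 1000;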
Metrics available in connectors¶
The following is a list of available metrics for connectors:
| Metric | Unit | Type | Description |
|---|---|---|---|
| processgroup.bytes.received | bytes | gauge | Average number of bytes consumed from the source |
| processgroup.bytes.sent | bytes | gauge | Average number of bytes written to the destination |
To query these metrics from the event table, find the process group name and ID on the Openflow runtime canvas, and then filter on them in the RECORD_ATTRIBUTES column.
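The following sketch shows one way to do this. The RECORD_ATTRIBUTES field names for process groups are an assumption based on the processor-status query above (which uses component and id), and <process-group-id> is a placeholder for the ID copied from the canvas; verify the attribute names that your event table actually reports:
SELECT
timestamp,
RECORD_ATTRIBUTES:component AS Process_Group,
RECORD_ATTRIBUTES:id AS Process_Group_ID,
RECORD:metric:name AS Metric,
value AS Bytes
FROM openflow.telemetry.EVENTS_<account-id>
WHERE true
AND RECORD_TYPE = 'METRIC'
AND RECORD:metric:name IN ('processgroup.bytes.received', 'processgroup.bytes.sent')
AND RECORD_ATTRIBUTES:id = '<process-group-id>'
AND timestamp > dateadd(minute, -5, sysdate())
ORDER BY timestamp desc;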