Monitor workloads for hybrid tables

Unistore workloads that leverage hybrid tables might be different than many analytical workloads that you are running in Snowflake. For example, your workloads might contain fewer unique queries that take less time to run and are executed at a higher frequency. You have several options to monitor your workloads.

Monitor workloads with Account Usage views

To monitor your operational workloads effectively, use AGGREGATE_QUERY_HISTORY View. This view enables you to monitor the health of your workload, diagnose issues, and identify avenues for optimization. The AGGREGATE_QUERY_HISTORY view aggregates query execution statistics for a repeated parameterized query over a time interval so that it is easier and more efficient to identify patterns in your workloads and queries over time. Note that all Snowflake workloads and queries will be combined in the output of this view.

The AGGREGATE_QUERY_HISTORY view helps you answer the following questions about your workloads:

  • How many operations per second are being executed in my virtual warehouse?

  • Which queries are consuming the most total time or resources in my workload?

  • Has the performance of a specific query changed substantially over time?

To help improve performance and efficiency in your workload, individual executions of low latency operations (under one second) will not be stored in QUERY_HISTORY View nor will they generate a unique query profile. Instead, aggregate statistics for repeated executions of that query will be returned in the AGGREGATE_QUERY_HISTORY view. You will also be able to view a sampled query profile for the query over a selected time interval.

Monitor overall workload health

Use the AGGREGATE_QUERY_HISTORY view to monitor your overall workload throughput and concurrency, and to investigate unexpected spikes or drops in your workloads. For example:

SELECT
    interval_start_time
    , SUM(calls) as execution_count
    , SUM(calls) / 60 as queries_per_second
    , COUNT(DISTINCT session_id) as unique_sessions
    , COUNT(user_name) as unique_users
FROM snowflake.account_usage.aggregate_query_history
WHERE TRUE
          AND warehouse_name = '<MY_WAREHOUSE>'
          AND interval_start_time > $START_DATE
          AND interval_start_time < $END_DATE
GROUP BY
          interval_start_time
;
Copy

You can also use aggregate query history to monitor for potential problems with errors, queueing, lock blocking, or throttling. For example:

WITH time_issues AS
(
    SELECT
        interval_start_time
        , SUM(transaction_blocked_time:"SUM") as transaction_blocked_time
        , SUM(queued_provisioning_time:"SUM") as queued_provisioning_time
        , SUM(queued_repair_time:"SUM") as queued_repair_time
        , SUM(queued_overload_time:"SUM") as queued_overload_time
        , SUM(hybrid_table_requests_throttled_count) as hybrid_table_requests_throttled_count
    FROM snowflake.account_usage.aggregate_query_history
    WHERE TRUE
              AND warehouse_name = '<MY_WAREHOUSE>'
          AND interval_start_time > $START_DATE
          AND interval_start_time < $END_DATE
    GROUP BY ALL
),
errors AS
(
    SELECT
        interval_start_time
        , SUM(value:"count") as error_count
    FROM
    (
        SELECT
            a.interval_start_time
            ,e.*
        FROM
            snowflake.account_usage.aggregate_query_history a,
            TABLE(flatten(input => errors)) e
        WHERE TRUE
            AND interval_start_time > $START_DATE
                  AND interval_start_time < $END_DATE
  )
  GROUP BY ALL
)
    SELECT
        time_issues.interval_start_time
        , error_count
        , transaction_blocked_time
        , queued_provisioning_time
        , queued_repair_time
        , queued_overload_time
        , hybrid_table_requests_throttled_count
    FROM
        time_issues FULL JOIN errors ON errors.interval_start_time = time_issues.interval_start_time
;
Copy

Ordinarily, such metrics should remain low. If you see an unexpected spike, it is recommended that you investigate the cause.

Identify and investigate repeated queries

You may opt to optimize or investigate the performance of common and often executed queries to improve the efficiency of your workload. Use the AGGREGATE_QUERY_HISTORY view to identify top queries for a workload by execution count. For example:

SELECT
    query_parameterized_hash
    , any_value(query_text)
    , SUM(calls) as execution_count
FROM snowflake.account_usage.aggregate_query_history
WHERE TRUE
          AND warehouse_name = '<MY_WAREHOUSE>'
          AND interval_start_time > '2024-02-01'
          AND interval_start_time < '2024-02-08'
GROUP BY
          query_parameterized_hash
ORDER BY execution_count DESC
;
Copy

You can choose to view metrics for the slowest queries. For example:

SELECT
    query_parameterized_hash
    , any_value(query_text)
    , SUM(total_elapsed_time:"sum"::NUMBER) / SUM (calls) as avg_latency
FROM snowflake.account_usage.aggregate_query_history
WHERE TRUE
          AND warehouse_name = '<MY_WAREHOUSE>'
          AND interval_start_time > '2024-02-01'
          AND interval_start_time < '2024-02-08'
GROUP BY
          query_parameterized_hash
ORDER BY avg_latency DESC
;
Copy

You can analyze the performance of a particular query over time to gain insight into trends in latency. For example:

SELECT
    interval_start_time
    , total_elapsed_time:"avg"::number avg_elapsed_time
    , total_elapsed_time:"min"::number min_elapsed_time
    , total_elapsed_time:"p90"::number p90_elapsed_time
    , total_elapsed_time:"p99"::number p99_elapsed_time
    , total_elapsed_time:"max"::number max_elapsed_time
FROM snowflake.account_usage.aggregate_query_history
WHERE TRUE
          AND query_parameterized_hash = '<123456>'
          AND interval_start_time > '2024-02-01'
          AND interval_start_time < '2024-02-08'
ORDER BY interval_start_time DESC
;
Copy

This query calculates total query time. You can also modify the query to return more granular metrics on the different phases of a query (compilation, execution, queuing, and lock waiting). Aggregate statistics will be returned for each phase.