Categories:

시스템 함수 (Query Information)

GET_QUERY_STATS

Returns statistics about the execution of one or more queries that have completed.

You can use these statistics to understand workloads and choose optimal virtual warehouse sizes.

These statistics are also available in the query profile tab in Snowsight. The GET_QUERY_STATS() function makes the same information available via a programmatic interface.

See also:

GET_QUERY_OPERATOR_STATS

Syntax

GET_QUERY_STATS( <query_id> )
Copy

Arguments

query_id

The ID of a query. You can use a string literal (a string surrounded by single quotes), or an expression that evaluates to a query ID string (e.g. a query ID column from a view such as ACCOUNT_USAGE.QUERY_HISTORY or a call to the LAST_QUERY_ID function).

Returns

This returns a VARIANT that contains an OBJECT. The object’s key-value pairs contain information about the query. The keys vary, depending upon the type of query. For more detail, see the Usage Notes below.

Usage Notes

  • This function only returns statistics on queries that have completed.

  • This function is intended to be a lightweight function that you can run on individual SQL statements or on workloads that contain multiple queries.

    The information returned by the function does not include detailed statistics about each query operator used in the specified query. To get those statistics, call the GET_QUERY_OPERATOR_STATS function.

  • The OBJECT returned by the function includes a subset of the keys below. Depending on the particular query specified by the query ID, the OBJECT can contain different keys.

    • query_id: The query ID.

    • query_text: The text of the query (e.g. SELECT ...).

    • execution_time: The time required to execute the query (in milliseconds).

    • execution_time_breakdown: A breakdown of the execution time, which can include:

      • initialization: time spent setting up query processing.

      • processing: time spent processing the data by the CPU.

      • synchronization: time spent synchronizing activities between participating processes.

      • local_disk_io: time during which processing was blocked while waiting for local disk access.

      • remote_disk_io: time during which processing was blocked while waiting for remote disk access.

      • network_communication: time during which processing was waiting for network data transfer.

      The execution_time_breakdown numbers are percentages of total time divided by 100. For example, 3.0000000000000000e-01 (0.3) would be 30% of total query time.

    • query_statistics: Additional information about the query. This can include:

      • dml: statistics for Data Manipulation Language (DML) queries:

      • number_of_rows_inserted: number of rows inserted into a table (or tables).

      • number_of_rows_updated: number of rows updated in a table.

      • number_of_rows_deleted: number of rows deleted from a table.

      • number_of_rows_unloaded: number of rows unloaded during data export.

      • external_functions: information about calls to external functions. The following statistics are shown for external functions called by the SQL statement. The statistics are aggregated across all invocations of all external functions called by the specified query.

      • total_invocations: The number of times that an external function was called. (This can be different from the number of external function calls in the text of the SQL statement due to the number of batches that rows are divided into, the number of retries (if there are transient network problems), etc.)

      • rows_sent: The number of rows sent to external functions.

      • rows_received: The number of rows received back from external functions.

      • bytes_sent (x-region): The number of bytes sent to external functions. If the key includes (x-region), the data was sent across regions (which can impact billing).

      • bytes_received (x-region): The number of bytes received from external functions. If the key includes (x-region), the data was sent across regions (which can impact billing).

      • retries_due_to_transient_errors: The number of retries due to transient errors.

      • average_latency_per_call: The average amount of time per invocation (call) between the time Snowflake sent the data and received the returned data.

      • http_4xx_errors: total number of HTTP requests that returned a 4xx status code.

      • http_5xx_errors: total number of HTTP requests that returned a 5xx status code.

      • average_latency: average latency for successful HTTP requests.

      • avg_throttle_latency_overhead: average overhead per successful request due to a slowdown caused by throttling (HTTP 429).

      • batches_retried_due_to_throttling: number of batches that were retried due to HTTP 429 errors.

      • latency_per_successful_call_(p50): 50th percentile latency for successful HTTP requests. 50 percent of all successful requests took less than this time to complete.

      • latency_per_successful_call_(p90): 90th percentile latency for successful HTTP requests. 90 percent of all successful requests took less than this time to complete.

      • latency_per_successful_call_(p95): 95th percentile latency for successful HTTP requests. 95 percent of all successful requests took less than this time to complete.

      • latency_per_successful_call_(p99): 99th percentile latency for successful HTTP requests. 99 percent of all successful requests took less than this time to complete.

      If the value of a field, for example retries_due_to_transient_errors, is zero, then the field is not displayed.

      • extension_functions: information about calls to extension functions.

        • Java UDF handler load time: amount of time for the Java UDF handler to load.

        • Total Java UDF handler invocations: number of times the Java UDF handler is invoked.

        • Max Java UDF handler execution time: maximum amount of time for the Java UDF handler to execute.

        • Avg Java UDF handler execution time: average amount of time to execute the Java UDF handler.

        • Java UDTF process() invocations: number of times the Java UDTF process method was invoked.

        • Java UDTF process() execution time: amount of time to execute the Java UDTF process.

        • Avg Java UDTF process() execution time: average amount of time to execute the Java UDTF process.

        • Java UDTF's constructor invocations: number of times the Java UDTF constructor was invoked.

        • Java UDTF's constructor execution time: amount of time to execute the Java UDTF constructor.

        • Avg Java UDTF's constructor execution time: average amount of time to execute the Java UDTF constructor.

        • Java UDTF endPartition() invocations: number of times the Java UDTF endPartition method was invoked.

        • Java UDTF endPartition() execution time: amount of time to execute the Java UDTF endPartition method.

        • Avg Java UDTF endPartition() execution time: average amount of time to execute the Java UDTF endPartition method.

        • Max Java UDF dependency download time: maximum amount of time to download the Java UDF dependencies.

        • Max JVM memory usage: peak memory usage as reported by the JVM.

        • Java UDF inline code compile time in ms: compile time for the Java UDF inline code.

        • Total Python UDF handler invocations: number of times the Python UDF handler was invoked.

        • Total Python UDF handler execution time: total execution time for Python UDF handler.

        • Avg Python UDF handler execution time: average amount of time to execute the Python UDF handler.

        • Python sandbox max memory usage: peak memory usage by the Python sandbox environment.

        • Avg Python env creation time: Download and install packages: average amount of time to create the Python environment, including downloading and installing packages.

        • Conda solver time: amount of time to run the Conda solver to solve Python packages.

        • Conda env creation time: amount of time to create the Python environment.

        • Python UDF initialization time: amount of time to initialize the Python UDF.

        • Number of external file bytes read for UDFs: number of external file bytes read for UDFs.

        • Number of external files accessed for UDFs: number of external files accessed for UDFs.

      • io: information about the I/O (input/output) operations performed during the query:

        • scan_progress: the percentage of data scanned for a given table so far.

        • bytes_scanned: the number of bytes scanned so far.

        • percentage_scanned_from_cache: the percentage of data scanned from the local disk cache.

        • bytes_written: bytes written (e.g. when loading into a table).

        • bytes_written_to_result: bytes written to the result object. For example, select * from . . . would produce a set of results in tabular format representing each field in the selection. In general, the results object represents whatever is produced as a result of the query, and bytes_written_to_result represents the size of the returned result.

        • bytes_read_from_result: bytes read from the result object.

        • external_bytes_scanned: bytes read from an external object (e.g. a stage).

      • network: network communication:

        • bytes_sent_over_the_network: amount of data sent over the network.

      • pruning: information on table pruning:

        • partitions_scanned: number of partitions scanned so far.

        • partitions_total: total number of partitions in a given table.

      • spilling: information about disk usage for operations in which intermediate results do not fit in memory and therefore are spilled to disk:

        • bytes_spilled_local_storage: volume of data spilled to local disk.

        • bytes_spilled_remote_storage: volume of data spilled to remote disk.

  • For each individual execution of a specific query (i.e. a specific UUID), this function is deterministic; it returns the same values each time.

    However, for different executions of the same query text, this function can return different runtime statistics. The statistics depend on many factors. The following factors can have a major impact on the execution and therefore on the statistics returned by this function:

    • The volume of data.

    • The availability of materialized views, and the changes (if any) to the data since those materialized views were last refreshed.

    • The presence or absence of clustering.

    • The presence or absence of previously-cached data.

    • The size of the virtual warehouse.

    The values can also be affected by factors outside the user’s query and data. These factors are usually small. The factors include:

    • Virtual warehouse initialization time.

    • Latency with external functions.

Examples

Retrieving Data About a Single Query

This example shows the statistics for a SELECT statement that joins two small tables.

Run the SELECT statement to be analyzed:

select x1.i, x2.i
    from x1 inner join x2 on x2.i = x1.i
    order by x1.i, x2.i;
Copy

Get the query ID of the SELECT statement:

set lqid = (select last_query_id());
Copy

Call GET_QUERY_STATS() to get the statistics about the SELECT statement.

select get_query_stats($lqid);
+---------------------------------------------------------------------------------------------------------+
| GET_QUERY_STATS($LQID)                                                                                  |
|---------------------------------------------------------------------------------------------------------|
| {                                                                                                       |
|   "execution_time": 70,                                                                                 |
|   "execution_time_breakdown": {                                                                         |
|     "initialization": 1.000000000000000e+00                                                             |
|   },                                                                                                    |
|   "query_id": "01a4765b-0609-05a3-0000-000c058caa12",                                                   |
|   "query_statistics": {                                                                                 |
|     "io": {                                                                                             |
|       "bytes_scanned": 1.638400000000000e+04,                                                           |
|       "percentage_scanned_from_cache": 1.000000000000000e+00,                                           |
|       "scan_progress": 1.000000000000000e+00                                                            |
|     },                                                                                                  |
|     "pruning": {                                                                                        |
|       "partitions_scanned": 1.600000000000000e+01,                                                      |
|       "partitions_total": 8.000000000000000e+01                                                         |
|     }                                                                                                   |
|   },                                                                                                    |
|   "query_text": "select x1.i, x2.i\n    from x1 inner join x2 on x2.i = x1.i\n    order by x1.i, x2.i;" |
| }                                                                                                       |
+---------------------------------------------------------------------------------------------------------+
Copy

Retrieving Data About Multiple Queries

This example shows statistics for multiple queries. This example creates a table named uuids to store query IDs to pass to GET_QUERY_STATS.

The SELECT statements below have been formatted to make them more readable in this document. The original statements did not include extra newlines and indentation. Therefore, the query_text fields also do not contain newlines or spaces for indentation.

create or replace table t as
    select seq4() a, a + 45 b, a % 67 c, a / 45678 d, a % 123 e from table(generator(rowcount=>100000));
create or replace table uuids (uuid varchar);

select c, any_value(a), min(e)
    from t
    group by c;
select t1.a, t2.c
    from t t1 join t t2 on t1.c = t2.e;
select a, row_number() over (partition by c order by e)
    from t;

insert into uuids values
    (last_query_id(-1)),
    (last_query_id(-2)),
    (last_query_id(-3));
select get_query_stats(uuid) from uuids;
Copy

The output below has been formatted for readability (to avoid very wide lines). The actual output from the function contains one row per query.

{
    "execution_time": 360,
    "execution_time_breakdown": {
        "initialization": 1.000000000000000e+00
        },
    "query_id": "01a476c0-0009-06e5-0000-00044c99f3be",
    "query_statistics": {
        "io": {
              "bytes_scanned": 3.932160000000000e+05,
              "bytes_written_to_result": 5.385150000000000e+05,
              "percentage_scanned_from_cache": 1.000000000000000e+00,
              "scan_progress": 1.000000000000000e+00
              },
        "pruning": {
            "partitions_scanned": 1.000000000000000e+00,
            "partitions_total": 1.000000000000000e+00
            }
        },
    "query_text": "select a, row_number() over (partition by c order by e) from t;"
}

{
    "execution_time": 10792,
    "execution_time_breakdown": {
        "initialization": 1.923076923076923e-01,
        "processing": 8.076923076923077e-01
        },
    "query_id": "01a476c0-0009-06e5-0000-00044c99f3ba",
    "query_statistics": {
        "io": {
              "bytes_scanned": 4.874240000000000e+05,
              "bytes_written_to_result": 2.194200000000000e+07,
              "percentage_scanned_from_cache": 1.000000000000000e+00,
              "scan_progress": 1.000000000000000e+00
              },
        "pruning": {
            "partitions_scanned": 2.000000000000000e+00,
            "partitions_total": 2.000000000000000e+00
            }
        },
    "query_text": "select t1.a, t2.c from t t1 join t t2 on t1.c = t2.e;"
}

{
    "execution_time": 48,
    "execution_time_breakdown": {
        "initialization": 1.000000000000000e+00
        },
    "query_id": "01a476c0-0009-06dc-0000-00044c99d342",
    "query_statistics": {
        "io": {
            "bytes_scanned": 3.932160000000000e+05,
            "percentage_scanned_from_cache": 1.000000000000000e+00,
            "scan_progress": 1.000000000000000e+00
            },
        "pruning": {
            "partitions_scanned": 1.000000000000000e+00,
            "partitions_total": 1.000000000000000e+00
            }
        },
    "query_text": "select c, any_value(a), min(e) from t group by c;"
}
Copy

You can query the QUERY_HISTORY view to find queries that consumed unusually large amounts of resources. For example, the following query looks for queries that spilled more than one gigabyte to disk:

select get_query_stats(query_id):query_statistics as query_stats,
       query_stats:spilling_stats:bytes_spilled_local_storage::number as local_spill_bytes,
       query_stats:spilling_stats:bytes_spilled_remote_storage::number as remote_spill_bytes,
       query_stats:io:bytes_scanned::number as table_scan_bytes,
       remote_spill_bytes / bytes_written as remote_spill_pcnt
    from account_usage.query_history
    where warehouse_id = 123
       and
       start_time between '2022-02-20' and '2022-02-21'
       and
       (remote_spill_bytes > pow(1024, 3)  or  local_spill_bytes > pow(1024, 3))
    order by remote_spill_pcnt desc;
Copy

Retrieving Statistics Based on Query Characteristics

The following example shows how to retrieve the execution breakdown and query information for statements that have common characteristics (e.g. were run by a particular user). This example retrieves query information for a specified user on a specified warehouse during a specified time period, and restricts the results to those statements that ran for a substantial amount of time and whose resource consumption was dominated by synchronization operations.

select
        query_id,
        get_query_stats(query_id) as query_stats,
        query_stats:execution_time_breakdown as percentages,
        query_stats:query_statistics as statistics,
        percentages:synchronization::double as synchronization_pcnt
    from
        workload_uuids // any table or view with uuids, such as account_usage.query_history
    where
        warehouse_name = 'PROBLEM_WH'
        and
        user_name = 'DBUSER1'
        and
        start_time between '2022-05-20' and '2022-05-21'
        and
        execution_time > 5 * 60 * 1000 // query executing for longer than 5 minutes
        and
        synchronization_pcnt * 100 > 50    // dominated by synchronization operations
    order by synchronization_pcnt desc;
Copy