Estimating Percentile Values

Snowflake uses an improved version of the t-Digest algorithm, a space- and time-efficient way of estimating approximate percentile values in data sets.

Overview

Snowflake provides an improved implementation of the t-Digest algorithm described in papers by Dunning and Ertl. It is exposed through the APPROX_PERCENTILE family of functions.

As documented, the algorithm has a constant relative error. The algorithm has substantial empirical support, but there is no rigorous proof of any accuracy guarantees.

SQL Functions

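The following functions are provided for using t-Digest to approximate percentile values:

  • APPROX_PERCENTILE (aggregate): returns an approximate value for the desired percentile.

  • APPROX_PERCENTILE_ACCUMULATE (aggregate): returns the internal representation of the t-Digest state at the end of aggregation.

  • APPROX_PERCENTILE_COMBINE (aggregate): combines (merges) multiple t-Digest states into a single state.

  • APPROX_PERCENTILE_ESTIMATE (scalar): returns the approximate value for the desired percentile from a given t-Digest state.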

Implementation Details

  • The estimation uses a constant amount of space regardless of the size of the input.

  • The t-Digest state is independent of the percentile value. This makes it possible to compute the t-Digest state once and then query it for multiple percentile values.
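The two properties above can be illustrated with a minimal merging t-digest in Python. This is a hypothetical sketch, not Snowflake's implementation. The class `TDigest`, its `compression` parameter, the insert-buffer size, and the helper `combine` are illustrative assumptions. The scale function is the k1 function from Dunning and Ertl's t-digest paper. The state is a list of (mean, weight) centroids plus a bounded buffer, so it does not grow with the input. The percentile is chosen only at query time, which loosely mirrors computing a state once and estimating several percentiles from it later.

```python
import math
import random


class TDigest:
    """Hypothetical minimal merging t-digest (illustrative, not Snowflake's code)."""

    def __init__(self, compression=100.0):
        self.delta = compression
        self.centroids = []  # (mean, weight) pairs, sorted by mean
        self.buffer = []     # unsorted (value, weight) pairs awaiting a merge
        self.count = 0.0
        self.min, self.max = math.inf, -math.inf

    def _k(self, q):
        # k1 scale function: tiny centroids near q=0 and q=1, larger ones mid-range.
        x = max(-1.0, min(1.0, 2.0 * q - 1.0))
        return self.delta / (2.0 * math.pi) * math.asin(x)

    def _k_inv(self, k):
        angle = max(-math.pi / 2, min(math.pi / 2, 2.0 * math.pi * k / self.delta))
        return (math.sin(angle) + 1.0) / 2.0

    def add(self, x, w=1.0):
        self.buffer.append((x, w))
        self.count += w
        self.min, self.max = min(self.min, x), max(self.max, x)
        if len(self.buffer) >= 10 * self.delta:  # buffer size is bounded
            self._compress()

    def _compress(self):
        # One sorted pass: merge neighbours while a centroid spans <= 1 unit of k.
        items = sorted(self.centroids + self.buffer)
        self.buffer = []
        if not items:
            return
        n = sum(w for _, w in items)
        merged = []
        cur_m, cur_w = items[0]
        q0 = 0.0  # quantile at the left edge of the current centroid
        q_limit = self._k_inv(self._k(q0) + 1.0)
        for m, w in items[1:]:
            if q0 + (cur_w + w) / n <= q_limit:
                cur_w += w
                cur_m += (m - cur_m) * w / cur_w  # running weighted mean
            else:
                merged.append((cur_m, cur_w))
                q0 += cur_w / n
                q_limit = self._k_inv(self._k(q0) + 1.0)
                cur_m, cur_w = m, w
        merged.append((cur_m, cur_w))
        self.centroids = merged

    def quantile(self, q):
        """Estimate the value at percentile q (0 <= q <= 1) from the state alone."""
        if self.buffer:
            self._compress()
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.count
        prev_center, prev_mean, cum = 0.0, self.min, 0.0
        for m, w in self.centroids:
            center = cum + w / 2.0
            if target < center:
                # Interpolate between the previous centre (or the minimum) and this one.
                frac = (target - prev_center) / (center - prev_center)
                return prev_mean + frac * (m - prev_mean)
            prev_center, prev_mean, cum = center, m, cum + w
        # Past the last centre: interpolate toward the observed maximum.
        frac = (target - prev_center) / (self.count - prev_center)
        return prev_mean + frac * (self.max - prev_mean)


def combine(a, b):
    """Hypothetical helper: merge two digest states into one."""
    out = TDigest(a.delta)
    out.buffer = a.centroids + a.buffer + b.centroids + b.buffer
    out.count = a.count + b.count
    out.min, out.max = min(a.min, b.min), max(a.max, b.max)
    out._compress()
    return out


rng = random.Random(42)
values = list(range(100_000))
rng.shuffle(values)

# Accumulate one state, then query it for several percentiles.
digest = TDigest(compression=100)
for v in values:
    digest.add(v)
for p in (0.01, 0.5, 0.99):
    print(p, round(digest.quantile(p), 1))
print("centroids kept:", len(digest.centroids))

# Build two partial states (e.g. one per partition) and combine them.
left, right = TDigest(), TDigest()
for v in values[:50_000]:
    left.add(v)
for v in values[50_000:]:
    right.add(v)
merged = combine(left, right)
print("combined median:", round(merged.quantile(0.5), 1))
```

With k1, any two adjacent centroids together span more than one unit of k, and k ranges over an interval of width `compression / 2`. The number of centroids is therefore bounded by roughly `compression`, whatever the input size. Raising `compression` trades more memory for better accuracy. The value 100 here is an arbitrary choice for the sketch.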