Calling a UDF

You can call a user-defined function (UDF) or user-defined table function (UDTF) in the same way that you call other functions.

Calling a UDF

In general, you call a UDF same way that you call other functions.

Code in the following example calls a UDF called my_udf, specifying column_1 from my_table as an argument.

SELECT my_udf(column_1) FROM my_table;
Copy

Calling a UDTF

You can call a UDTF the way you would call any table function. When calling a UDTF in the FROM clause of a query, specify the UDTF’s name and arguments inside the parentheses that follow the TABLE keyword.

In other words, use a form such as the following for the TABLE keyword when calling a UDTF:

SELECT ...
  FROM TABLE ( udtf_name (udtf_arguments) )
Copy

Code in the following example calls the my_java_udtf table function, specifying a DATE literal in the argument '2021-01-16'::DATE.

SELECT ...
  FROM TABLE(my_java_udtf('2021-01-16'::DATE));
Copy

The argument to a table function can be an expression, not just a literal. For example, a table function can be called using a column from a table. Some examples are below, including in the Examples section.

For more information about table functions in general, see table function.

Note

You cannot call a UDF within the DEFAULT clause of a CREATE TABLE statement.

Using a Table or UDTF as Input to a UDTF

The input to a table function can come from a table or from another UDTF, as documented in Using a Table as Input to a Table Function.

The example below shows how to use a table to provide input to the UDTF split_file_into_words:

create table file_names (file_name varchar);
insert into file_names (file_name) values ('sample.txt'),
                                          ('sample_2.txt');

select f.file_name, w.word
   from file_names as f, table(split_file_into_words(f.file_name)) as w;
Copy

The output looks similar to the following:

+-------------------+------------+
| FILE_NAME         | WORD       |
+-------------------+------------+
| sample_data.txt   | some       |
| sample_data.txt   | words      |
| sample_data_2.txt | additional |
| sample_data_2.txt | words      |
+-------------------+------------+
Copy

The IMPORTS clause of the UDTF must specify the name and path of each file passed to the UDTF. For example:

create function split_file_into_words(inputFileName string)
    ...
    imports = ('@inline_jars/sample.txt', '@inline_jars/sample_2.txt')
    ...
Copy

Each file must already have been copied to a stage (in this case, the stage named @inline_jars) before the UDTF reads the file.

For an example of using a UDTF as an input to another UDTF, see Extended Examples Using Table Values and Other UDTFs as Input in the JavaScript UDTF documentation.

Table Functions and Partitions

Before rows are passed to table functions, the rows can be grouped into partitions. Partitioning has two main benefits:

  • Partitioning allows Snowflake to divide up the workload to improve parallelization and thus performance.

  • Partitioning allows Snowflake to process all rows with a common characteristic as a group. You can return results that are based on all rows in the group, not just on individual rows.

For example, you might partition stock price data into one group per stock. All stock prices for an individual company can be analyzed together, while stock prices for each company can be analyzed independently of any other company.

Data can be partitioned explicitly or implicitly.

Explicit Partitioning

Explicit Partitioning into Multiple Groups

The following statement calls the UDTF named my_udtf on individual partitions. Each partition contains all rows for which the PARTITION BY expression evaluates to the same value (e.g. the same company or stock symbol).

SELECT *
    FROM stocks_table AS st,
         TABLE(my_udtf(st.symbol, st.transaction_date, st.price) OVER (PARTITION BY st.symbol))
Copy

Explicit Partitioning into a Single Group

The following statement calls the UDTF named my_udtf on one partition. The PARTITION BY <constant> clause (in this case PARTITION BY 1) puts all rows in the same partition.

SELECT *
    FROM stocks_table AS st,
         TABLE(my_udtf(st.symbol, st.transaction_date, st.price) OVER (PARTITION BY 1))
Copy

For a more complete and realistic example, see Examples of Calling Java UDTFs in Queries, in particular the subsection titled Single Partition.

Sorting Rows for Partitions

To process each partition’s rows in a specified order, include an ORDER BY clause. This tells Snowflake to pass the rows to the per-row handler method in the specified order.

For example, if you want to calculate the moving average of a stock price over time, then order the stock prices by timestamp (and partition by stock symbol). The following example shows how to do this:

SELECT *
     FROM stocks_table AS st,
          TABLE(my_udtf(st.symbol, st.transaction_date, st.price) OVER (PARTITION BY st.symbol ORDER BY st.transaction_date))
Copy

An OVER clause can contain an ORDER BY clause even without a PARTITION BY clause.

Remember that including an ORDER BY clause inside an OVER clause is not the same as putting an ORDER BY clause at the outermost level of the query. If you want the entire query results to be ordered, you need a separate ORDER BY clause. For example:

SELECT *
    FROM stocks_table AS st,
         TABLE(my_udtf(st.symbol, st.transaction_date, st.price) OVER (PARTITION BY st.symbol ORDER BY st.transaction_date))
    ORDER BY st.symbol, st.transaction_date, st.transaction_time;
Copy

Usage Notes for Explicit Partitioning

When using a UDTF with a PARTITION BY clause, the PARTITION BY clause must use a column reference or a literal, not a general expression. For example, the following is not allowed:

SELECT * FROM udtf_table, TABLE(my_func(col1) OVER (PARTITION BY udtf_table.col2 * 2));   -- NO!
Copy

Implicit Partitioning

If a table function does not explicitly partition the rows by using a PARTITION BY clause, then Snowflake typically partitions the rows implicitly to use parallel processing to improve performance.

The number of partitions is typically based on factors such as the size of the warehouse processing the function and the cardinality of the input relation. The rows are typically assigned to specific partitions based on factors such as physical location of the rows (e.g. by micro-partition), so the partition grouping has no meaning.