Creating Java UDFs

This topic shows how to create and install a Java UDF (user-defined function).

In this Topic:

Writing the Java Code

Writing the Java Class and Method

Write a class that follows the specifications below:

  • Define the class as public.

  • Define a public method inside the class.

    If the method is not a static method, then the class that contains the method must have a zero-argument constructor, or must not have defined any constructors. (When Snowflake instantiates the class for a non-static method, Snowflake does not pass any arguments.)

    If the constructor throws an error, the error is thrown as a user error, along with the exception message.

  • If the method accepts arguments, each argument must be one of the data types specified in the Java Data Type column of the SQL-Java Type Mappings table.

    When choosing data types of Java variables, take into account the maximum and minimum possible values of the data that could be sent from (and returned to) Snowflake.

    Method arguments are bound by position, not name. The first argument passed to the UDF is the first argument received by the Java method. For example, in the code examples below, the SQL UDF argument a corresponds to the Java method argument x, and b corresponds to y:

    create function my_udf(x numeric(9, 0), y float) ...
    
    public static int my_udf(int a, float b) ...
    
  • Specify an appropriate return type. Because a Java UDF must be a scalar function, it must return one value each time that it is invoked. The return type must be one of the data types specified in the Java Data Type column of the SQL-Java Type Mappings table. The return type must be compatible with the SQL data type specified in the RETURNS clause of the CREATE FUNCTION statement.

  • Your class can contain more than one method. The method that is called by Snowflake can call other methods in the same class, or in other classes.

  • Your class can also contain more than one directly-callable method. For example, your class could contain methods call_me_1() and call_me_2(), and each of those methods could call other methods.

    If your class contains more than one directly-callable method, then create your Java UDF(s) as an pre-compiled UDF. You create one JAR file, and execute two (or more) CREATE FUNCTION statements, each of which specifies a different function in its HANDLER clause.

  • Your method (and any methods called by your method) must comply with the Snowflake-imposed constraints for Java UDFs.

Creating the Function in Snowflake

The information in this section applies to all Java UDFs, regardless of whether the code is specified in-line or pre-compiled.

You must execute a CREATE FUNCTION statement to specify:

  • The SQL function name to use.

  • The name of the Java method to call when the Java UDF is called.

The name of the UDF does not need to match the name of the handler method written in Java. The CREATE FUNCTION statement associates the UDF name with the Java method. The following diagram illustrates this:

Using the CREATE FUNCTION Statement to Associate the Handler Method With the UDF Name

When choosing a name for the UDF:

  • Follow the rules for Object Identifiers.

  • Choose a name that is unique, or follow the rules for Overloading of UDF Names.

    Important

    Unlike overloading for SQL UDFs, which distinguishes among functions based on both the number and the data types of the arguments, Java UDFs distinguish among methods based only on the number of arguments. If two Java methods have the same name and the same number of arguments, but different data types, then calling a UDF using one of those methods as a handler generates an error similar to the following:

    Cannot determine which implementation of handler “handler name” to invoke since there are multiple definitions with <number of args> arguments in function <user defined function name> with handler <class name>.<handler name>

    If a warehouse is available, the error is detected at the time that the UDF is created. Otherwise, the error occurs when the UDF is called.

    Resolving based on data types is impractical because some SQL data types can be mapped to more than one Java data type and thus potentially to more than one Java UDF signature.

Method arguments are bound by position, not name. The first argument passed to the UDF is the first argument received by the Java method. For example, in the code examples below, the SQL UDF argument a corresponds to the Java method argument x, and b corresponds to y:

create function my_udf(x numeric(9, 0), y float) ...
public static int my_udf(int a, float b) ...

For information about the data types of arguments, see SQL-Java Data Type Mappings for Parameters and Return Types.

In-line UDFs vs. Pre-compiled UDFs

The code for a Java UDF can be specified either of the following ways:

  • Pre-compiled: The CREATE FUNCTION statement specifies the location of an existing JAR file. The user must compile the Java source code and put the JAR file in a stage.

  • In-line: The CREATE FUNCTION statement specifies the Java source code. Snowflake compiles the source code and stores the compiled code in a JAR file. The user has the option of specifying a location for the JAR file.

    • If the user specifies a location for the JAR file, then Snowflake compiles the code once and keeps the JAR file for future use.

    • If the user does not specify a location for the JAR file, then Snowflake re-compiles the code for each SQL statement that calls the UDF, and Snowflake automatically cleans up the JAR file after the SQL statement finishes.

Some practical differences might affect which type you create.

  • In-line Java UDFs have the following advantages:

    • They are usually much easier to implement. You do not need to compile the code and copy the JAR file to a Snowflake stage. (Note, however, that most programmers compile and test their code before putting into production, so most Java UDF code is compiled by the developer at some point, even if the UDF is in-line.)

  • Pre-compiled Java UDFs have the following advantages:

    • You can use them when you have a JAR file but no source code.

    • You can use them if the source code is too large to paste into a CREATE FUNCTION statement. (In-line Java UDFs have an upper limit on the source code size.)

    • A pre-compiled Java UDF can contain multiple callable functions. Multiple CREATE FUNCTION statements can reference the same JAR file but specify different handler functions within the JAR file.

      In-line Java UDFs normally contain only one callable function. (That callable function can call other functions, and those other functions can be defined in the same class or can be defined in other classes defined in library JAR files.)

    • If you have tools or an environment for testing or debugging JAR files, it might be more convenient to do most of the development work on your UDF using JAR files. This is particularly true if the code is large or complex.

Creating an In-line Java UDF

For an in-line UDF, you supply the Java source code as part of the CREATE FUNCTION statement. For example:

create function add(x integer, y integer)
returns integer
language java
handler='TestAddFunc.add'
target_path='@~/TestAddFunc.jar'
as
$$
    class TestAddFunc {
        public static int add(int x, int y) {
          return x + y;
        }
    }
$$;

The Java source code is specified in the AS clause. The source code can be surrounded by either single quotes or by a pair of dollar signs ($$). Using the double dollar signs is usually easier if the source code contains embedded single quotes.

The Java source code can contain more than one class, and more than one method in a class, so the HANDLER clause specifies the class and method to call.

An in-line Java UDF (like a pre-compiled Java UDF) can call code in JAR files that are included in the IMPORTS clause.

For more details about the syntax of the CREATE FUNCTION statement, see CREATE FUNCTION.

For more examples, see in-line Java UDF examples.

Creating a Pre-compiled Java UDF

Organizing Your Files

A Java UDF is stored in a JAR file. If you plan to compile the Java code to create the JAR file yourself, you can organize the files as shown below. This example assumes that you plan to use Java’s package mechanism.

  • developmentDirectory

    • packageDirectory

      • class_file1.java

      • class_file2.java

    • classDirectory

      • class_file1.class

      • class_file2.class

    • manifest_file.manifest (optional)

    • jar_file.jar

    • put_command.sql

developmentDirectory

This directory contains the project-specific files required to create your Java UDF.

packageDirectory

This directory contains the .java files to compile and include in the package.

class_file#.java

These files contain the Java source code of the UDF.

class_file#.class

These are the .class file(s) created by compiling the .java files.

manifest_file.manifest

The optional manifest file used when combining the .class files (and optionally, dependency JAR files) into the JAR file.

jar_file.jar

The JAR file that contains the UDF code.

put_command.sql

This file contains the SQL PUT command to copy the JAR file to a Snowflake stage.

Compiling the Java Code and Creating the JAR File

To create a JAR file that contains the compiled Java code:

  • Use javac to compile your .java file to a .class file.

    If you use a compiler newer than version 11.x, you can use the “–release” option to specify that the target version is version 11.

  • Put your .class file into a JAR file. You can package multiple class files (and other JAR files) into your JAR file.

    For example:

    jar cf ./my_decrement_udf_jar.jar my_decrement_udf_package/my_decrement_udf_class.class
    

    A manifest file is required if you use a package, and optional if you do not use a package. The following example uses a manifest file:

    jar cmf my_decrement_udf_manifest.manifest ./my_decrement_udf_jar.jar my_decrement_udf_package/my_decrement_udf_class.class
    

    To build the jar file with all dependencies included, you can use Maven’s mvn package command with the maven-assembly-plugin. For more information about the maven-assembly-plugin, see:

    Snowflake automatically supplies the standard Java libraries (e.g. java.util). If your code calls those libraries, you do not need to include them in your JAR file.

    The methods that you call in libraries must follow the same Snowflake-imposed constraints as your Java method.

Copying the JAR File to Your Stage

Snowflake reads the JAR file from an external or named internal stage, so you must copy your JAR file to a stage. You can use your Snowflake user’s default stage, or another existing stage, or you can create a new stage.

The stage hosting the JAR file must be readable by the owner of the UDF.

Typically, you used a named internal stage and use the PUT command to upload the file to the stage. (Note that the PUT command cannot be executed through the Snowflake GUI. You can use SnowSQL to execute PUT.)

If you delete or rename the JAR file, you can no longer call the UDF.

Snowflake recommends following these best practices:

  • If you need to update your JAR file, then:

    • Update it while no calls to the UDF can be made.

    • If the old .jar file is still in the stage, the PUT command should include the clause OVERWRITE=TRUE.

See the Examples section for an example PUT command to copy a .jar file to a stage.

Granting Privileges on the Function

For any role other than the owner of the function to call the function, the owner must grant the appropriate privileges to the role.

The GRANT statements for a Java UDF are essentially identical to the GRANT statements for other UDFs, such as JavaScript UDFs.

For example:

GRANT USAGE ON FUNCTION my_java_udf(number, number) TO my_role;

Examples

In-Line Java UDFs

Creating and Calling a Simple In-line Java UDF

The following statements create and call an in-line Java UDF. This code simply returns the VARCHAR passed to it.

This function is declared with the optional CALLED ON NULL INPUT clause to indicate that the function is called even if the value of the input is NULL. (This function would return NULL with or without this clause, but you could modify the code to handle NULL another way, for example, to return an empty string.)

Create the UDF:

create or replace function echo_varchar(x varchar)
returns varchar
language java
called on null input
handler='TestFunc.echo_varchar'
target_path='@~/testfunc.jar'
as
'class TestFunc {
  public static String echo_varchar(String x) {
    return x;
  }
}';

Call the UDF:

SELECT echo_varchar('Hello');
+-----------------------+
| ECHO_VARCHAR('HELLO') |
|-----------------------|
| Hello                 |
+-----------------------+

Passing a NULL to an In-line Java UDF

This uses the echo_varchar() UDF defined above. The SQL NULL value is implicitly converted to Java null, and that Java null is returned and implicitly converted back to SQL NULL:

Call the UDF:

SELECT echo_varchar(NULL);
+--------------------+
| ECHO_VARCHAR(NULL) |
|--------------------|
| NULL               |
+--------------------+

Returning NULL Explicitly

The following code shows how to return a NULL value explicitly. The Java value null is converted to SQL NULL.

Create the UDF:

create or replace function return_a_null()
returns varchar
null
language java
handler='TemporaryTestLibrary.return_a_null'
target_path='@~/TemporaryTestLibrary.jar'
as
$$
class TemporaryTestLibrary {
  public static String return_a_null() {
    return null;
  }
}
$$;

Call the UDF:

SELECT return_a_null();
+-----------------+
| RETURN_A_NULL() |
|-----------------|
| NULL            |
+-----------------+

Passing an OBJECT to an In-line Java UDF

The following example uses the SQL OBJECT data type and the corresponding Java data type (Map<String, String>), and extracts a value from the OBJECT. This example also shows that you can pass multiple parameters to a Java UDF.

Create and load a table that contains a column of type OBJECT:

CREATE TABLE objectives (o OBJECT);
INSERT INTO objectives SELECT PARSE_JSON('{"outer_key" : {"inner_key" : "inner_value"} }');

Create the UDF:

create or replace function extract_from_object(x OBJECT, key VARCHAR)
returns variant
language java
handler='VariantLibrary.extract'
target_path='@~/VariantLibrary.jar'
as
$$
import java.util.Map;
class VariantLibrary {
  public static String extract(Map<String, String> m, String key) {
    return m.get(key);
  }
}
$$;

Call the UDF:

SELECT extract_from_object(o, 'outer_key'), 
       extract_from_object(o, 'outer_key')['inner_key'] FROM objectives;
+-------------------------------------+--------------------------------------------------+
| EXTRACT_FROM_OBJECT(O, 'OUTER_KEY') | EXTRACT_FROM_OBJECT(O, 'OUTER_KEY')['INNER_KEY'] |
|-------------------------------------+--------------------------------------------------|
| {                                   | "inner_value"                                    |
|   "inner_key": "inner_value"        |                                                  |
| }                                   |                                                  |
+-------------------------------------+--------------------------------------------------+

Passing an ARRAY to an In-line Java UDF

The following example uses the SQL ARRAY data type.

Create the UDF:

create or replace function multiple_functions_in_jar(array1 array)
returns varchar
language java
handler='TemporaryTestLibrary.multiple_functions_in_jar'
target_path='@~/TemporaryTestLibrary.jar'
as
$$
class TemporaryTestLibrary {
  public static String multiple_functions_in_jar(String[] array_of_strings) {
    return concatenate(array_of_strings);
  }
  public static String concatenate(String[] array_of_strings) {
    int number_of_strings = array_of_strings.length;
    String concatenated = "";
    for (int i = 0; i < number_of_strings; i++)  {
        concatenated = concatenated + " " + array_of_strings[i];
        }
    return concatenated;
  }
}
$$;

Call the UDF:

SELECT multiple_functions_in_jar(ARRAY_CONSTRUCT('Hello', 'world'));
+--------------------------------------------------------------+
| MULTIPLE_FUNCTIONS_IN_JAR(ARRAY_CONSTRUCT('HELLO', 'WORLD')) |
|--------------------------------------------------------------|
|  Hello world                                                 |
+--------------------------------------------------------------+

Pre-compiled Java UDFs

Creating and Calling a Simple Pre-compiled Java UDF

The following statements create a simple Java UDF. This sample generally follows the file and directory structure described in Organizing Your Files.

Create a java file that contains your source code:

package my_decrement_udf_package;


public class my_decrement_udf_class
{

public static int my_decrement_udf_method(int i)
{
    return i - 1;
}


public static void main(String[] argv)
{
    System.out.println("This main() function won't be called.");
}


}

Optionally, create a manifest file similar to the one shown below:

Manifest-Version: 1.0
Main-Class: my_decrement_udf_class.class

Compile the source code. This example stores the generated .class file(s) in the directory named classDirectory.

javac -d classDirectory my_decrement_udf_package/my_decrement_udf_class.java

Create the JAR file from the .class file. This example uses “-C classDirectory” to specify the location of the .class files:

jar cmf my_decrement_udf_manifest.manifest ./my_decrement_udf_jar.jar -C classDirectory my_decrement_udf_package/my_decrement_udf_class.class

Use the PUT command to copy the JAR file from the local file system to a stage. This example uses the user’s default stage, named @~:

put
    file:///Users/Me/JavaUDFExperiments/my_decrement_udf_jar.jar
    @~/my_decrement_udf_package_dir/
    auto_compress = false
    overwrite = true
    ;

You can store the PUT command in a script file and then execute that file through snowsql. The snowsql command looks similar to the following:

snowsql -a <account_identifier> -w <warehouse> -d <database> -s <schema> -u <user> -f put_command.sql

This example assumes that the user’s password is specified in the SNOWSQL_PWD environment variable.

Create the UDF:

create function my_decrement_udf(i numeric(9, 0))
    returns numeric
    language java
    imports = ('@~/my_decrement_udf_package_dir/my_decrement_udf_jar.jar')
    handler = 'my_decrement_udf_package.my_decrement_udf_class.my_decrement_udf_method'
    ;

Call the UDF:

SELECT my_decrement_udf(-15);
+-----------------------+
| MY_DECREMENT_UDF(-15) |
|-----------------------|
|                   -16 |
+-----------------------+