Creating Java UDFs

This topic shows how to create and install a Java user-defined function (UDF).

When you write a Java UDF, you write Java code for Snowflake to execute as UDF logic. This Java code is the UDF’s handler. You deploy the UDF with CREATE FUNCTION, giving the UDF a name and specifying the Java method as the handler to use when the UDF is called.

For more example code, see Java UDF Handler Examples.

In this Topic:

Writing the UDF Handler in Java

Use the following requirements and guidelines when writing your Java UDF handler.

  • Define the class as public.

  • Inside the class, declare at least one public method to use as a UDF handler.

    For an inline UDF, declare one handler method only. If instead you intend to package the class into a JAR as a pre-compiled UDF, you can declare multiple handler methods, later specifying each as a handler with the HANDLER clause of a CREATE FUNCTION statement.

    You can declare other methods, if needed, to be called by the handler method.

    Use the following requirements and guidelines for each handler method:

    • Declare the handler method as public, either static or non-static.

      If the method is non-static, your class must also declare a zero-argument constructor or no constructor at all.

      Snowflake does not pass any arguments to the constructor when it instantiates the class. If the constructor throws an error, the error is thrown as a user error, along with the exception message.

    • Specify an appropriate return type.

      The return type must be one of the data types specified in the Java Data Type column of the SQL-Java Type Mappings table. The return type must be compatible with the SQL data type specified in the RETURNS clause of the CREATE FUNCTION statement.

    • Ensure that each handler method argument (if any) is a data type specified in the Java Data Type column of the SQL-Java Type Mappings table.

      When choosing data types of Java variables, take into account the maximum and minimum possible values of the data that could be sent from (and returned to) Snowflake.

    • Comply with the Snowflake-imposed constraints for Java UDFs in each handler method and methods it calls.

Creating the Function in Snowflake

The information in this section applies to all Java UDFs, regardless of whether the code is specified in-line or pre-compiled.

You must execute a CREATE FUNCTION statement to specify aspects of the UDF, including:

  • The UDF name.

  • The name of the Java method for Snowflake to call as a handler when the UDF is called.

The UDF name does not need to match the name of the handler method written in Java. The CREATE FUNCTION statement associates the UDF name with the Java method, as shown in the following diagram:

Using the CREATE FUNCTION Statement to Associate the Handler Method With the UDF Name

When choosing a name for the UDF:

  • Follow the rules for Object Identifiers.

  • Choose a name that is unique, or follow the rules for Overloading of UDF Names.

    Important

    Unlike overloading for SQL UDFs, which distinguishes among functions based on both the number and the data types of the arguments, Java UDFs distinguish among methods based only on the number of arguments. If two Java methods have the same name and the same number of arguments, but different data types, then calling a UDF using one of those methods as a handler generates an error similar to the following:

    Cannot determine which implementation of handler “handler name” to invoke since there are multiple definitions with <number of args> arguments in function <user defined function name> with handler <class name>.<handler name>

    If a warehouse is available, the error is detected at the time that the UDF is created. Otherwise, the error occurs when the UDF is called.

    Resolving based on data types is impractical because some SQL data types can be mapped to more than one Java data type and thus potentially to more than one Java UDF signature.

Handler method arguments are bound to UDF arguments by position, not name. In other words, the first UDF argument is passed to the first method argument, the second UDF argument is passed to the second method argument, and so on.

In the examples below, the SQL UDF argument x corresponds to the Java method argument a, and y corresponds to b:

create function add_int_float(x numeric(9, 0), y float)
returns float
language java
handler = 'MyClass.addIntFloat';
public static float addIntFloat(int a, float b) {
  // ...
}

For information about the data types of arguments, see SQL-Java Data Type Mappings for Parameters and Return Types.

Note

Scalar functions (UDFs) have a limit of 500 input arguments.

It is recommended that you use the RUNTIME_VERSION parameter of the the CREATE FUNCTION statement to specify which supported Java runtime version to use because the default version may change in the future.

In-line UDFs vs. Pre-compiled UDFs

You can define a Java UDF handler in either of the following ways:

  • Pre-compiled, in which you package the compiled Java handler in a JAR and put it where Snowflake can read it.

    The CREATE FUNCTION statement specifies the location of the JAR file containing the handler. Before executing CREATE FUNCTION, you copy the JAR file to a stage from which Snowflake can read it.

  • In-line, in which you include the Java code with the UDF declaration itself.

    The CREATE FUNCTION statement itself specifies the Java source code. Snowflake compiles the source code and stores the compiled code in a JAR file. You can specify a location for the resulting JAR file with the TARGET_PATH clause when you run CREATE FUNCTION.

    • If CREATE FUNCTION specifies a location for the JAR file, then Snowflake compiles the code once and keeps the JAR file for future use.

    • If CREATE FUNCTION does not specify a location for the JAR file, then Snowflake re-compiles the code for each SQL statement that calls the UDF, and Snowflake automatically cleans up the JAR file after the SQL statement finishes.

Practical Differences

In-line Java UDF Advantages

  • They are usually easier to implement. After using your development tools to verify that your code works as it should, you can deploy it by copying it into the CREATE FUNCTION statement, then executing the statement. You can maintain the code there, updating and executing CREATE FUNCTION, without having to separately re-package the compiled output in a JAR and update it on a stage.

Pre-compiled Java UDF Advantages

  • You can use them when you have a JAR file but no source code.

  • You can use them if the source code is too large to paste into a CREATE FUNCTION statement. (In-line Java UDFs have an upper limit on the source code size.)

  • A pre-compiled Java UDF can contain multiple handler functions. Multiple CREATE FUNCTION statements can reference the same JAR file but specify different handler functions within the JAR file.

    In-line Java UDFs normally contain only one callable function. (That callable function can call other functions, and those other functions can be defined in the same class or can be defined in other classes defined in library JAR files.)

  • If you have tools or an environment for testing or debugging JAR files, it might be more convenient to do most of the development work on your UDF using JAR files. This is particularly true if the code is large or complex.

Adding Dependencies to the Classpath

When your handler code requires classes packaged in external JAR files, you can add these dependencies to the Snowflake-managed classpath available to your handler. The following describes how to add JAR files to the classpath visible to a Java UDF handler.

  1. Create a stage that’s available to your handler.

    For UDF dependencies, you can use an external stage or internal stage. If you use an internal stage, it must be a user or named stage; Snowflake does not currently support using a table stage for UDF dependencies. For more on creating a stage, see CREATE STAGE. For more on choosing an internal stage type, see Choosing an Internal Stage for Local Files.

  2. Copy the dependency JAR to the stage.

    You can copy the JAR from a local drive to a stage by using the PUT command. For command reference, see PUT. For information on staging files with PUT, see Staging Data Files from a Local File System.

  3. Reference the dependency JAR when you create the UDF.

    When you execute CREATE FUNCTION to create the UDF, specify the stage location and file path-and-name of all dependency JAR files as values of the IMPORTS clause. At runtime, Snowflake adds the JAR to the classpath. For reference information, see CREATE FUNCTION.

    Code in the following example creates a UDF called my_udf, specifying a my_handler_dependency.jar dependency on the stage @mystage.

    CREATE FUNCTION my_udf(i NUMERIC)
      RETURNS NUMERIC
      LANGUAGE JAVA
      IMPORTS = ('@mystage/dependencies/my_handler_dependency.jar')
      HANDLER = 'MyClass.myFunction'
      AS
      $$
        // Handler code omitted.
      $$
    

Creating an In-line Java UDF

For an in-line UDF, you supply the Java source code as part of the CREATE FUNCTION statement.

You put the Java source code in the AS clause, surrounding the code with single quotes or a pair of dollar signs ($$). Using the double dollar signs might be easier, such as when the source code contains embedded single quotes.

Code in the following example declares an add UDF whose handler is the add method in the TestAddFunc class.

create function add(x integer, y integer)
returns integer
language java
handler='TestAddFunc.add'
target_path='@~/TestAddFunc.jar'
as
$$
    class TestAddFunc {
        public static int add(int x, int y) {
          return x + y;
        }
    }
$$;

Given that the Java source code can contain more than one class and more than one method in a class, the HANDLER clause specifies the class and method to use as the handler.

An in-line Java UDF (like a pre-compiled Java UDF) can call code in JAR files that are included in the CREATE FUNCTION statement’s IMPORTS clause.

For more details about the syntax of the CREATE FUNCTION statement, see CREATE FUNCTION.

For more examples, see Java UDF Handler Examples.

Creating a Pre-compiled Java UDF

When you intend to create a UDF that specifies the location of an existing JAR for its handler, you develop the handler by:

Organizing Your Files

If you plan to compile the Java code to create the JAR file yourself, you can organize the files as shown below. This example assumes that you plan to use Java’s package mechanism.

  • developmentDirectory

    • packageDirectory

      • class_file1.java

      • class_file2.java

    • classDirectory

      • class_file1.class

      • class_file2.class

    • manifest_file.manifest (optional)

    • jar_file.jar

    • put_command.sql

developmentDirectory

This directory contains the project-specific files required to create your Java UDF.

packageDirectory

This directory contains the .java files to compile and include in the package.

class_file#.java

These files contain the Java source code of the UDF.

class_file#.class

These are the .class file(s) created by compiling the .java files.

manifest_file.manifest

The optional manifest file used when combining the .class files (and optionally, dependency JAR files) into the JAR file.

jar_file.jar

The JAR file that contains the UDF code.

put_command.sql

This file contains the SQL PUT command to copy the JAR file to a Snowflake stage.

Compiling the Java Code and Creating the JAR File

To create a JAR file that contains the compiled Java code:

  • Use javac to compile your .java file to a .class file.

    If you use a compiler newer than version 11.x, you can use the “–release” option to specify that the target version is version 11.

  • Put your .class file into a JAR file. You can package multiple class files (and other JAR files) into your JAR file.

    For example:

    jar cf ./my_udf.jar MyClass.class
    

    A manifest file is required if your handler class is in a package, and optional otherwise. The following example uses a manifest file:

    jar cmf my_udf.manifest ./my_udf.jar example/MyClass.class
    

    To build the jar file with all dependencies included, you can use Maven’s mvn package command with the maven-assembly-plugin. For more information about the maven-assembly-plugin, see the Maven usage page.

    Snowflake automatically supplies the standard Java libraries (e.g. java.util). If your code calls those libraries, you do not need to include them in your JAR file.

    The methods that you call in libraries must follow the same Snowflake-imposed constraints as your Java method.

Copying the JAR File to Your Stage

In order for Snowflake to read from the JAR containing your handler method, you need to copy the JAR to one of the following kinds of stage:

  • A user or named internal stage.

    Snowflake does not currently support using a table stage to store a JAR file with UDF handlers. For more on internal stages, see Choosing an Internal Stage for Local Files.

  • An external stage.

The stage hosting the JAR file must be readable by the owner of the UDF.

Typically, you upload the JAR to a named internal stage using the PUT command. Note that you can’t execute the PUT command through the Snowflake GUI; you can use SnowSQL to execute PUT. See the Java UDF Handler Examples section for an example PUT command to copy a .jar file to a stage.

For more about creating stages, see CREATE STAGE.

Caveats and Best Practices

If you delete or rename the JAR file, you can no longer call the UDF.

If you need to update your JAR file, then:

  • Update it while no calls to the UDF can be made.

  • If the old .jar file is still in the stage, the PUT command should include the clause OVERWRITE=TRUE.

Note

A user performing related to UDFs must have a role that has been assigned permissions required for the action. For more information, see Granting Privileges for User-Defined Functions.

Back to top