Java UDF Handler Examples

This topic includes simple examples of UDF handler code written in Java.

For more on using Java to create a UDF handler, see Creating Java UDFs.

In this Topic:

Creating and Calling a Simple In-line Java UDF

The following statements create and call an in-line Java UDF. This code returns the VARCHAR passed to it.

This function is declared with the optional CALLED ON NULL INPUT clause to indicate that the function is called even if the value of the input is NULL. (This function would return NULL with or without this clause, but you could modify the code to handle NULL another way, for example, to return an empty string.)

Create the UDF:

create or replace function echo_varchar(x varchar)
returns varchar
language java
called on null input
handler='TestFunc.echoVarchar'
target_path='@~/testfunc.jar'
as
'class TestFunc {
  public static String echoVarchar(String x) {
    return x;
  }
}';

Call the UDF:

SELECT echo_varchar('Hello');
+-----------------------+
| ECHO_VARCHAR('HELLO') |
|-----------------------|
| Hello                 |
+-----------------------+

Passing a NULL to an In-line Java UDF

This uses the echo_varchar() UDF defined above. The SQL NULL value is implicitly converted to Java null, and that Java null is returned and implicitly converted back to SQL NULL:

Call the UDF:

SELECT echo_varchar(NULL);
+--------------------+
| ECHO_VARCHAR(NULL) |
|--------------------|
| NULL               |
+--------------------+

Returning NULL Explicitly from an In-Line UDF

The following code shows how to return a NULL value explicitly. The Java value null is converted to SQL NULL.

Create the UDF:

create or replace function return_a_null()
returns varchar
null
language java
handler='TemporaryTestLibrary.returnNull'
target_path='@~/TemporaryTestLibrary.jar'
as
$$
class TemporaryTestLibrary {
  public static String returnNull() {
    return null;
  }
}
$$;

Call the UDF:

SELECT return_a_null();
+-----------------+
| RETURN_A_NULL() |
|-----------------|
| NULL            |
+-----------------+

Passing an OBJECT to an In-line Java UDF

The following example uses the SQL OBJECT data type and the corresponding Java data type (Map<String, String>), and extracts a value from the OBJECT. This example also shows that you can pass multiple parameters to a Java UDF.

Create and load a table that contains a column of type OBJECT:

CREATE TABLE objectives (o OBJECT);
INSERT INTO objectives SELECT PARSE_JSON('{"outer_key" : {"inner_key" : "inner_value"} }');

Create the UDF:

create or replace function extract_from_object(x OBJECT, key VARCHAR)
returns variant
language java
handler='VariantLibrary.extract'
target_path='@~/VariantLibrary.jar'
as
$$
import java.util.Map;
class VariantLibrary {
  public static String extract(Map<String, String> m, String key) {
    return m.get(key);
  }
}
$$;

Call the UDF:

SELECT extract_from_object(o, 'outer_key'), 
       extract_from_object(o, 'outer_key')['inner_key'] FROM objectives;
+-------------------------------------+--------------------------------------------------+
| EXTRACT_FROM_OBJECT(O, 'OUTER_KEY') | EXTRACT_FROM_OBJECT(O, 'OUTER_KEY')['INNER_KEY'] |
|-------------------------------------+--------------------------------------------------|
| {                                   | "inner_value"                                    |
|   "inner_key": "inner_value"        |                                                  |
| }                                   |                                                  |
+-------------------------------------+--------------------------------------------------+

Passing an ARRAY to an In-line Java UDF

The following example uses the SQL ARRAY data type.

Create the UDF:

create or replace function multiple_functions_in_jar(array1 array)
returns varchar
language java
handler='TemporaryTestLibrary.handleStrings'
target_path='@~/TemporaryTestLibrary.jar'
as
$$
class TemporaryTestLibrary {
  public static String handleStrings(String[] strings) {
    return concatenate(strings);
  }
  public static String concatenate(String[] strings) {
    int numberOfStrings = strings.length;
    String concatenated = "";
    for (int i = 0; i < numberOfStrings; i++)  {
        concatenated = concatenated + " " + strings[i];
    }
    return concatenated;
  }
}
$$;

Call the UDF:

SELECT multiple_functions_in_jar(ARRAY_CONSTRUCT('Hello', 'world'));
+--------------------------------------------------------------+
| MULTIPLE_FUNCTIONS_IN_JAR(ARRAY_CONSTRUCT('HELLO', 'WORLD')) |
|--------------------------------------------------------------|
|  Hello world                                                 |
+--------------------------------------------------------------+

Passing a GEOGRAPHY Value to an In-line Java UDF

The following example uses the SQL GEOGRAPHY data type.

Create the UDF:

create or replace function geography_equals(x geography, y geography)
returns boolean
language java
packages=('com.snowflake:snowpark:1.2.0')
handler='TestGeography.compute'
as
$$
import com.snowflake.snowpark_java.types.Geography;

class TestGeography {
  public static boolean compute(Geography geo1, Geography geo2) {
    return geo1.equals(geo2);
  }
}
$$;

You can use the PACKAGES clause to specify a Snowflake system package such as the Snowpark package. When you do, you don’t need to also include the Snowpark JAR file as a value of an IMPORTS clause. For more on PACKAGES, see CREATE FUNCTION optional parameters.

Create data and call the UDF with that data:

create table geocache_table (id INTEGER, g1 GEOGRAPHY, g2 GEOGRAPHY);

insert into geocache_table (id, g1, g2) select
    1, TO_GEOGRAPHY('POINT(-122.35 37.55)'), TO_GEOGRAPHY('POINT(-122.35 37.55)');
insert into geocache_table (id, g1, g2) select
    2, TO_GEOGRAPHY('POINT(-122.35 37.55)'), TO_GEOGRAPHY('POINT(90.0 45.0)');

select id, g1, g2, geography_equals(g1, g2) as "EQUAL?"
    from geocache_table
    order by id;

The output looks similar to:

+----+--------------------------------------------------------+---------------------------------------------------------+--------+
| ID | G1                                                     | G2                                                      | EQUAL? |
+----+--------------------------------------------------------|---------------------------------------------------------+--------+
| 1  | { "coordinates": [ -122.35, 37.55 ], "type": "Point" } | { "coordinates": [ -122.35,  37.55 ], "type": "Point" } | TRUE   |
| 2  | { "coordinates": [ -122.35, 37.55 ], "type": "Point" } | { "coordinates": [   90.0,   45.0  ], "type": "Point" } | FALSE  |
+----+--------------------------------------------------------+---------------------------------------------------------+--------+

Passing a VARIANT Value to an In-line Java UDF

When you pass a value of the SQL VARIANT type to a Java UDF, Snowflake can convert the value to the Variant type provided with the Snowpark package. Note that Variant is supported from the Snowpark package version 1.4.0 and later.

The Snowpark Variant type provides methods for converting values between Variant and other types.

To use the Snowpark Variant type, use the PACKAGES clause to specify the Snowpark package when creating the UDF. When you do, you don’t need to also include the Snowpark JAR file as a value of an IMPORTS clause. For more information on PACKAGES, see CREATE FUNCTION optional parameters.

Code in the following example receives JSON data stored as the VARIANT type, then uses the Variant type in the Snowpark library to retrieve the price value from the JSON. The received JSON has a structure similar to the JSON displayed in Sample Data Used in Examples.

create or replace function retrieve_price(v variant)
returns integer
language java
packages=('com.snowflake:snowpark:1.4.0')
handler='VariantTest.retrievePrice'
as
$$
import java.util.Map;
import com.snowflake.snowpark_java.types.Variant;

public class VariantTest {
  public static Integer retrievePrice(Variant v) throws Exception {
    Map<String, Variant> saleMap = v.asMap();
    int price = saleMap.get("vehicle").asMap().get("price").asInt();
    return price;
  }
}
$$;

Reading a File with a Java UDF Handler

You can read the contents of a file with Java UDF handler code. The file must be on a Snowflake stage that’s available to your handler. For example, you might want to read a file to process unstructured data in the handler. For more information, see Processing Unstructured Data Using Java UDFs or UDTFs.

To read the contents of staged files, your Java UDFs or UDTFs can:

  • Specify the file path and name with the IMPORTS clause, then read it from the UDF’s home directory. This can be useful when you want to access the file statically, such as during initialization.

  • Call methods in either the SnowflakeFile class or the InputStream class. You might do this if you need to access the file dynamically during compute. For more information, see Reading a File Using the SnowflakeFile Class or Reading a File Using the InputStream Class in this topic.

    SnowflakeFile provides features not available with InputStream, as described in the following table.

    Class

    Input

    Notes

    SnowflakeFile

    Scoped URL, file URL, or string path for files located in an internal or external stage

    Easily access additional file attributes, such as file size.

    InputStream

    Scoped URL, file URL, or string path for files located in an internal or external stage

Prerequisites

Before your Java handler code can read a file on a stage, you must do the following to make the file available to the code:

  1. Create a stage that’s available to your handler.

    You can use an external stage or internal stage. If you use an internal stage, it must be a user or named stage; Snowflake does not currently support using a table stage for UDF dependencies. For more on creating a stage, see CREATE STAGE. For more on choosing an internal stage type, see Choosing an Internal Stage for Local Files.

    Keep in mind that adequate privileges on the stage must be assigned to roles performing SQL actions that read from the stage. For more information, see Granting Privileges for User-Defined Functions.

  2. To the stage, copy the file that will be read by code.

    You can copy the file from a local drive to a stage by using the PUT command. For command reference, see PUT. For information on staging files with PUT, see Staging Data Files from a Local File System.

Reading a File Using IMPORTS

You can read a file by specifying the file name and stage name in the IMPORTS clause of the CREATE FUNCTION command.

When you specify a file in the IMPORTS clause, Snowflake copies that file from the stage to the UDF’s home directory (also called the import directory), which is the directory from which the UDF actually reads the file.

Because imported files are copied to a single directory and must have unique names within that directory, each file in the IMPORTS clause must have a distinct name, even if the files start out in different stages or different subdirectories within a stage.

The following example creates and calls a Java UDF that reads a file.

The Java source code below creates a Java method named readFile. This UDF uses this method.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

class TestReadRelativeFile {
    public static String readFile(String fileName) throws IOException {
        StringBuilder contentBuilder = new StringBuilder();
        String importDirectory = System.getProperty("com.snowflake.import_directory");
        String fPath = importDirectory + fileName;
        Stream<String> stream = Files.lines(Paths.get(fPath), StandardCharsets.UTF_8);
        stream.forEach(s -> contentBuilder.append(s).append("\n"));
        return contentBuilder.toString();
    }
}

The following SQL code creates the UDF. This code assumes that the Java source code has been compiled and put into a JAR file named TestReadRelativeFile.jar, which the UDF imports. The second and third imported files, my_config_file_1.txt and my_config_file_2.txt, are configuration files that the UDF can read.

CREATE FUNCTION file_reader(file_name VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVA
IMPORTS = ('@my_stage/my_package/TestReadRelativeFile.jar',
           '@my_stage/my_path/my_config_file_1.txt',
           '@my_stage/my_path/my_config_file_2.txt')
HANDLER = 'my_package.TestReadRelativeFile.readFile';

This code calls the UDF:

SELECT file_reader('my_config_file_1.txt') ...;
...
SELECT file_reader('my_config_file_2.txt') ...;

Choosing Whether to Access a File in Compressed or Uncompressed Format

Files in a stage can be stored in compressed or uncompressed format. Users can compress the file before copying it to the stage, or can tell the PUT command to compress the file.

When Snowflake copies a file compressed in GZIP format from a stage to the UDF home directory, Snowflake can write the copy as-is, or Snowflake can decompress the content before writing the file.

If the file in the stage is compressed, and if you would like the copy in the UDF home directory to also be compressed, then when you specify the file name in the IMPORTS clause, simply use the original file name (e.g. “MyData.txt.gz”) in the IMPORTS clause. For example:

... imports = ('@MyStage/MyData.txt.gz', ...)

If the file in the stage is GZIP-compressed, but you would like the copy in the UDF home directory to be uncompressed, then when you specify the file name in the IMPORTS clause, omit the “.gz” extension. For example, if your stage contains “MyData.txt.gz”, but you want your UDF to read the file in uncompressed format, then specify “MyData.txt” in the IMPORTS clause. If there is not already an uncompressed file named “MyData.txt”, then Snowflake searches for “MyData.txt.gz” and automatically writes a decompressed copy to “MyData.txt” in the UDF home directory. Your UDF can then open and read the uncompressed file “MyData.txt”.

Note that smart decompression applies only to the copy in the UDF home directory; the original file in the stage is not changed.

Follow these best practices for handling compressed files:

  • Follow proper file naming conventions. If a file is in GZIP-compressed format, then include the extension “.gz” at the end of the file name. If a file is not in GZIP-compressed format, then do not end the file name with the “.gz” extension.

  • Avoid creating files whose names differ only by the extension “.gz”. For example, do not create both “MyData.txt” and “MyData.txt.gz” in the same stage and directory, and do not try to import both “MyData.txt” and “MyData.txt.gz” in the same CREATE FUNCTION command.

  • Do not compress files twice. For example, if you compress a file manually, and then you PUT that file without using AUTO_COMPRESS=FALSE, the file will be compressed a second time. Smart decompression will decompress it only once, so the data (or JAR) file will still be compressed when it is stored in the UDF home directory.

  • In the future, Snowflake might extend smart decompression to compression algorithms other than GZIP. To prevent compatibility issues in the future, apply these best practices to files that use any type of compression.

Note

JAR files can also be stored in compressed or uncompressed format in a stage. Snowflake automatically decompresses all compressed JAR files before making them available to the Java UDF.

Reading a File Using the SnowflakeFile Class

Using methods of the SnowflakeFile class, you can read files from a stage with your Java handler code. The SnowflakeFile class is included on the classpath available to Java UDF handlers on Snowflake.

To develop your UDF code locally, add the Snowpark JAR containing SnowflakeFile to your code’s class path. For information about snowpark.jar, see Setting Up Your Development Environment for Snowpark Java. Note that Snowpark client applications cannot use this class; therefore, the class is not documented in the Snowpark topics.

When you use SnowflakeFile, it isn’t necessary to also specify either the staged file or the JAR containing SnowflakeFile with an IMPORTS clause when you create the UDF, as in SQL with a CREATE FUNCTION statement.

The SnowflakeFile class has the following methods:

Method

Description

public static native SnowflakeFile newInstance(String url)

Returns a SnowflakeFile object for the file at the location specified by the url argument.

public synchronized InputStream getInputStream()

Returns a InputStream object for reading the contents of the file.

public synchronized Long getSize()

Returns the size of the file.

Code in the following example uses SnowflakeFile to read a file from a specified stage location. Using an InputStream from the getInputStream method, it reads the file’s contents into a String variable.

create or replace function sum_total_sales(file string)
returns INTEGER
language java
handler='SalesSum.sumTotalSales'
target_path='@jar_stage/sales_functions2.jar'
as
$$
import java.io.InputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import com.snowflake.snowpark_java.types.SnowflakeFile;

public class SalesSum {

  public static int sumTotalSales(String filePath) throws IOException {
    int total = -1;

    // Use a SnowflakeFile instance to read sales data from a stage.
    SnowflakeFile file = SnowflakeFile.newInstance(filePath);
    InputStream stream = file.getInputStream();
    String contents = new String(stream.readAllBytes(), StandardCharsets.UTF_8);

    // Omitted for brevity: code to retrieve sales data from JSON and assign it to the total variable.

    return total;
  }
}
$$;

Call the UDF, specifying a file on a stage to process.

select sum_total_sales('@sales_data_stage/car_sales.json');

Reading a File Using the InputStream Class

You can read file contents directly into a java.io.InputStream by making your handler function’s argument an InputStream variable. This can be useful when the function’s caller will want to pass a file path as an argument.

Code in the following example has a handler function sumTotalSales that takes an InputStream and returns an int. At run time, Snowflake automatically assigns the contents of the file at the file variable’s path to the stream argument variable.

create or replace function sum_total_sales(file string)
returns INTEGER
language java
handler='SalesSum.sumTotalSales'
target_path='@jar_stage/sales_functions2.jar'
as
$$
import java.io.InputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SalesSum {

  public static int sumTotalSales(InputStream stream) throws IOException {
    int total = -1;
    String contents = new String(stream.readAllBytes(), StandardCharsets.UTF_8);

    // Omitted for brevity: code to retrieve sales data from JSON and assign it to the total variable.

    return total;
  }
}
$$;

Call the UDF, specifying a file on a stage to process.

select sum_total_sales('@sales_data_stage/car_sales.json');

Creating and Calling a Simple Pre-Compiled Java UDF

The following statements create a simple Java UDF. This sample generally follows the file and directory structure described in Organizing Your Files.

Create and Compile the Java Handler Code

  1. In the root directory of your project (here, my_udf), create a src subdirectory to hold the source .java files and a classes subdirectory to hold the generated .class files.

    You should have a directory hierarchy similar to the following:

    my_udf/
    |-- classes/
    |-- src/
    
  2. In the src directory, create a directory called mypackage to hold .java files whose classes are in the mypackage package.

  3. In the mypackage directory, create a MyUDFHandler.java file that contains your source code.

    package mypackage;
    
    public class MyUDFHandler {
    
      public static int decrementValue(int i)
      {
          return i - 1;
      }
    
      public static void main(String[] argv)
      {
          System.out.println("This main() function won't be called.");
      }
    }
    
  4. From your project root directory (here, my_udf), use the javac command to compile the source code.

    The javac command in the following example compiles MyUDFHandler.java to generate a MyUDFHandler.class file in the classes directory.

    javac -d classes src/mypackage/MyUDFHandler.java
    

    This example includes the following arguments:

    • -d classes – Directory into which generated class files should be written.

    • src/mypackage/MyUDFHandler.java – Path to the .java file in the form: source_directory/package_directory/Java_file_name.

Package the Compiled Code Into a JAR file

  1. Optionally, in the project root directory create a manifest file named my_udf.manifest that contains the following attributes:

    Manifest-Version: 1.0
    Main-Class: mypackage.MyUDFHandler
    
  2. From your project root directory, run the jar command to create a JAR file containing the .class file and manifest.

    The jar command in the following example puts the generated MyUDFHandler.class file in a mypackage package folder into a .jar file called my_udf.jar. The -C ./classes flag specifies the location of the .class files.

    jar cmf my_udf.manifest my_udf.jar -C ./classes mypackage/MyUDFHandler.class
    

    This example includes the following arguments:

    • cmf – Command arguments: c to create a JAR file, m to use the specified .manifest file, and f to give the JAR file the specified name.

    • my_udf.manifest – Manifest file.

    • my_udf.jar – Name of the JAR file to create.

    • -C ./classes – Directory containing the generated .class files.

    • mypackage/MyUDFHandler.class – Package and name of .class file to include in the JAR.

Upload the JAR File with the Compiled Handler to a Stage

  1. In Snowflake, create a stage called jar_stage to store the JAR file containing your UDF handler.

    For more information on creating a stage, see CREATE STAGE.

  2. Use the PUT command to copy the JAR file from the local file system to a stage.

put
    file:///Users/Me/my_udf/my_udf.jar
    @jar_stage
    auto_compress = false
    overwrite = true
    ;

You can store the PUT command in a script file and then execute that file through SnowSQL.

The snowsql command looks similar to the following:

snowsql -a <account_identifier> -w <warehouse> -d <database> -s <schema> -u <user> -f put_command.sql

This example assumes that the user’s password is specified in the SNOWSQL_PWD environment variable.

Create the UDF with the Compiled Code as Handler

Create the UDF:

create function decrement_value(i numeric(9, 0))
  returns numeric
  language java
  imports = ('@jar_stage/my_udf.jar')
  handler = 'mypackage.MyUDFHandler.decrementValue'
  ;

Call the UDF:

select decrement_value(-15);
+----------------------+
| DECREMENT_VALUE(-15) |
|----------------------|
|                  -16 |
+----------------------+

Considerations

  • If a query calls a UDF to access staged files, the operation fails with a user error if the SQL statement also queries a view that calls any UDF or UDTF, regardless if the function in the view accesses staged files or not.

  • UDTFs can process multiple files in parallel; however, UDFs currently process files serially. As a workaround, group rows in a subquery using the GROUP BY clause. See Java UDTF Examples for an example.

  • Currently, if the staged files referenced in a query are modified or deleted while the query is running, the function call fails with an error.

Back to top