Java UDF handler examples¶
This topic includes simple examples of UDF handler code written in Java.
For more on using Java to create a UDF handler, see Creating a Java UDF handler.
Creating and calling a simple in-line Java UDF¶
The following statements create and call an in-line Java UDF. This code returns the VARCHAR passed to it.
This function is declared with the optional CALLED ON NULL INPUT clause to indicate that the function is
called even if the value of the input is NULL. (This function would return NULL with or without this clause, but
you could modify the code to handle NULL another way, for example, to return an empty string.)
Create the UDF:
Call the UDF:
Passing a NULL to an in-line Java UDF¶
This uses the echo_varchar() UDF defined above. The SQL NULL value is implicitly converted to
Java null, and that Java null is returned and implicitly converted back to SQL NULL:
Call the UDF:
Passing array values¶
Java methods can receive SQL arrays in either of two ways:
Using Java’s array feature.
Using Java’s varargs (variable number of arguments) feature.
In both cases, your SQL code must pass an ARRAY.
Note
Be sure to use Java types with valid mappings to SQL types. For more information, refer to SQL-Java Data Type Mappings.
Passing via an ARRAY¶
Declare the Java parameter as an array. For example, the third parameter in the following method is a String array:
Below is a complete example:
Create and load the table:
Create the UDF:
Call the UDF:
Passing via varargs¶
Using varargs is very similar to using an array.
In your Java code, use Java’s varargs declaration style:
Below is a complete example. The only significant difference between this example and the preceding example (for arrays) is the declaration of the parameters to the method.
Create and load the table:
Create the UDF:
Call the UDF:
Returning NULL explicitly from an in-line UDF¶
The following code shows how to return a NULL value explicitly. The Java value null is converted to
SQL NULL.
Create the UDF:
Call the UDF:
Passing an OBJECT to an in-line Java UDF¶
The following example uses the SQL OBJECT data type and the corresponding Java
data type (Map<String, String>), and extracts a value from the OBJECT. This example also shows that you
can pass multiple parameters to a Java UDF.
Create and load a table that contains a column of type OBJECT:
Create the UDF:
Call the UDF:
Passing a GEOGRAPHY value to an in-line Java UDF¶
The following example uses the SQL GEOGRAPHY data type.
Create the UDF:
You can use the PACKAGES clause to specify a Snowflake system package such as the Snowpark package. When you do, you don’t need to also include the Snowpark JAR file as a value of an IMPORTS clause. For more on PACKAGES, see CREATE FUNCTION optional parameters.
Create data and call the UDF with that data:
The output looks similar to:
Passing a VARIANT value to an in-line Java UDF¶
When you pass a value of the SQL VARIANT type to a Java UDF, Snowflake can convert the value to the
Variant type
provided with the Snowpark package. Note that
Variant is supported from the Snowpark package version 1.4.0 and later.
The Snowpark Variant type provides methods for converting values between Variant and other types.
To use the Snowpark Variant type, use the PACKAGES clause to specify the Snowpark package when creating the UDF. When you do, you
don’t need to also include the Snowpark JAR file as a value of an IMPORTS clause. For more information on PACKAGES, see
CREATE FUNCTION optional parameters.
Code in the following example receives JSON data stored as the VARIANT type, then uses the Variant type in the Snowpark library
to retrieve the price value from the JSON. The received JSON has a structure similar to the JSON displayed in
Sample Data Used in Examples.
Reading a file with a Java UDF¶
You can read the contents of a file with handler code. For example, you might want to read a file to process unstructured data with the handler. For more information on processing unstructured data, along with example code, refer to Process unstructured data with UDF and procedure handlers.
The file must be on a Snowflake stage that’s available to your handler.
To read the contents of staged files, your handler can:
Read a file whose file path is statically-specified in the IMPORTS clause. At run time, your code reads the file from the UDF’s home directory.
This can be useful when you want to access the file during initialization.
Read a dynamically-specified file by calling methods of either the
SnowflakeFileclass or theInputStreamclass.You might do this if you need to access a file specified by the caller. For more information, see the following in this topic:
SnowflakeFileprovides features not available withInputStream, as described in the following table.Class
Input
Notes
SnowflakeFileURL formats:
Scoped URL to reduce the risk of file injection attacks when the function’s caller is not also its owner.
File URL or string path for files that the UDF owner has access to.
The file must be located in a named internal stage or an external stage.
Easily access additional file attributes, such as file size.
InputStreamURL formats:
Scoped URL to reduce the risk of file injection attacks when the function’s caller is not also its owner.
The file must be located in an internal or external stage.
Prerequisites¶
Before your Java handler code can read a file on a stage, you must do the following to make the file available to the code:
Create a stage that’s available to your handler.
You can use an external stage or internal stage. If you use an internal stage, it must be a user or named stage; Snowflake does not currently support using a table stage for UDF dependencies. For more on creating a stage, see CREATE STAGE. For more on choosing an internal stage type, see Choosing an internal stage for local files.
Keep in mind that adequate privileges on the stage must be assigned to roles performing SQL actions that read from the stage. For more information, see Granting privileges for user-defined functions.
To the stage, copy the file that will be read by code.
You can copy the file from a local drive to a stage by using the PUT command. For command reference, see PUT. For information on staging files with PUT, see Staging data files from a local file system.
Reading a file specified statically in IMPORTS¶
Your handler can read a file whose stage path has been specified in the IMPORTS clause of the CREATE FUNCTION command.
When you specify a file in the IMPORTS clause, Snowflake copies that file from the stage to the UDF’s home directory (also called the import directory), which is the directory from which the UDF actually reads the file.
Because imported files are copied to a single directory and must have unique names within that directory, each file in the IMPORTS clause must have a distinct name, even if the files start out in different stages or different subdirectories within a stage.
The following example creates and calls a Java UDF that reads a file.
The Java source code below creates a Java method named readFile. This UDF uses this method.
The following SQL code creates the UDF. This code assumes that the Java source code has been compiled and put into a JAR
file named TestReadRelativeFile.jar, which the UDF imports. The second and third imported files,
my_config_file_1.txt and my_config_file_2.txt, are configuration files that the UDF can read.
This code calls the UDF:
Choosing whether to access a file in compressed or uncompressed format¶
Files in a stage can be stored in compressed or uncompressed format. Users can compress the file before copying it to the stage, or can tell the PUT command to compress the file.
When Snowflake copies a file compressed in GZIP format from a stage to the UDF home directory, Snowflake can write the copy as-is, or Snowflake can decompress the content before writing the file.
If the file in the stage is compressed, and if you would like the copy in the UDF home directory to also be compressed, then when you specify the file name in the IMPORTS clause, simply use the original file name (e.g. “MyData.txt.gz”) in the IMPORTS clause. For example:
If the file in the stage is GZIP-compressed, but you would like the copy in the UDF home directory to be uncompressed, then when you specify the file name in the IMPORTS clause, omit the “.gz” extension. For example, if your stage contains “MyData.txt.gz”, but you want your UDF to read the file in uncompressed format, then specify “MyData.txt” in the IMPORTS clause. If there is not already an uncompressed file named “MyData.txt”, then Snowflake searches for “MyData.txt.gz” and automatically writes a decompressed copy to “MyData.txt” in the UDF home directory. Your UDF can then open and read the uncompressed file “MyData.txt”.
Note that smart decompression applies only to the copy in the UDF home directory; the original file in the stage is not changed.
Follow these best practices for handling compressed files:
Follow proper file naming conventions. If a file is in GZIP-compressed format, then include the extension “.gz” at the end of the file name. If a file is not in GZIP-compressed format, then do not end the file name with the “.gz” extension.
Avoid creating files whose names differ only by the extension “.gz”. For example, do not create both “MyData.txt” and “MyData.txt.gz” in the same stage and directory, and do not try to import both “MyData.txt” and “MyData.txt.gz” in the same CREATE FUNCTION command.
Do not compress files twice. For example, if you compress a file manually, and then you PUT that file without using AUTO_COMPRESS=FALSE, the file will be compressed a second time. Smart decompression will decompress it only once, so the data (or JAR) file will still be compressed when it is stored in the UDF home directory.
In the future, Snowflake might extend smart decompression to compression algorithms other than GZIP. To prevent compatibility issues in the future, apply these best practices to files that use any type of compression.
Note
JAR files can also be stored in compressed or uncompressed format in a stage. Snowflake automatically decompresses all compressed JAR files before making them available to the Java UDF.
Reading a dynamically-specified file with SnowflakeFile¶
Using methods of the SnowflakeFile class, you can read files from a stage with your Java handler code. The SnowflakeFile
class is included on the classpath available to Java UDF handlers on Snowflake.
Note
To make your code resilient to file injection attacks, always use a scoped URL when passing a file’s location to a UDF, particularly when the function’s caller is not also its owner. You can create a scoped URL in SQL using the built-in function BUILD_SCOPED_FILE_URL. For more information about what the BUILD_SCOPED_FILE_URL does, see Introduction to unstructured data.
To develop your UDF code locally, add the Snowpark JAR containing SnowflakeFile to your code’s class path. For information about
snowpark.jar, see Setting Up Your Development Environment for Snowpark Java. Note that Snowpark client applications cannot use this class.
When you use SnowflakeFile, it isn’t necessary to also specify either the staged file or the JAR containing
SnowflakeFile with an IMPORTS clause when you create the UDF, as in SQL with a CREATE FUNCTION statement.
Code in the following example uses SnowflakeFile to read a file from a specified stage location. Using an
InputStream from the getInputStream method, it reads the file’s contents into a String variable.
Call the UDF, passing the location of the file in a scoped URL to reduce the likelihood of file injection attacks.
Note
The UDF owner must have access to any files whose locations are not scoped URLs. You can read these staged files by having the handler
code call the SnowflakeFile.newInstance method with a boolean value for a new requireScopedUrl parameter.
The following example uses SnowflakeFile.newInstance while specifying that a scoped URL is not required.
Reading a dynamically-specified file with InputStream¶
You can read file contents directly into a java.io.InputStream by making your handler function’s argument an InputStream
variable. This can be useful when the function’s caller will want to pass a file path as an argument.
Note
To make your code resilient to file injection attacks, always use a scoped URL when passing a file’s location to a UDF, particularly when the function’s caller is not also its owner. You can create a scoped URL in SQL using the built-in function BUILD_SCOPED_FILE_URL. For more information about what the BUILD_SCOPED_FILE_URL does, see Introduction to unstructured data.
Code in the following example has a handler function sumTotalSales that takes an InputStream and returns an int.
At run time, Snowflake automatically assigns the contents of the file at the file variable’s path to the stream
argument variable.
Call the UDF, passing the location of the file in a scoped URL to reduce the likelihood of file injection attacks.
Creating and calling a simple staged Java UDF¶
The following statements create a simple Java UDF. This sample generally follows the file and directory structure described in Organizing your files.
Create and compile the Java handler code¶
In the root directory of your project (here,
my_udf), create asrcsubdirectory to hold the source .java files and aclassessubdirectory to hold the generated .class files.You should have a directory hierarchy similar to the following:
In the
srcdirectory, create a directory calledmypackageto hold .java files whose classes are in themypackagepackage.In the
mypackagedirectory, create aMyUDFHandler.javafile that contains your source code.From your project root directory (here,
my_udf), use thejavaccommand to compile the source code.The
javaccommand in the following example compilesMyUDFHandler.javato generate aMyUDFHandler.classfile in theclassesdirectory.This example includes the following arguments:
-d classes– Directory into which generated class files should be written.src/mypackage/MyUDFHandler.java– Path to the .java file in the form:source_directory/package_directory/Java_file_name.
Package the compiled code into a JAR file¶
Optionally, in the project root directory create a manifest file named
my_udf.manifestthat contains the following attributes:From your project root directory, run the
jarcommand to create a JAR file containing the .class file and manifest.The
jarcommand in the following example puts the generatedMyUDFHandler.classfile in amypackagepackage folder into a .jar file calledmy_udf.jar. The-C ./classesflag specifies the location of the .class files.This example includes the following arguments:
cmf– Command arguments:cto create a JAR file,mto use the specified .manifest file, andfto give the JAR file the specified name.my_udf.manifest– Manifest file.my_udf.jar– Name of the JAR file to create.-C ./classes– Directory containing the generated .class files.mypackage/MyUDFHandler.class– Package and name of .class file to include in the JAR.
Upload the JAR file with the compiled handler to a stage¶
In Snowflake, create a stage called
jar_stageto store the JAR file containing your UDF handler.For more information on creating a stage, see CREATE STAGE.
Use the
PUTcommand to copy the JAR file from the local file system to a stage.You can store the
PUTcommand in a script file and then execute that file through SnowSQL.The
snowsqlcommand looks similar to the following:This example assumes that the user’s password is specified in the SNOWSQL_PWD environment variable.
Create the UDF with the compiled code as handler¶
Create the UDF:
Call the UDF: