Common Operations in Java UDFs

This topic shows how Java UDFs can perform common operations, such as reading from a file (e.g. a configuration file).

In this Topic:

  • Reading a File from Inside a Java UDF
  • Sample Code for Reading a File
  • Choosing Whether to Access a File in Compressed or Uncompressed Format

Reading a File from Inside a Java UDF

A Java UDF can read files (e.g. configuration files) that have been stored in a stage. The file name and stage name must have been specified in the IMPORTS clause of the CREATE FUNCTION command.

When a file is specified in the IMPORTS clause of a UDF, Snowflake copies that file from the stage to the UDF’s home directory (also called the import directory), which is the directory from which the UDF actually reads the file.

Because imported files are copied to a single directory and must have unique names within that directory, each file in the IMPORTS clause must have a distinct name, even if the files start out in different stages or different subdirectories within a stage.
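
For example, an IMPORTS clause like the following is not allowed, because both files would be copied to the import directory under the same name, config.txt. (The stage names stage_a and stage_b are only illustrative.)

... IMPORTS = ('@stage_a/config.txt', '@stage_b/config.txt') ...

Renaming one of the files before uploading it (for example, to config_b.txt) gives the two files distinct base names and resolves the conflict.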

Sample Code for Reading a File

The following example creates and calls a Java UDF that reads a file.

The Java source code below defines a class named TestReadRelativeFile with a method named read_file(); the UDF uses this method as its handler.

package my_package;  // Must match the package referenced in the HANDLER clause of CREATE FUNCTION.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

class TestReadRelativeFile {
    public static String read_file(String fileName) throws IOException {
        StringBuilder contentBuilder = new StringBuilder();

        // Snowflake copies files listed in the IMPORTS clause to the import
        // directory and exposes that directory's path through this JVM property.
        String importDirectory = System.getProperty("com.snowflake.import_directory");
        String fPath = importDirectory + fileName;

        // Read the file line by line and return its contents as a single string.
        try (Stream<String> stream = Files.lines(Paths.get(fPath), StandardCharsets.UTF_8)) {
            stream.forEach(s -> contentBuilder.append(s).append("\n"));
        }
        return contentBuilder.toString();
    }
}

The following SQL code creates the UDF. This code assumes that the Java source code has been compiled and put into a JAR file named TestReadRelativeFile.jar, which the UDF imports. The second and third imported files, my_config_file_1.txt and my_config_file_2.txt, are configuration files that the UDF can read.

CREATE FUNCTION file_reader(file_name VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVA
IMPORTS = ('@my_stage/my_package/TestReadRelativeFile.jar',
           '@my_stage/my_path/my_config_file_1.txt',
           '@my_stage/my_path/my_config_file_2.txt')
HANDLER = 'my_package.TestReadRelativeFile.read_file';

This code calls the UDF:

SELECT file_reader('my_config_file_1.txt') ...;
...
SELECT file_reader('my_config_file_2.txt') ...;

Choosing Whether to Access a File in Compressed or Uncompressed Format

Files in a stage can be stored in compressed or uncompressed format. Users can compress the file before copying it to the stage, or can tell the PUT command to compress the file.
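
For example, either of the following PUT commands (executed from a client such as SnowSQL) places a GZIP-compressed copy of a configuration file in the stage. The local file path is only illustrative:

-- The file was already compressed manually, so upload it as-is.
PUT file:///tmp/my_config_file_1.txt.gz @my_stage/my_path AUTO_COMPRESS = FALSE;

-- Alternatively, let PUT compress the file during upload (AUTO_COMPRESS defaults to TRUE).
PUT file:///tmp/my_config_file_1.txt @my_stage/my_path AUTO_COMPRESS = TRUE;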

When Snowflake copies a file compressed in GZIP format from a stage to the UDF home directory, Snowflake can write the copy as-is, or Snowflake can decompress the content before writing the file.

If the file in the stage is compressed, and you would like the copy in the UDF home directory to also be compressed, then specify the original file name (e.g. “MyData.txt.gz”) in the IMPORTS clause. For example:

... imports = ('@MyStage/MyData.txt.gz', ...)

If the file in the stage is GZIP-compressed, but you would like the copy in the UDF home directory to be uncompressed, omit the “.gz” extension when you specify the file name in the IMPORTS clause. For example, if your stage contains “MyData.txt.gz” but you want your UDF to read the file in uncompressed format, specify “MyData.txt” in the IMPORTS clause. If the stage does not already contain an uncompressed file named “MyData.txt”, then Snowflake searches for “MyData.txt.gz” and automatically writes a decompressed copy to “MyData.txt” in the UDF home directory. Your UDF can then open and read the uncompressed file “MyData.txt”.
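
For example, to have Snowflake write an uncompressed copy of “MyData.txt.gz” to the UDF home directory:

... imports = ('@MyStage/MyData.txt', ...)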

This automatic decompression is called “smart decompression”. It applies only to the copy in the UDF home directory; the original file in the stage is not changed.

Follow these best practices for handling compressed files:

  • Follow proper file naming conventions. If a file is in GZIP-compressed format, then include the extension “.gz” at the end of the file name. If a file is not in GZIP-compressed format, then do not end the file name with the “.gz” extension.

  • Avoid creating files whose names differ only by the extension “.gz”. For example, do not create both “MyData.txt” and “MyData.txt.gz” in the same stage and directory, and do not try to import both “MyData.txt” and “MyData.txt.gz” in the same CREATE FUNCTION command.

  • Do not compress files twice. For example, if you compress a file manually, and then you PUT that file without using AUTO_COMPRESS=FALSE, the file will be compressed a second time. Smart decompression will decompress it only once, so the data (or JAR) file will still be compressed when it is stored in the UDF home directory (see the example after this list).

  • In the future, Snowflake might extend smart decompression to compression algorithms other than GZIP. To prevent compatibility issues, apply these best practices to files that use any type of compression.
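
For example, if “MyData.txt” was already compressed manually into “MyData.txt.gz”, upload it with auto-compression disabled so that the PUT command does not compress it a second time (the local file path is only illustrative):

PUT file:///tmp/MyData.txt.gz @MyStage AUTO_COMPRESS = FALSE;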

Note

JAR files can also be stored in compressed or uncompressed format in a stage. Snowflake automatically decompresses all compressed JAR files before making them available to the Java UDF.