Class UDTFRegistration


  • public class UDTFRegistration
    extends Object
    Provides methods to register a UDTF (user-defined table function) in the Snowflake database. Session.udtf() returns an object of this class.

    To register a UDTF, you must:

    1. Define a UDTF class.
    2. Create an instance of that class, and register that instance as a UDTF.

    The next sections describe these steps in more detail.

    Defining the UDTF Class

    Define a class that implements one of the JavaUDTF[N] interfaces (e.g., JavaUDTF0, JavaUDTF1, etc.), where N is the number of input arguments for your UDTF. For example, if your UDTF passes in 3 input arguments, implement the JavaUDTF3 interface.

    In your class, implement the following four methods:

    • process(), which is called once for each row in the input partition.
    • endPartition(), which is called once for each partition after all rows have been passed to process().
    • outputSchema(), which returns a StructType object that describes the schema for the returned rows.
    • inputSchema(), which returns a StructType object that describes the types of the input parameters. If your process() method passes in Map arguments, you must implement the inputSchema() method. If the method does not pass in Map arguments, implementing inputSchema() is optional.

    When a UDTF is called, the rows are grouped into partitions before they are passed to the UDTF:

    • If the statement that calls the UDTF specifies the PARTITION clause (explicit partitions), that clause determines how the rows are partitioned.
    • If the statement does not specify the PARTITION clause (implicit partitions), Snowflake determines how best to partition the rows.

    For an explanation of partitions, see Table Functions and Partitions.
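
    For instance, a query with an explicit PARTITION clause can be issued through Session.sql(). The following is only a sketch: the table MY_TABLE and its columns are hypothetical, and it assumes a UDTF has already been registered under the name myUdtf (registration is shown later on this page).

     import com.snowflake.snowpark_java.Session;

     public class PartitionedCallSketch {
       public static void show(Session session) {
         // Explicit partitions: rows that share the same CATEGORY value are routed to
         // the same UDTF partition, so each CATEGORY gets its own endPartition() call.
         session.sql(
             "SELECT * FROM MY_TABLE, "
             + "TABLE(myUdtf(START_COL, COUNT_COL) OVER (PARTITION BY CATEGORY))")
           .show();
       }
     }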

    Defining the process() Method

    This method is invoked once for each row in the input partition.

    The arguments passed to the registered UDTF are passed to process(). For each argument passed to the UDTF, you must have a corresponding argument in the signature of the process() method. Make sure that the type of the argument in the process() method matches the Snowflake data type of the corresponding argument in the UDTF.

    Snowflake supports the following data types for the parameters of a UDTF:

    Supported Snowflake Data Types and Corresponding Java Types

    SQL Type     Java Type
    NUMBER       java.lang.Short
    NUMBER       java.lang.Integer
    NUMBER       java.lang.Long
    FLOAT        java.lang.Float
    DOUBLE       java.lang.Double
    NUMBER       java.math.BigDecimal
    VARCHAR      String
    BOOLEAN      java.lang.Boolean
    DATE         java.sql.Date
    TIMESTAMP    java.sql.Timestamp
    BINARY       byte[]
    ARRAY        String[] or Variant[]
    OBJECT       Map<String, String> or Map<String, Variant>
    VARIANT      Variant
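
    For example, for a UDTF declared with a VARCHAR argument and a NUMBER argument, the class would implement JavaUDTF2<String, Integer>, and process() would take a String and a java.lang.Integer. The fragment below is a sketch only (the parameter names are illustrative) and belongs inside such a class:

     // Inside a class that implements JavaUDTF2<String, Integer>:
     // the VARCHAR argument maps to String, and the NUMBER argument maps to java.lang.Integer.
     public Stream<Row> process(String label, Integer n) {
       return Stream.of(Row.create(label, n));
     }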

    Defining the endPartition() Method

    This method is invoked once for each partition, after all rows in that partition have been passed to the process() method.

    You can use this method to generate output rows, based on any state information that you aggregate in the process() method.
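
    As a sketch of that pattern (the class name and counting logic here are illustrative, not part of the Snowpark API), a UDTF can accumulate per-partition state in an instance field during process() and emit a single summary row from endPartition():

     import java.util.stream.Stream;
     import com.snowflake.snowpark_java.Row;
     import com.snowflake.snowpark_java.types.*;
     import com.snowflake.snowpark_java.udtf.*;

     // Counts the rows in each partition and emits the total once per partition.
     class RowCountUdtf implements JavaUDTF1<String> {
       private int rowCount = 0;

       public StructType outputSchema() {
         return StructType.create(new StructField("ROW_COUNT", DataTypes.IntegerType));
       }

       // Called once per input row: update the per-partition state, emit no rows yet.
       public Stream<Row> process(String value) {
         rowCount++;
         return Stream.empty();
       }

       // Called once per partition, after all of its rows: emit the aggregated result.
       public Stream<Row> endPartition() {
         return Stream.of(Row.create(rowCount));
       }
     }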

    Defining the outputSchema() Method

    In this method, define the output schema for the rows returned by the process() and endPartition() methods.

    Construct and return a StructType object that uses an Array of StructField objects to specify the Snowflake data type of each field in a returned row.

    Snowflake supports the following DataTypes for the output columns:

    Supported Snowpark DataTypes and Corresponding Java Types

    Snowpark DataType                   Java Type
    ShortType                           java.lang.Short
    IntegerType                         java.lang.Integer
    LongType                            java.lang.Long
    FloatType                           java.lang.Float
    DoubleType                          java.lang.Double
    DecimalType                         java.math.BigDecimal
    StringType                          String
    BooleanType                         java.lang.Boolean
    DateType                            java.sql.Date
    TimestampType                       java.sql.Timestamp
    BinaryType                          byte[]
    ArrayType(StringType)               String[]
    ArrayType(VariantType)              Variant[]
    MapType(StringType, StringType)     Map<String, String>
    MapType(StringType, VariantType)    Map<String, Variant>
    VariantType                         Variant
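
    For example, within your UDTF class, a two-column output schema might be defined as follows (the column names and types here are illustrative):

     // Returned rows have a VARCHAR column named "SYMBOL" and a DOUBLE column named "TOTAL".
     public StructType outputSchema() {
       return StructType.create(
           new StructField("SYMBOL", DataTypes.StringType),
           new StructField("TOTAL", DataTypes.DoubleType));
     }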

    Defining the inputSchema() Method

    In this method, define the input schema for the arguments passed to process(). Snowpark can infer the input schema when Map is not used as a process() parameter.

    Construct and return a StructType object that uses an Array of StructField objects to specify the data type of each parameter for process().

    Snowflake supports the following DataTypes for input types:

    Supported Java Types and Corresponding Snowpark DataTypes

    Java Type                                     Snowpark DataType
    java.lang.Boolean                             BooleanType
    java.lang.Short                               ShortType
    java.lang.Integer                             IntegerType
    java.lang.Long                                LongType
    java.lang.Float                               FloatType
    java.lang.Double                              DoubleType
    java.math.BigDecimal                          DecimalType
    String                                        StringType
    java.sql.Date                                 DateType
    java.sql.Timestamp                            TimestampType
    byte[]                                        BinaryType
    String[]                                      ArrayType(StringType)
    Variant[]                                     ArrayType(VariantType)
    Map<String, String>                           MapType(StringType, StringType)
    Map<String, Variant>                          MapType(StringType, VariantType)
    com.snowflake.snowpark_java.types.Variant     VariantType
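
    For example, a UDTF whose process() method takes a Map argument must implement inputSchema(). The class below is a sketch only: the names are illustrative, and it assumes that DataTypes.createMapType(keyType, valueType) is used to construct the MapType.

     import java.util.Map;
     import java.util.stream.Stream;
     import com.snowflake.snowpark_java.Row;
     import com.snowflake.snowpark_java.types.*;
     import com.snowflake.snowpark_java.udtf.*;

     // Sketch: because process() takes a Map argument, Snowpark cannot infer the
     // input schema, so inputSchema() is required.
     class KeyCountUdtf implements JavaUDTF1<Map<String, String>> {
       public StructType inputSchema() {
         // Assumes DataTypes.createMapType(keyType, valueType) builds the MapType.
         return StructType.create(
             new StructField("config",
                 DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType)));
       }

       public StructType outputSchema() {
         return StructType.create(new StructField("KEY_COUNT", DataTypes.IntegerType));
       }

       public Stream<Row> endPartition() {
         return Stream.empty();
       }

       // Emits one row per input row containing the number of keys in the OBJECT value.
       public Stream<Row> process(Map<String, String> config) {
         return Stream.of(Row.create(config.size()));
       }
     }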

    Example of a UDTF Class

    The following is an example of a UDTF class that generates a range of rows.

    The UDTF takes two input arguments, so the class implements the JavaUDTF2 interface.

    The arguments start and count specify the starting number for the row and the number of rows to generate.

    
     import java.util.stream.Stream;
     import com.snowflake.snowpark_java.Row;
     import com.snowflake.snowpark_java.types.*;
     import com.snowflake.snowpark_java.udtf.*;
    
     class MyRangeUdtf implements JavaUDTF2<Integer, Integer> {
       public StructType outputSchema() {
         return StructType.create(new StructField("C1", DataTypes.IntegerType));
       }
    
       // Because the process() method in this example does not pass in Map arguments,
       // implementing the inputSchema() method is optional.
       public StructType inputSchema() {
         return StructType.create(
                 new StructField("start_value", DataTypes.IntegerType),
                 new StructField("value_count", DataTypes.IntegerType));
       }
    
       // Called once per partition, after all rows have been processed; nothing more to emit.
       public Stream<Row> endPartition() {
         return Stream.empty();
       }
    
       // Called once per input row: emits `count` rows starting at `start`.
       public Stream<Row> process(Integer start, Integer count) {
         Stream.Builder<Row> builder = Stream.builder();
         for (int i = start; i < start + count; i++) {
           builder.add(Row.create(i));
         }
         return builder.build();
       }
     }
     

    Registering and Calling the UDTF

    Next, create an instance of the new class, and register that instance by calling one of the UDTFRegistration methods. You can register a temporary or permanent UDTF by name. If you don't need to call the UDTF by name, you can register an anonymous UDTF.

    Registering a Temporary UDTF By Name

    To register a temporary UDTF by name, call registerTemporary, passing in a name for the UDTF and an instance of the UDTF class. For example:
    
     // Use the MyRangeUdtf defined in previous example.
     TableFunction tableFunction = session.udtf().registerTemporary("myUdtf", new MyRangeUdtf());
     session.tableFunction(tableFunction, Functions.lit(10), Functions.lit(5)).show();
     

    Registering a Permanent UDTF By Name

    If you need to use the UDTF in subsequent sessions, register a permanent UDTF by calling registerPermanent().

    When registering a permanent UDTF, you must specify a stage where the registration method will upload the JAR files for the UDTF and its dependencies. For example:

    
     // Use the MyRangeUdtf defined in previous example.
     TableFunction tableFunction = session.udtf().registerPermanent("myUdtf", new MyRangeUdtf(), "@myStage");
     session.tableFunction(tableFunction, Functions.lit(10), Functions.lit(5)).show();
     

    Registering an Anonymous Temporary UDTF

    If you do not need to refer to a UDTF by name, use registerTemporary(JavaUDTF udtf) to create an anonymous UDTF instead. For example:
    
     // Use the MyRangeUdtf defined in previous example.
     TableFunction tableFunction = session.udtf().registerTemporary(new MyRangeUdtf());
     session.tableFunction(tableFunction, Functions.lit(10), Functions.lit(5)).show();
     
    Since:
    1.4.0
    • Method Detail

      • registerTemporary

        public TableFunction registerTemporary​(JavaUDTF udtf)
        Registers a UDTF instance as a temporary anonymous UDTF that is scoped to this session.
        Parameters:
        udtf - The UDTF instance to be registered.
        Returns:
        A TableFunction that represents the corresponding FUNCTION created in Snowflake
        Since:
        1.4.0
      • registerTemporary

        public TableFunction registerTemporary​(String funcName,
                                               JavaUDTF udtf)
        Registers a UDTF instance as a temporary Snowflake UDTF that you plan to use in the session.
        Parameters:
        funcName - The name that you want to use to refer to the UDTF.
        udtf - The UDTF instance to be registered.
        Returns:
        A TableFunction that represents the corresponding FUNCTION created in Snowflake
        Since:
        1.4.0
      • registerPermanent

        public TableFunction registerPermanent​(String funcName,
                                               JavaUDTF udtf,
                                               String stageLocation)
        Registers a UDTF instance as a permanent Snowflake UDTF.

        The function uploads the JAR files that the UDTF depends upon to the specified stage. Each JAR file is uploaded to a subdirectory named after the MD5 checksum for the file.

        If you register multiple UDTFs and specify the same stage location, any dependent JAR files used by those functions will only be uploaded once. The JAR file for the UDTF code itself will be uploaded to a subdirectory named after the UDTF.

        Parameters:
        funcName - The name that you want to use to refer to the UDTF.
        udtf - The UDTF instance to be registered.
        stageLocation - Stage location where the JAR files for the UDTF and its dependencies should be uploaded.
        Returns:
        A TableFunction that represents the corresponding FUNCTION created in Snowflake
        Since:
        1.4.0