Class UDTFRegistration
- java.lang.Object
-
- com.snowflake.snowpark_java.UDTFRegistration
-
public class UDTFRegistration extends Object
Provides methods to register a UDTF (user-defined table function) in the Snowflake database.Session.udtf()
returns an object of this class.To register an UDTF, you must:
- Define a UDTF class.
- Create an instance of that class, and register that instance as a UDTF.
The next sections describe these steps in more detail.
Defining the UDTF Class
Define a class that implements one of the JavaUDTF[N] interfaces (e.g.JavaUDTF0
,JavaUDTF1
, etc.), where "n" specifies the number of input arguments for your UDTF. For example, if your UDTF passes in 3 input arguments, implements theJavaUDTF3
interface.In your class, implement the following four methods:
- process() , which is called once for each row in the input partition.
- endPartition(), which is called once for each partition after all rows have been passed to process().
- outputSchema(), which returns a
StructType
object that describes the schema for the returned rows. - inputSchema(), which returns a
StructType
object that describes the types of the input parameters. If yourprocess()
method passes inMap
arguments, you must implement theinputSchema()
method. If the method does not pass inMap
arguments, implementinginputSchema()
is optional.
When a UDTF is called, the rows are grouped into partitions before they are passed to the UDTF:
- If the statement that calls the UDTF specifies the PARTITION clause (explicit partitions), that clause determines how the rows are partitioned.
- If the statement does not specify the PARTITION clause (implicit partitions), Snowflake determines how best to partition the rows.
For an explanation of partitions, see Table Functions and Partitions
Defining the process() Method
This method is invoked once for each row in the input partition.The arguments passed to the registered UDTF are passed to process(). For each argument passed to the UDTF, you must have a corresponding argument in the signature of the process() method. Make sure that the type of the argument in the process() method matches the Snowflake data type of the corresponding argument in the UDTF.
Snowflake supports the following data types for the parameters for a UDTF:
Supported Snowflake Data Types and Corresponding Java Types SQL Type Java Type Notes NUMBER java.lang.Short Supported NUMBER java.lang.Integer Supported NUMBER java.lang.Long Supported FLOAT java.lang.Float Supported DOUBLE java.lang.Double Supported NUMBER java.math.BigDecimal Supported VARCHAR String Supported BOOL java.lang.Boolean Supported DATE java.sql.Date Supported TIMESTAMP java.sql.Timestamp Supported BINARY byte[] Supported ARRAY String[] or Variant[] Supported String[] or Variant[] OBJECT Map<String, String> or Map<String, Variant> Supported Map<String, String> or Map<String, Variant> VARIANT Variant
Supported Defining the endPartition() Method
This method is invoked once for each partition, after all rows in that partition have been passed to the process() method.You can use this method to generate output rows, based on any state information that you aggregate in the process() method.
Defining the outputSchema() Method
In this method, define the output schema for the rows returned by the process() and endPartition() methods.Construct and return a
StructType
object that uses an Array ofStructField
objects to specify the Snowflake data type of each field in a returned row.Snowflake supports the following
DataTypes
for the input types:Supported Snowflake Data Types and Corresponding Java Types Snowpark DataTypes
Java Type Notes ShortType java.lang.Short Supported IntegerType java.lang.Integer Supported LongType java.lang.Long Supported FloatType java.lang.Float Supported DoubleType java.lang.Double Supported DecimalType java.math.BigDecimal Supported StringType String Supported BooleanType java.lang.Boolean Supported DateType java.sql.Date Supported TimestampType java.sql.Timestamp Supported BinaryType byte[] Supported ArrayType(StringType) or ArrayType(Variant) String[] or Variant[] Supported String[] or Variant[] MapType(StringTyoe, StringType) or MapType(StringTyoe, VariantType) Map<String, String> or Map<String, Variant> Supported Map<String, String> or Map<String, Variant> VariantType Variant
Supported Defining the inputSchema() Method
In this method, define the input schema for the input arguments for process(). Snowpark can infer the input schema if Map is not used as process().Construct and return a
StructType
object that uses an Array ofStructField
objects to specify the data type of each parameter for process().Snowflake supports the following
DataTypes
for input typesSupported Java Types and Corresponding Snowflake Data Types Java Type DataTypes
Notes java.lang.Boolean BooleanType Supported java.lang.Short ShortType Supported java.lang.Integer IntegerType Supported java.lang.Long LongType Supported java.lang.Float FloatType Supported java.lang.Double DoubleType Supported java.math.BigDecimal DecimalType Supported String StringType Supported java.sql.Date DateType Supported java.sql.Timestamp TimestampType Supported byte[] BinaryType Supported String[] ArrayType(StringType) Supported Variant[] ArrayType(Variant) Supported Map<String, String> MapType(StringType, StringType) Supported Map<String, Variant> MapType(StringType, VariantType) Supported com.snowflake.snowpark_java.types.Variant VariantType Supported Example of a UDTF Class
The following is an example of a UDTF class that generates a range of rows.The UDTF passes in 2 arguments, so the class implements JavaUDTF2.
The arguments
start
andcount
specify the starting number for the row and the number of rows to generate.import java.util.stream.Stream; import com.snowflake.snowpark_java.types.*; import com.snowflake.snowpark_java.udtf.*; class MyRangeUdtf implements JavaUDTF2<Integer, Integer> { public StructType outputSchema() { return StructType.create(new StructField("C1", DataTypes.IntegerType)); } // Because the process() method in this example does not pass in Map arguments, // implementing the inputSchema() method is optional. public StructType inputSchema() { return StructType.create( new StructField("start_value", DataTypes.IntegerType), new StructField("value_count", DataTypes.IntegerType)); } public Stream<Row> endPartition() { return Stream.empty(); } public Stream<Row> process(Integer start, Integer count) { Stream.Builder<Row> builder = Stream.builder(); for (int i = start; i < start + count ; i++) { builder.add(Row.create(i)); } return builder.build(); } }
Registering and Calling the UDTF
Next, create an instance of the new class, and register the class by calling one of the UDTFRegistration methods. You can register a temporary or permanent UDTF by name. If you don't need to call the UDTF by name, you can register an anonymous UDTF.Registering a Temporary UDTF By Name
To register a temporary UDTF by name, call registerTemporary, passing in a name for the UDTF and an instance of the UDTF class. For example:// Use the MyRangeUdtf defined in previous example. TableFunction tableFunction = session.udtf().registerTemporary("myUdtf", new MyRangeUdtf()); session.tableFunction(tableFunction, Functions.lit(10), Functions.lit(5)).show();
Registering a Permanent UDTF By Name
If you need to use the UDTF in subsequent sessions, register a permanent UDTF by callingregisterPermanent()
.When registering a permanent UDTF, you must specify a stage where the registration method will upload the JAR files for the UDTF and its dependencies. For example:
// Use the MyRangeUdtf defined in previous example. TableFunction tableFunction = session.udtf().registerPermanent("myUdtf", new MyRangeUdtf(), "@myStage"); session.tableFunction(tableFunction, Functions.lit(10), Functions.lit(5)).show();
Registering an Anonymous Temporary UDTF
If you do not need to refer to a UDTF by name, useregisterTemporary(JavaUDTF udtf)
to create an anonymous UDTF instead. For example:// Use the MyRangeUdtf defined in previous example. TableFunction tableFunction = session.udtf().registerTemporary(new MyRangeUdtf()); session.tableFunction(tableFunction, Functions.lit(10), Functions.lit(5)).show();
- Since:
- 1.4.0
-
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description TableFunction
registerPermanent(String funcName, JavaUDTF udtf, String stageLocation)
Registers an UDTF instance as a Snowflake UDTF.TableFunction
registerTemporary(JavaUDTF udtf)
Registers an UDTF instance as a temporary anonymous UDTF that is scoped to this session.TableFunction
registerTemporary(String funcName, JavaUDTF udtf)
Registers an UDTF instance as a temporary Snowflake UDTF that you plan to use in the session.
-
-
-
Method Detail
-
registerTemporary
public TableFunction registerTemporary(JavaUDTF udtf)
Registers an UDTF instance as a temporary anonymous UDTF that is scoped to this session.- Parameters:
udtf
- The UDTF instance to be registered- Returns:
- A TableFunction that represents the corresponding FUNCTION created in Snowflake
- Since:
- 1.4.0
-
registerTemporary
public TableFunction registerTemporary(String funcName, JavaUDTF udtf)
Registers an UDTF instance as a temporary Snowflake UDTF that you plan to use in the session.- Parameters:
funcName
- The name that you want to use to refer to the UDTF.udtf
- The UDTF instance to be registered- Returns:
- A TableFunction that represents the corresponding FUNCTION created in Snowflake
- Since:
- 1.4.0
-
registerPermanent
public TableFunction registerPermanent(String funcName, JavaUDTF udtf, String stageLocation)
Registers an UDTF instance as a Snowflake UDTF.The function uploads the JAR files that the UDTF depends upon to the specified stage. Each JAR file is uploaded to a subdirectory named after the MD5 checksum for the file.
If you register multiple UDTFs and specify the same stage location, any dependent JAR files used by those functions will only be uploaded once. The JAR file for the UDTF code itself will be uploaded to a subdirectory named after the UDTF.
- Parameters:
funcName
- The name that you want to use to refer to the UDTF.udtf
- The UDTF instance to be registered.stageLocation
- Stage location where the JAR files for the UDTF and its dependencies should be uploaded- Returns:
- A TableFunction that represents the corresponding FUNCTION created in Snowflake
- Since:
- 1.4.0
-
-