Class Column

  • Direct Known Subclasses:
    CaseExpr

    public class Column
    extends Object
    Represents a column or an expression in a DataFrame.
    Since:
    0.9.0
    • Method Detail

      • subField

        public Column subField​(String field)
        Retrieves the specified element (field) in a column that contains semi-structured data.

        The method applies case-sensitive matching to the names of the specified elements.

        This is equivalent to using bracket notation in SQL (`column['element']`).

        If the column is an OBJECT value, this function extracts the VARIANT value of the element with the specified name from the OBJECT value. If the element is not found, the method returns NULL. You must not specify an empty string for the element name.

        If the column is a VARIANT value, this function first checks if the VARIANT value contains an OBJECT value. If the VARIANT value does not contain an OBJECT value, the method returns NULL. Otherwise, the method works as described above.

        For example:

        
         df.select(df.col("src").subField("salesperson").subField("emails").subField(0))
         
        Parameters:
        field - The field name of the subfield to be extracted. You cannot specify a path.
        Returns:
        The sub-field of this column
        Since:
        0.9.0
      • subField

        public Column subField​(int index)
        Retrieves the element (field) at the specified index in a column that contains semi-structured data.

        The method applies case-sensitive matching to the names of the specified elements.

        This is equivalent to using bracket notation in SQL (`column['element']`).

        If the column is an ARRAY value, this function extracts the VARIANT value of the ARRAY element at the specified index. If the index points outside of the array boundaries or if an element does not exists at the specified index (e.g. if the array is sparsely populated), the method returns NULL.

        If the column is a VARIANT value, this functions first checks if the VARIANT value contains an ARRAY value. If the VARIANT value does not contain an ARRAY value, the method returns NULL. Otherwise, the method works as described above.

        
         df.select(df.col("src").subField(1).subField(0))
         
        Parameters:
        index - The index of the subfield to be extracted
        Returns:
        The sub-field of this column
        Since:
        0.9.0
      • getName

        public Optional<String> getName()
        Retrieves the column name (if the column has a name).
        Returns:
        An Optional String
        Since:
        0.9.0
      • as

        public Column as​(String alias)
        Retrieves s new renamed Column.
        Parameters:
        alias - The new column name
        Returns:
        A new Column object
        Since:
        0.9.0
      • alias

        public Column alias​(String alias)
        Retrieves s new renamed Column.
        Parameters:
        alias - The new column name
        Returns:
        A new Column object
        Since:
        0.9.0
      • unary_minus

        public Column unary_minus()
        Unary minus.
        Returns:
        The result column object
        Since:
        0.9.0
      • unary_not

        public Column unary_not()
        Unary not.
        Returns:
        The result column object
        Since:
        0.9.0
      • equal_to

        public Column equal_to​(Column other)
        Equal to.
        Parameters:
        other - The column being compared
        Returns:
        The result column object
        Since:
        0.9.0
      • not_equal

        public Column not_equal​(Column other)
        Not equal to.
        Parameters:
        other - The column being compared
        Returns:
        The result column object
        Since:
        0.9.0
      • gt

        public Column gt​(Column other)
        Greater than.
        Parameters:
        other - The column being compared
        Returns:
        The result column object
        Since:
        0.9.0
      • lt

        public Column lt​(Column other)
        Less than.
        Parameters:
        other - The column being compared
        Returns:
        The result column object
        Since:
        0.9.0
      • leq

        public Column leq​(Column other)
        Less than or equal to.
        Parameters:
        other - The column being compared
        Returns:
        The result column object
        Since:
        0.9.0
      • geq

        public Column geq​(Column other)
        Greater than or equal to.
        Parameters:
        other - The column being compared
        Returns:
        The result column object
        Since:
        0.9.0
      • equal_null

        public Column equal_null​(Column other)
        Equal to. This function can be used to compare against a null value.
        Parameters:
        other - The column being compared
        Returns:
        The result column object
        Since:
        0.9.0
      • equal_nan

        public Column equal_nan()
        Is NaN.
        Returns:
        The result column object
        Since:
        0.9.0
      • is_null

        public Column is_null()
        Is null.
        Returns:
        The result column object
        Since:
        0.9.0
      • is_not_null

        public Column is_not_null()
        Is not null.
        Returns:
        The result column object
        Since:
        0.9.0
      • or

        public Column or​(Column other)
        Or.
        Parameters:
        other - The other column
        Returns:
        The result column object
        Since:
        0.9.0
      • and

        public Column and​(Column other)
        And.
        Parameters:
        other - The other column
        Returns:
        The result column object
        Since:
        0.9.0
      • between

        public Column between​(Column lowerBound,
                              Column upperBound)
        Between lower bound (including) and upper bound (including).
        Parameters:
        lowerBound - the lower bound
        upperBound - the upper bound
        Returns:
        The result column object
        Since:
        0.9.0
      • plus

        public Column plus​(Column other)
        Plus
        Parameters:
        other - The column being added
        Returns:
        The result column object
        Since:
        0.9.0
      • minus

        public Column minus​(Column other)
        Minus
        Parameters:
        other - The column being subtracted
        Returns:
        The result column object
        Since:
        0.9.0
      • multiply

        public Column multiply​(Column other)
        Multiply
        Parameters:
        other - The column being multiplied
        Returns:
        The result column object
        Since:
        0.9.0
      • divide

        public Column divide​(Column other)
        Divide
        Parameters:
        other - The column being divided
        Returns:
        The result column object
        Since:
        0.9.0
      • mod

        public Column mod​(Column other)
        Remainder
        Parameters:
        other - The column being calculated
        Returns:
        The result column object
        Since:
        0.9.0
      • cast

        public Column cast​(DataType to)
        Casts the values in the Column to the specified data type.
        Parameters:
        to - The target data type
        Returns:
        The result column object
        Since:
        0.9.0
      • desc

        public Column desc()
        Sorts this column in descending order.
        Returns:
        The result column object
        Since:
        0.9.0
      • desc_nulls_first

        public Column desc_nulls_first()
        Sorts this column in descending order, null values sorted before non-null values.
        Returns:
        The result column object
        Since:
        0.9.0
      • desc_nulls_last

        public Column desc_nulls_last()
        Sorts this column in descending order, null values sorted after non-null values.
        Returns:
        The result column object
        Since:
        0.9.0
      • asc

        public Column asc()
        Sorts this column in ascending order.
        Returns:
        The result column object
        Since:
        0.9.0
      • asc_nulls_first

        public Column asc_nulls_first()
        Sorts this column in ascending order, null values sorted before non-null values.
        Returns:
        The result column object
        Since:
        0.9.0
      • asc_nulls_last

        public Column asc_nulls_last()
        Sorts this column in ascending order, null values sorted after non-null values.
        Returns:
        The result column object
        Since:
        0.9.0
      • bitor

        public Column bitor​(Column other)
        Bitwise or.
        Parameters:
        other - The column being calculated
        Returns:
        The result column object
        Since:
        0.9.0
      • bitand

        public Column bitand​(Column other)
        Bitwise and.
        Parameters:
        other - The column being calculated
        Returns:
        The result column object
        Since:
        0.9.0
      • bitxor

        public Column bitxor​(Column other)
        Bitwise xor.
        Parameters:
        other - The column being calculated
        Returns:
        The result column object
        Since:
        0.9.0
      • like

        public Column like​(Column pattern)
        Allows case-sensitive matching of strings based on comparison with a pattern.

        For details, see the Snowflake documentation on regular expressions

        Parameters:
        pattern - A regular expression
        Returns:
        The result column object
        Since:
        0.9.0
      • regexp

        public Column regexp​(Column pattern)
        Returns true if this column matches the specified regular expression.

        For details, see the Snowflake documentation on regular expressions

        Parameters:
        pattern - A regular expression
        Returns:
        The result column object
        Since:
        0.9.0
      • collate

        public Column collate​(String collateSpec)
        Returns a copy of the original Column with the specified 'collateSpec` property, rather than the original collation specification property.

        For details, see the Snowflake documentation on collation specifications

        Parameters:
        collateSpec - The collation specification
        Returns:
        The result column object
        Since:
        0.9.0
      • over

        public Column over()
        Returns a windows frame, based on the specified WindowSpec.
        Returns:
        The result column object
        Since:
        0.9.0
        See Also:
        WindowSpec
      • over

        public Column over​(WindowSpec windowSpec)
        Returns a windows frame, based on the specified WindowSpec.
        Parameters:
        windowSpec - The window frame specification
        Returns:
        The result column object
        Since:
        0.1.0
      • withinGroup

        public Column withinGroup​(Column... cols)
        Returns a Column expression that adds a WITHIN GROUP clause to sort the rows by the specified sequence of columns.

        This method is supported on Column expressions returned by some aggregate functions, including functions.array_agg, LISTAGG(), PERCENTILE_CONT(), and PERCENTILE_DISC().

        For example:

        
         df.groupBy(df.col("col1")).agg(Functions.listagg(df.col("col2"), ",")
              .withinGroup(df.col("col2").asc()))
         
        Parameters:
        cols - A list of Columns
        Returns:
        The result Column
        Since:
        1.1.0
      • in

        public Column in​(Object... values)
        Returns a conditional expression that you can pass to the filter or where method to perform the equivalent of a WHERE ... IN query with a specified list of values.

        The expression evaluates to true if the value in the column is one of the values in a specified sequence.

        For example, the following code returns a DataFrame that contains the rows where the column "a" contains the value 1, 2, or 3. This is equivalent to SELECT * FROM table WHERE a IN (1, 2, 3).

        
         df.filter(df.col("a").in(1, 2, 3))
         
        Parameters:
        values - The value list
        Returns:
        The result Column
        Since:
        1.2.0
      • in

        public Column in​(DataFrame df)
        Returns a conditional expression that you can pass to the filter or where method to perform a WHERE ... IN query with a specified subquery.

        The expression evaluates to true if the value in the column is one of the values in the column of the same name in a specified DataFrame.

        For example, the following code returns a DataFrame that contains the rows where the column "a" of `df2` contains one of the values from column "a" in `df1`. This is equivalent to SELECT * FROM table2 WHERE a IN (SELECT a FROM table1).

        
         DataFrame df1 = session.table(table1);
         DataFrame df2 = session.table(table2);
         df2.filter(df1.col("a").in(df1));
         
        Parameters:
        df - A DataFrame contains target values
        Returns:
        The return column.
        Since:
        1.2.0
      • toString

        public String toString()
        Retrieves a string representation of the expression corresponding to this Column instance.
        Overrides:
        toString in class Object
        Returns:
        A String representing this expression
        Since:
        0.9.0