You are viewing documentation about an older version (1.12.1). View latest version

snowflake.snowpark.Column

class snowflake.snowpark.Column(expr1: Union[str, Expression], expr2: Optional[str] = None)[source]

Bases: object

Represents a column or an expression in a DataFrame.

To access a Column object that refers a column in a DataFrame, you can:

  • Use the column name.

  • Use the functions.col() function.

  • Use the DataFrame.col() method.

  • Use the index operator [] on a dataframe object with a column name.

  • Use the dot operator . on a dataframe object with a column name.

>>> from snowflake.snowpark.functions import col
>>> df = session.create_dataframe([["John", 1], ["Mike", 11]], schema=["name", "age"])
>>> df.select("name").collect()
[Row(NAME='John'), Row(NAME='Mike')]
>>> df.select(col("name")).collect()
[Row(NAME='John'), Row(NAME='Mike')]
>>> df.select(df.col("name")).collect()
[Row(NAME='John'), Row(NAME='Mike')]
>>> df.select(df["name"]).collect()
[Row(NAME='John'), Row(NAME='Mike')]
>>> df.select(df.name).collect()
[Row(NAME='John'), Row(NAME='Mike')]
Copy

Snowflake object identifiers, including column names, may or may not be case sensitive depending on a set of rules. Refer to Snowflake Object Identifier Requirements for details. When you use column names with a DataFrame, you should follow these rules.

The returned column names after a DataFrame is evaluated follow the case-sensitivity rules too. The above df was created with column name “name” while the returned column name after collect() was called became “NAME”. It’s because the column is regarded as ignore-case so the Snowflake database returns the upper case.

To create a Column object that represents a constant value, use snowflake.snowpark.functions.lit():

>>> from snowflake.snowpark.functions import lit
>>> df.select(col("name"), lit("const value").alias("literal_column")).collect()
[Row(NAME='John', LITERAL_COLUMN='const value'), Row(NAME='Mike', LITERAL_COLUMN='const value')]
Copy

This class also defines utility functions for constructing expressions with Columns. Column objects can be built with the operators, summarized by operator precedence, in the following table:

Operator

Description

x[index]

Index operator to get an item out of a Snowflake ARRAY or OBJECT

**

Power

-x, ~x

Unary minus, unary not

*, /, %

Multiply, divide, remainder

+, -

Plus, minus

&

And

|

Or

==, !=, <, <=, >, >=

Equal to, not equal to, less than, less than or equal to, greater than, greater than or equal to

The following examples demonstrate how to use Column objects in expressions:

>>> df = session.create_dataframe([[20, 5], [1, 2]], schema=["a", "b"])
>>> df.filter((col("a") == 20) | (col("b") <= 10)).collect()  # use parentheses before and after the | operator.
[Row(A=20, B=5), Row(A=1, B=2)]
>>> df.filter((df["a"] + df.b) < 10).collect()
[Row(A=1, B=2)]
>>> df.select((col("b") * 10).alias("c")).collect()
[Row(C=50), Row(C=20)]
Copy

When you use |, &, and ~ as logical operators on columns, you must always enclose column expressions with parentheses as illustrated in the above example, because their order precedence is higher than ==, <, etc.

Do not use and, or, and not logical operators on column objects, for instance, (df.col1 > 1) and (df.col2 > 2) is wrong. The reason is Python doesn’t have a magic method, or dunder method for them. It will raise an error and tell you to use |, & or ~, for which Python has magic methods. A side effect is if column: will raise an error because it has a hidden call to bool(a_column), like using the and operator. Use if a_column is None: instead.

To access elements of a semi-structured Object and Array, use [] on a Column object:

>>> from snowflake.snowpark.types import StringType, IntegerType
>>> df_with_semi_data = session.create_dataframe([[{"k1": "v1", "k2": "v2"}, ["a0", 1, "a2"]]], schema=["object_column", "array_column"])
>>> df_with_semi_data.select(df_with_semi_data["object_column"]["k1"].alias("k1_value"), df_with_semi_data["array_column"][0].alias("a0_value"), df_with_semi_data["array_column"][1].alias("a1_value")).collect()
[Row(K1_VALUE='"v1"', A0_VALUE='"a0"', A1_VALUE='1')]
>>> # The above two returned string columns have JSON literal values because children of semi-structured data are semi-structured.
>>> # The next line converts JSON literal to a string
>>> df_with_semi_data.select(df_with_semi_data["object_column"]["k1"].cast(StringType()).alias("k1_value"), df_with_semi_data["array_column"][0].cast(StringType()).alias("a0_value"), df_with_semi_data["array_column"][1].cast(IntegerType()).alias("a1_value")).collect()
[Row(K1_VALUE='v1', A0_VALUE='a0', A1_VALUE=1)]
Copy

This class has methods for the most frequently used column transformations and operators. Module snowflake.snowpark.functions defines many functions to transform columns.

Methods

alias(alias)

Returns a new renamed Column.

as_(alias)

Returns a new renamed Column.

asc()

Returns a Column expression with values sorted in ascending order.

asc_nulls_first()

Returns a Column expression with values sorted in ascending order (null values sorted before non-null values).

asc_nulls_last()

Returns a Column expression with values sorted in ascending order (null values sorted after non-null values).

astype(to)

Casts the value of the Column to the specified data type.

between(lower_bound, upper_bound)

Between lower bound and upper bound.

bitand(other)

Bitwise and.

bitor(other)

Bitwise or.

bitwiseAnd(other)

Bitwise and.

bitwiseOR(other)

Bitwise or.

bitwiseXOR(other)

Bitwise xor.

bitxor(other)

Bitwise xor.

cast(to)

Casts the value of the Column to the specified data type.

collate(collation_spec)

Returns a copy of the original Column with the specified collation_spec property, rather than the original collation specification property.

contains(string)

Returns true if the column contains string for each row.

desc()

Returns a Column expression with values sorted in descending order.

desc_nulls_first()

Returns a Column expression with values sorted in descending order (null values sorted before non-null values).

desc_nulls_last()

Returns a Column expression with values sorted in descending order (null values sorted after non-null values).

endswith(other)

Returns true if this Column ends with another string.

eqNullSafe(other)

Equal to.

equal_nan()

Is NaN.

equal_null(other)

Equal to.

getField(field)

Accesses an element of ARRAY column by ordinal position, or an element of OBJECT column by key.

getItem(field)

Accesses an element of ARRAY column by ordinal position, or an element of OBJECT column by key.

getName()

Returns the column name (if the column has a name).

get_name()

Returns the column name (if the column has a name).

in_(*vals)

Returns a conditional expression that you can pass to the DataFrame.filter() or where DataFrame.where() to perform the equivalent of a WHERE .

isNotNull()

Is not null.

isNull()

Is null.

is_not_null()

Is not null.

is_null()

Is null.

isin(*vals)

Returns a conditional expression that you can pass to the DataFrame.filter() or where DataFrame.where() to perform the equivalent of a WHERE .

like(pattern)

Allows case-sensitive matching of strings based on comparison with a pattern.

name(alias)

Returns a new renamed Column.

over([window])

Returns a window frame, based on the specified WindowSpec.

regexp(pattern)

Returns true if this Column matches the specified regular expression.

rlike(pattern)

Returns true if this Column matches the specified regular expression.

startswith(other)

Returns true if this Column starts with another string.

substr(start_pos, length)

Returns a substring of this string column.

substring(start_pos, length)

Returns a substring of this string column.

try_cast(to)

Tries to cast the value of the Column to the specified data type.

within_group(*cols)

Returns a Column expression that adds a WITHIN GROUP clause to sort the rows by the specified columns.