snowflake.snowpark.Column¶
- class snowflake.snowpark.Column(expr1: Union[str, Expression], expr2: Optional[str] = None)[source]¶
Bases:
object
Represents a column or an expression in a
DataFrame
.To access a Column object that refers a column in a
DataFrame
, you can:Use the column name.
Use the
functions.col()
function.Use the
DataFrame.col()
method.Use the index operator
[]
on a dataframe object with a column name.Use the dot operator
.
on a dataframe object with a column name.
>>> from snowflake.snowpark.functions import col >>> df = session.create_dataframe([["John", 1], ["Mike", 11]], schema=["name", "age"]) >>> df.select("name").collect() [Row(NAME='John'), Row(NAME='Mike')] >>> df.select(col("name")).collect() [Row(NAME='John'), Row(NAME='Mike')] >>> df.select(df.col("name")).collect() [Row(NAME='John'), Row(NAME='Mike')] >>> df.select(df["name"]).collect() [Row(NAME='John'), Row(NAME='Mike')] >>> df.select(df.name).collect() [Row(NAME='John'), Row(NAME='Mike')]
Snowflake object identifiers, including column names, may or may not be case sensitive depending on a set of rules. Refer to Snowflake Object Identifier Requirements for details. When you use column names with a DataFrame, you should follow these rules.
The returned column names after a DataFrame is evaluated follow the case-sensitivity rules too. The above
df
was created with column name “name” while the returned column name aftercollect()
was called became “NAME”. It’s because the column is regarded as ignore-case so the Snowflake database returns the upper case.To create a Column object that represents a constant value, use
snowflake.snowpark.functions.lit()
:>>> from snowflake.snowpark.functions import lit >>> df.select(col("name"), lit("const value").alias("literal_column")).collect() [Row(NAME='John', LITERAL_COLUMN='const value'), Row(NAME='Mike', LITERAL_COLUMN='const value')]
This class also defines utility functions for constructing expressions with Columns. Column objects can be built with the operators, summarized by operator precedence, in the following table:
Operator
Description
x[index]
Index operator to get an item out of a Snowflake ARRAY or OBJECT
**
Power
-x
,~x
Unary minus, unary not
*
,/
,%
Multiply, divide, remainder
+
,-
Plus, minus
&
And
|
Or
==
,!=
,<
,<=
,>
,>=
Equal to, not equal to, less than, less than or equal to, greater than, greater than or equal to
The following examples demonstrate how to use Column objects in expressions:
>>> df = session.create_dataframe([[20, 5], [1, 2]], schema=["a", "b"]) >>> df.filter((col("a") == 20) | (col("b") <= 10)).collect() # use parentheses before and after the | operator. [Row(A=20, B=5), Row(A=1, B=2)] >>> df.filter((df["a"] + df.b) < 10).collect() [Row(A=1, B=2)] >>> df.select((col("b") * 10).alias("c")).collect() [Row(C=50), Row(C=20)]
When you use
|
,&
, and~
as logical operators on columns, you must always enclose column expressions with parentheses as illustrated in the above example, because their order precedence is higher than==
,<
, etc.Do not use
and
,or
, andnot
logical operators on column objects, for instance,(df.col1 > 1) and (df.col2 > 2)
is wrong. The reason is Python doesn’t have a magic method, or dunder method for them. It will raise an error and tell you to use|
,&
or~
, for which Python has magic methods. A side effect isif column:
will raise an error because it has a hidden call tobool(a_column)
, like using theand
operator. Useif a_column is None:
instead.To access elements of a semi-structured Object and Array, use
[]
on a Column object:>>> from snowflake.snowpark.types import StringType, IntegerType >>> df_with_semi_data = session.create_dataframe([[{"k1": "v1", "k2": "v2"}, ["a0", 1, "a2"]]], schema=["object_column", "array_column"]) >>> df_with_semi_data.select(df_with_semi_data["object_column"]["k1"].alias("k1_value"), df_with_semi_data["array_column"][0].alias("a0_value"), df_with_semi_data["array_column"][1].alias("a1_value")).collect() [Row(K1_VALUE='"v1"', A0_VALUE='"a0"', A1_VALUE='1')] >>> # The above two returned string columns have JSON literal values because children of semi-structured data are semi-structured. >>> # The next line converts JSON literal to a string >>> df_with_semi_data.select(df_with_semi_data["object_column"]["k1"].cast(StringType()).alias("k1_value"), df_with_semi_data["array_column"][0].cast(StringType()).alias("a0_value"), df_with_semi_data["array_column"][1].cast(IntegerType()).alias("a1_value")).collect() [Row(K1_VALUE='v1', A0_VALUE='a0', A1_VALUE=1)]
This class has methods for the most frequently used column transformations and operators. Module
snowflake.snowpark.functions
defines many functions to transform columns.Methods
alias
(alias)Returns a new renamed Column.
as_
(alias)Returns a new renamed Column.
asc
()Returns a Column expression with values sorted in ascending order.
Returns a Column expression with values sorted in ascending order (null values sorted before non-null values).
Returns a Column expression with values sorted in ascending order (null values sorted after non-null values).
astype
(to)Casts the value of the Column to the specified data type.
between
(lower_bound, upper_bound)Between lower bound and upper bound.
bitand
(other)Bitwise and.
bitor
(other)Bitwise or.
bitwiseAnd
(other)Bitwise and.
bitwiseOR
(other)Bitwise or.
bitwiseXOR
(other)Bitwise xor.
bitxor
(other)Bitwise xor.
cast
(to)Casts the value of the Column to the specified data type.
collate
(collation_spec)Returns a copy of the original
Column
with the specifiedcollation_spec
property, rather than the original collation specification property.contains
(string)Returns true if the column contains string for each row.
desc
()Returns a Column expression with values sorted in descending order.
Returns a Column expression with values sorted in descending order (null values sorted before non-null values).
Returns a Column expression with values sorted in descending order (null values sorted after non-null values).
endswith
(other)Returns true if this Column ends with another string.
eqNullSafe
(other)Equal to.
Is NaN.
equal_null
(other)Equal to.
getField
(field)Accesses an element of ARRAY column by ordinal position, or an element of OBJECT column by key.
getItem
(field)Accesses an element of ARRAY column by ordinal position, or an element of OBJECT column by key.
getName
()Returns the column name (if the column has a name).
get_name
()Returns the column name (if the column has a name).
in_
(*vals)Returns a conditional expression that you can pass to the
DataFrame.filter()
or whereDataFrame.where()
to perform the equivalent of a WHERE .Is not null.
isNull
()Is null.
Is not null.
is_null
()Is null.
isin
(*vals)Returns a conditional expression that you can pass to the
DataFrame.filter()
or whereDataFrame.where()
to perform the equivalent of a WHERE .like
(pattern)Allows case-sensitive matching of strings based on comparison with a pattern.
name
(alias[, variant_is_as])Returns a new renamed Column.
over
([window])Returns a window frame, based on the specified
WindowSpec
.regexp
(pattern[, parameters])Returns true if this Column matches the specified regular expression.
rlike
(pattern[, parameters])Returns true if this Column matches the specified regular expression.
startswith
(other)Returns true if this Column starts with another string.
substr
(start_pos, length)Returns a substring of this string column.
substring
(start_pos, length)Returns a substring of this string column.
try_cast
(to)Tries to cast the value of the Column to the specified data type.
within_group
(*cols)Returns a Column expression that adds a WITHIN GROUP clause to sort the rows by the specified columns.