snowflake.snowpark.functions.regexp_substr_all
- snowflake.snowpark.functions.regexp_substr_all(subject: Union[snowflake.snowpark.column.Column, str], pattern: Union[snowflake.snowpark.column.Column, str], position: Union[snowflake.snowpark.column.Column, str] = None, occurrence: Union[snowflake.snowpark.column.Column, str] = None, regex_parameters: Union[snowflake.snowpark.column.Column, str] = None, group_num: Union[snowflake.snowpark.column.Column, str] = None) → Column
Returns all substrings that match a regular expression within a string.
- Parameters:
subject (ColumnOrName) – The string to search for matches.
pattern (ColumnOrName) – The regular expression pattern to match.
position (ColumnOrName, optional) – The position in the string to start searching from (1-based). Defaults to 1.
occurrence (ColumnOrName, optional) – The first occurrence of the pattern to return; that match and every later match are included in the result. Defaults to 1.
regex_parameters (ColumnOrName, optional) – String of one or more characters that specifies the parameters for the regular expression. Default is ‘c’ (case-sensitive).
Supported values:
c: Case-sensitive matching
i: Case-insensitive matching
m: Multi-line mode
e: Extract submatches
s: Single-line mode (the POSIX wildcard character . matches \n)
group_num (ColumnOrName, optional) – The group number in the regular expression to extract. Defaults to None, which extracts the entire match.
- Returns:
An array containing all matching substrings.
- Return type:
Column
- Examples:
>>> from snowflake.snowpark.functions import col, lit, regexp_substr_all
>>> df = session.create_dataframe([['a1_a2a3_a4A5a6']], schema=["subject"])
>>> # Default arguments: every case-sensitive match, starting at position 1.
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]')).alias("result")).collect()
[Row(RESULT='[\n "a1",\n "a2",\n "a3",\n "a4",\n "a6"\n]')]
>>> # position=2: the search starts at the second character, so "a1" is skipped.
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(2)).alias("result")).collect()
[Row(RESULT='[\n "a2",\n "a3",\n "a4",\n "a6"\n]')]
>>> # occurrence=3: results begin with the third match.
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(1), lit(3)).alias("result")).collect()
[Row(RESULT='[\n "a3",\n "a4",\n "a6"\n]')]
>>> # regex_parameters='i': case-insensitive matching also picks up "A5".
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(1), lit(1), lit('i')).alias("result")).collect()
[Row(RESULT='[\n "a1",\n "a2",\n "a3",\n "a4",\n "A5",\n "a6"\n]')]
>>> # regex_parameters='ie' with group_num=1: return only the first capture group of each match.
>>> df.select(regexp_substr_all(col("subject"), lit('(a)([[:digit:]])'), lit(1), lit(1), lit('ie'), lit(1)).alias("result")).collect()
[Row(RESULT='[\n "a",\n "a",\n "a",\n "a",\n "A",\n "a"\n]')]
>>> # No match: the result is an empty array.
>>> df.select(regexp_substr_all(col("subject"), lit('b')).alias("result")).collect()
[Row(RESULT='[]')]
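To make the interaction between position, occurrence, regex_parameters, and group_num easier to see without a Snowflake session, here is a rough pure-Python analogue built on the standard re module. It is only a sketch of the semantics shown in the examples above, not how Snowpark or Snowflake implements the function. The helper name regexp_substr_all_sketch is hypothetical, it takes plain Python values rather than columns, and it uses \d because Python's re does not support POSIX classes such as [[:digit:]]. Two behaviors are assumptions: when both c and i appear, the later one wins, and e without a group_num extracts group 1.

```python
import re
from typing import List, Optional


def regexp_substr_all_sketch(
    subject: str,
    pattern: str,
    position: int = 1,
    occurrence: int = 1,
    regex_parameters: str = "c",
    group_num: Optional[int] = None,
) -> List[str]:
    """Hypothetical local approximation of regexp_substr_all (not the Snowflake engine)."""
    flags = 0
    # Assumption: if both 'c' and 'i' are present, the one appearing last wins.
    if regex_parameters.rfind("i") > regex_parameters.rfind("c"):
        flags |= re.IGNORECASE
    if "m" in regex_parameters:
        flags |= re.MULTILINE
    if "s" in regex_parameters:
        flags |= re.DOTALL  # '.' also matches a newline

    # Assumption: 'e' without an explicit group number extracts group 1;
    # otherwise the whole match (group 0) is returned.
    if group_num is None:
        group = 1 if "e" in regex_parameters else 0
    else:
        group = group_num

    # position is 1-based, while Python string offsets are 0-based.
    matches = list(re.compile(pattern, flags).finditer(subject, position - 1))

    # occurrence is 1-based: keep that match and everything after it.
    return [m.group(group) for m in matches[occurrence - 1:]]


subject = "a1_a2a3_a4A5a6"
print(regexp_substr_all_sketch(subject, r"a\d"))                        # ['a1', 'a2', 'a3', 'a4', 'a6']
print(regexp_substr_all_sketch(subject, r"a\d", position=2))            # ['a2', 'a3', 'a4', 'a6']
print(regexp_substr_all_sketch(subject, r"a\d", occurrence=3))          # ['a3', 'a4', 'a6']
print(regexp_substr_all_sketch(subject, r"(a)(\d)", 1, 1, "ie", 1))     # ['a', 'a', 'a', 'a', 'A', 'a']
```

The sketch returns a Python list, whereas the Snowpark function returns a Column holding an array, which collect() renders as the JSON-formatted strings shown in the examples.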